Precautions for Pod Termination with Container-Native Load Balancing on GKE

Sep 27, 2021

Introduction

GKE recommends container-native load balancing. It lets the GCP load balancer route traffic directly to Pod IPs using alias IPs and network endpoint groups (NEGs). However, if the Pods are not configured properly, downtime occurs when Pods are evicted from a node, for example during cluster maintenance. In this article, I explain how container-native load balancing works and how to configure Pods properly.

Container-native load balancing mechanism

As described in the Container-native load balancing documentation, a custom controller called the NEG controller runs on the GKE master (control plane). When a Service with a specific annotation is registered, it appears that a NEG resource is created in GCP and the Pods associated with the Service are attached to that NEG. Also, as the name "zonal network endpoint group" suggests, a NEG is created per zone, and each Pod belongs to the NEG of the zone in which it runs.
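
As an illustration of the "Service with a specific annotation" mentioned above, here is a minimal sketch of a Service that asks for NEGs to be created for Ingress. The annotation key `cloud.google.com/neg` is the one GKE documents; the Service name, label, and port numbers are hypothetical placeholders, not taken from this article.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-app                      # hypothetical name
  annotations:
    # Asks the NEG controller to create zonal NEGs for this Service
    # and attach the matching Pods as endpoints.
    cloud.google.com/neg: '{"ingress": true}'
spec:
  type: ClusterIP
  selector:
    app: my-app                     # hypothetical label
  ports:
    - port: 80
      targetPort: 8080              # assumed container port
```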

Precautions when a Pod is evicted

GKE automatically upgrades the cluster, which performs a rolling update of the nodes. Pods running on a node at that time are evicted and recreated on another node, so if you do not pay attention to the Pod lifecycle, downtime will occur.

Lifecycle when a Pod is evicted

When a Pod is evicted, the flow until it is removed from the NEG and no longer receives traffic is as follows.

1. The Pod goes into the Terminating state.
2. The following happen at the same time:
   - The Pod is removed from the Service's endpoints.
   - The Pod's preStop hook runs, followed by SIGTERM handling.
3. The NEG controller detects that the Pod has been removed from the Service and removes it from the NEG.
4. GCLB stops routing to the evicted Pod.

In other words, there are two points to watch out for here.

1. Prevent the Pod from stopping before it has been removed from the NEG.
2. Even after the Pod has been removed from the NEG, let requests that are already in flight finish processing.

Concrete measures

For the first point, implement a preStop hook in the Pod.
This is the same as when routing through a plain Service.

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 20"]
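
To show where the `lifecycle` fragment above sits, here is a minimal sketch of a Deployment that includes it. The names, image, replica count, and the 60-second grace period are assumptions for illustration only. One thing worth keeping in mind: the time spent in preStop counts against `terminationGracePeriodSeconds` (30 seconds by default), so the grace period should cover the preStop sleep plus the time the application needs to shut down after SIGTERM.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                          # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # Assumed value: must be longer than the preStop sleep (20s)
      # plus the application's own shutdown time after SIGTERM.
      terminationGracePeriodSeconds: 60
      containers:
        - name: app
          image: example.com/my-app:1.0 # placeholder image
          ports:
            - containerPort: 8080       # assumed port
          lifecycle:
            preStop:
              exec:
                # Keep serving while the Pod is being removed from the NEG.
                command: ["/bin/sh", "-c", "sleep 20"]
```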

For the second point, configure connection draining in the backend service settings of GCLB. Also note that if the preStop wait (point 1) is not longer than the connection draining timeout (point 2), the Pod may, in the worst case, die before draining finishes.

Reference: Enabling connection draining | Load Balancing | Google Cloud (cloud.google.com)

If you are using Ingress, you can also configure it through a CRD (BackendConfig).

Reference: Configuring Ingress features | Kubernetes Engine Documentation
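
As a rough sketch of the CRD-based setting, the following BackendConfig sets a connection draining timeout and is attached to a Service through an annotation. The resource names and the 15-second timeout are assumptions chosen for illustration; the timeout is deliberately shorter than the 20-second preStop sleep shown earlier, in line with the note above about the ordering of the two values.

```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: my-app-backendconfig            # hypothetical name
spec:
  connectionDraining:
    # Assumed value: kept shorter than the preStop sleep (20s)
    # so draining can finish before the Pod exits.
    drainingTimeoutSec: 15
---
apiVersion: v1
kind: Service
metadata:
  name: my-app                          # hypothetical name
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
    # Attaches the BackendConfig above to all ports of this Service.
    cloud.google.com/backend-config: '{"default": "my-app-backendconfig"}'
spec:
  type: ClusterIP
  selector:
    app: my-app                         # hypothetical label
  ports:
    - port: 80
      targetPort: 8080                  # assumed container port
```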