nginx-gateway-fabric
Health status out-of-service or Unhealthy in ALB/NLB on AWS
Describe the bug I installed NGINX Gateway Fabric on EKS, but only the instance that hosts the nginx-gateway pod is reported as healthy on the load balancer. Pods can still run on the other instances even though the load balancer marks those instances as unhealthy.
To Reproduce Steps to reproduce the behavior:
kubectl kustomize "https://github.com/nginxinc/nginx-gateway-fabric/config/crd/gateway-api/standard?ref=v1.4.0" | kubectl apply -f -
kubectl apply -f https://raw.githubusercontent.com/nginxinc/nginx-gateway-fabric/v1.4.0/deploy/crds.yaml
kubectl apply -f https://raw.githubusercontent.com/nginxinc/nginx-gateway-fabric/v1.4.0/deploy/default/deploy.yaml
Expected behavior Show all instances healthy.
Your environment
- Version of the NGINX Gateway Fabric - v1.4.0
- Version of Kubernetes - 1.30
- Kubernetes platform - EKS
Additional context No more info. Just these simple commands.
A screenshot
@samuelrcarvalho, I don't understand the issue. Is the unhealthy pod shown in the screenshot NGINX Gateway Fabric?
@kate-osborn I ran those commands on my EKS cluster (at that time, the cluster had just 2 nodes, but I've now scaled it to 10, as shown in the image below). NGINX Gateway Fabric was installed. As you can see in the AWS Load Balancer console, only one instance is healthy for the Load Balancer. All nodes are fine in EKS, but not for the Load Balancer. Traffic seems to be routed correctly within the cluster, but the health status of the other instances remains unhealthy. One thing I noticed is that the only healthy instance is the one with the nginx-gateway pod running. I performed a test: if I terminate that healthy instance (the one that runs nginx-gateway), the nginx-gateway pod is created on another node, and after a few seconds, that node (the instance as seen by the Load Balancer) becomes healthy.
I'm not an expert on EKS, but I believe this is expected behavior and has to do with the externalTrafficPolicy: Local Service setting: https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip.
When externalTrafficPolicy is Local, the LoadBalancer only routes requests to Nodes that have a Pod of the target Service running on them (nginx-gateway in this case). My guess is that AWS implements this by marking Nodes without the target Service's Pods as unhealthy and taking them out of the rotation.
Can you pass traffic to nginx-gateway through the LoadBalancer?
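One way to check this is to read the policy straight from the Service. With externalTrafficPolicy: Local, Kubernetes also allocates a healthCheckNodePort, and kube-proxy answers health checks on that port successfully only on Nodes that host a ready Pod for the Service, which would match what the console shows. The sketch below is only an illustration: the Service name and namespace (both nginx-gateway) are assumed from the default manifest, and the function names are made up for this example.

```shell
# Assumed names from the default manifest; override them if your install differs.
NGF_NAMESPACE="${NGF_NAMESPACE:-nginx-gateway}"
NGF_SERVICE="${NGF_SERVICE:-nginx-gateway}"

# Print the Service's externalTrafficPolicy on the first line and its
# healthCheckNodePort on the second (empty when the policy is Cluster).
ngf_traffic_policy() {
  kubectl get service "$NGF_SERVICE" -n "$NGF_NAMESPACE" \
    -o jsonpath='{.spec.externalTrafficPolicy}{"\n"}{.spec.healthCheckNodePort}{"\n"}'
}

# Show which Node each pod in the namespace is scheduled on, to compare
# against the instance the load balancer reports as healthy.
ngf_pod_nodes() {
  kubectl get pods -n "$NGF_NAMESPACE" -o wide
}
```

Running `ngf_traffic_policy` and then `ngf_pod_nodes` should show whether the policy is Local and whether the single healthy instance is the Node listed for the nginx-gateway pod.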
You can also try a Network Load Balancer:
- https://aws.github.io/aws-eks-best-practices/networking/loadbalancing/loadbalancing/
- https://docs.nginx.com/nginx-gateway-fabric/installation/installing-ngf/manifests/#5-access-nginx-gateway-fabric
- https://raw.githubusercontent.com/nginxinc/nginx-gateway-fabric/v1.4.0/deploy/aws-nlb/deploy.yaml
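If you want to try the NLB variant, something like the following should work. It is a rough sketch: the manifest URL is the last one in the list above, the Service name and namespace (nginx-gateway) are assumed from the default manifest, and the function name is hypothetical. Whether Nodes without the nginx-gateway pod then report as healthy has not been verified in this thread.

```shell
# Manifest URL taken from the list above.
NLB_MANIFEST="https://raw.githubusercontent.com/nginxinc/nginx-gateway-fabric/v1.4.0/deploy/aws-nlb/deploy.yaml"

# Apply the NLB manifest, then print the Service so the load balancer
# hostname appears in the EXTERNAL-IP column once AWS has provisioned it.
# The Service name and namespace are assumptions; adjust them if needed.
switch_to_nlb() {
  kubectl apply -f "$NLB_MANIFEST" &&
    kubectl get service nginx-gateway -n nginx-gateway
}
```

Call `switch_to_nlb` once, then check the target health in the AWS console again after the new load balancer has finished provisioning.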
Tagging @lucacome in case he has something to add.
This issue is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 14 days.
This issue was closed because it has been stalled for 14 days with no activity.