
Facing Issues with Load Balancing using NGINX Load Balancer on AWS EKS

Open MeghaVarshney21 opened this issue 3 years ago • 2 comments

I am deploying Triton Inference Server on Amazon Elastic Kubernetes Service (Amazon EKS) and using the NGINX Open Source load balancer in front of it. Our EKS cluster is private (the EKS nodes are in private subnets), so it cannot be reached from outside.

Triton Inference Server exposes three endpoints:

  • port 8000: HTTP requests
  • port 8001: gRPC requests
  • port 8002: Prometheus metrics server

First, I created a Deployment for Triton on AWS EKS and exposed it with a headless Service (clusterIP: None), so that the endpoints of all replicas are exposed and can be discovered by the NGINX load balancer.

apiVersion: v1
kind: Service
metadata:
  name: triton
  labels:
    app: triton
spec:
  clusterIP: None
  ports:
     - protocol: TCP
       port: 8000
       name: http
       targetPort: 8000
     - protocol: TCP
       port: 8001
       name: grpc
       targetPort: 8001
     - protocol: TCP
       port: 8002
       name: metrics
       targetPort: 8002
  selector:
    app: triton

Then I built an image for the NGINX Open Source load balancer using the configuration below. The configuration file for NGINX is on the EKS node at /etc/nginx/conf.d/nginx.conf.

resolver kube-dns.kube-system.svc.cluster.local valid=5s;
upstream backend {
   zone upstream-backend 64k;
   server triton.default.svc.cluster.local:8000;
}
 
upstream backendgrpc {
   zone upstream-backend 64k;
   server triton.default.svc.cluster.local:8001;
}
 
server {
   listen 80;
   location / {
     proxy_pass http://backend/;
   }
}
 
server {
        listen 89 http2;
 
        location / {
            grpc_pass grpc://backendgrpc;
        }
}
 
server {
    listen 8080;
    root /usr/share/nginx/html;
    location = /dashboard.html { }
    location = / {
       return 302 /dashboard.html;
    }
} 

The Dockerfile for the NGINX Open Source load balancer is:

FROM nginx
RUN rm /etc/nginx/conf.d/default.conf
COPY /etc/nginx/conf.d/nginx.conf /etc/nginx/conf.d/default.conf

I created a ReplicationController for NGINX. To pull the image from the private registry, Kubernetes needs credentials. The imagePullSecrets field in the configuration file tells Kubernetes to get the credentials from a Secret named ecr-cred.

The nginx-rc file looks like this:

 apiVersion: v1
 kind: ReplicationController
 metadata:
   name: nginx-rc
 spec:
   replicas: 1
   selector:
     app: nginx
   template:
     metadata:
       labels:
         app: nginx
     spec:
       imagePullSecrets:
       - name: ecr-cred
       containers:
       - name: nginx
         command: [ "/bin/bash", "-c", "--" ]
         args: [ "nginx; while true; do sleep 30; done;" ]
         imagePullPolicy: IfNotPresent
         image: <Image URL with tag>
         ports:
           - name: http
             containerPort: 80
             hostPort: 8085
           - name: grpc
             containerPort: 89
             hostPort: 8087
           - name: http-alt
             containerPort: 8080
             hostPort: 8086
           - name: triton-svc
             containerPort: 8000
             hostPort: 32309

The issue I am facing is that when the number of Triton pods increases, the NGINX load balancer does not distribute traffic to the newly added pods.

Can anyone help me?

MeghaVarshney21, Apr 13 '22 08:04

Hi @MeghaVarshney21

resolver kube-dns.kube-system.svc.cluster.local valid=5s;
upstream backend {
   zone upstream-backend 64k;
   server triton.default.svc.cluster.local:8000;
}

NGINX OSS resolves DNS names only when it starts or when it is reloaded. That is why NGINX does not load balance across the newly added pods when their number increases.

However, re-resolving DNS names is available in NGINX Plus, the commercial version of NGINX. For re-resolving, the configuration looks like this: https://github.com/nginxinc/NGINX-Demos/blob/master/kubernetes-demo/third/nginxplus/backend.conf#L3-L6

More about re-resolving DNS names: https://www.nginx.com/blog/dns-service-discovery-nginx-plus/
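
As a rough sketch of how that could look for the Triton service in this issue (NGINX Plus only, and not tested against this cluster): the `resolve` parameter asks NGINX Plus to keep re-querying the name, and `service=` switches to SRV lookups, which assumes the headless Service keeps its named ports `http` and `grpc`. The second zone name, `upstream-backendgrpc`, is an illustrative choice and not from the original configuration.

```nginx
# Sketch only: requires NGINX Plus. Zone and upstream names are illustrative.
resolver kube-dns.kube-system.svc.cluster.local valid=5s;

upstream backend {
    zone upstream-backend 64k;
    # No port here: with service=, the port comes from the SRV record
    # for the Service port named "http".
    server triton.default.svc.cluster.local service=_http._tcp resolve;
}

upstream backendgrpc {
    zone upstream-backendgrpc 64k;
    # SRV lookup for the Service port named "grpc".
    server triton.default.svc.cluster.local service=_grpc._tcp resolve;
}
```

The server blocks with proxy_pass and grpc_pass can stay as they are, since they refer to the upstreams by name.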

Also note that for Kubernetes, we also have the Ingress Controller, which works with both NGINX Plus and NGINX OSS and automatically updates the NGINX configuration when new backend pods are added. See https://github.com/nginxinc/kubernetes-ingress

gRPC examples:

  • https://github.com/nginxinc/kubernetes-ingress/tree/main/examples/custom-resources/grpc-upstreams
  • https://github.com/nginxinc/kubernetes-ingress/tree/main/examples/grpc-services

Hope this helps

pleshakov, Apr 15 '22 01:04

Thanks @pleshakov

I am now using the NGINX Plus load balancer, but I am facing another issue.

When the HPA (Horizontal Pod Autoscaler) scales down the pods, the NGINX load balancer shows a "server not ready" error.

Could you please help me with this issue?

MeghaVarshney21, Apr 20 '22 05:04