
ZeroMQ support in NLB

Open bozhang-hpc opened this issue 6 months ago • 5 comments

Hi,

I'm trying to send messages between a client outside k8s cluster and a broker inside the cluster using PyZMQ.

  1. I tried using a conventional load balancer, and it gives me a handshake error:
DEBUG:root:Client Socket Event: {'event': <Event.ACCEPTED: 32>, 'value': <Event.CONNECT_RETRIED|BIND_FAILED: 20>, 'endpoint': b'tcp://192.168.0.68:5559', 'description': 'EVENT_ACCEPTED'}
DEBUG:root:Client Socket Event: {'event': <Event.HANDSHAKE_FAILED_NO_DETAIL: 2048>, 'value': <Event.ACCEPTED: 32>, 'endpoint': b'tcp://192.168.0.68:5559', 'description': 'EVENT_HANDSHAKE_FAILED_NO_DETAIL'}
DEBUG:root:Client Socket Event: {'event': <Event.DISCONNECTED: 512>, 'value': <Event.CONNECT_RETRIED|BIND_FAILED: 20>, 'endpoint': b'tcp://192.168.0.68:5559', 'description': 'EVENT_DISCONNECTED'}
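For reference, the event codes in this log map to libzmq's `ZMQ_EVENT_*` constants (which pyzmq exposes as `zmq.Event`). A minimal stdlib sketch decodes them — the flag values below mirror libzmq's and are an assumption about the installed version. Note that the `value` field of an `ACCEPTED` or `DISCONNECTED` event is the socket file descriptor (here 20), which pyzmq's repr coerces into `Event` flags, hence the misleading `CONNECT_RETRIED|BIND_FAILED` in the log.

```python
from enum import IntFlag

# Mirror of libzmq's ZMQ_EVENT_* constants (same values as pyzmq's zmq.Event;
# assumed to match the installed libzmq/pyzmq version).
class Event(IntFlag):
    CONNECTED = 1
    CONNECT_DELAYED = 2
    CONNECT_RETRIED = 4
    LISTENING = 8
    BIND_FAILED = 16
    ACCEPTED = 32
    ACCEPT_FAILED = 64
    CLOSED = 128
    CLOSE_FAILED = 256
    DISCONNECTED = 512
    MONITOR_STOPPED = 1024
    HANDSHAKE_FAILED_NO_DETAIL = 2048

# The three events from the log above, in order (event_code, value):
for event_code, value in [(32, 20), (2048, 32), (512, 20)]:
    # 'value' is event-specific: for ACCEPTED/DISCONNECTED it is the accepted
    # file descriptor, not another Event flag.
    print(Event(event_code).name, "value =", value)
```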
  2. Then I tried the NLB with the manifest below, but still got the same handshake error:
apiVersion: v1
kind: Service
metadata:
  name: rexec-broker-external-ip
  labels:
    app: rexec-broker
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  selector:
    app: rexec-broker
  ports:
    - name: "client-port"
      port: 5559
      targetPort: 5559
    - name: "control-port-external"
      port: 5561
      targetPort: 5561
  type: LoadBalancer
  loadBalancerClass: service.k8s.aws/nlb
  3. The final workaround is to use a NodePort to route my traffic directly.

So I'm wondering: does the AWS NLB support the ZMQ protocol, or was I misconfiguring something?

bozhang-hpc avatar May 19 '25 18:05 bozhang-hpc

Thanks for reaching out, sorry for the trouble!

Based on the ZeroMQ docs, ZMQ is a messaging library that can run on TCP.

NLB supports TCP. As long as ZMQ supports TCP, and as long as your application is using TCP, I'd expect it to work.

As basic troubleshooting steps:

  1. Could you try running your application without a load balancer in between and see if it works? This will help narrow down if it's an issue with the client, broker, or application.
  2. Could you double-check your NLB security groups, ports, etc. are configured correctly? This will help narrow down if it's an issue with the NLB configuration.
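Step 1 can be approximated without pyzmq at all: ZMTP (ZeroMQ's wire protocol) peers begin by exchanging a greeting whose first byte is `0xFF`, so a raw TCP probe tells you whether a given host/port is reachable and the peer speaks ZMTP. A stdlib sketch under that assumption — the local "fake broker" below only exists to make the example self-contained; point `probe_zmtp` at your real broker's address instead:

```python
import socket
import threading

ZMTP_SIGNATURE = b"\xff" + b"\x00" * 8 + b"\x7f"  # ZMTP 3.x greeting prefix

def probe_zmtp(host, port, timeout=2.0):
    """Connect and check whether the peer starts a ZMTP greeting (first byte 0xff)."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        return s.recv(1) == b"\xff"

# Stand-in "broker" that just sends the greeting prefix (illustrative only).
def fake_broker(server_sock):
    conn, _ = server_sock.accept()
    conn.sendall(ZMTP_SIGNATURE)
    conn.close()

server = socket.socket()
server.bind(("127.0.0.1", 0))       # ephemeral port for the demo
server.listen(1)
port = server.getsockname()[1]
threading.Thread(target=fake_broker, args=(server,), daemon=True).start()

ok = probe_zmtp("127.0.0.1", port)
print(ok)  # True when the peer answers with a ZMTP greeting
```

Running the same probe against the NLB's DNS name versus the Pod IP directly would show at which hop the handshake bytes stop flowing.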

This is likely not an issue with the AWS Load Balancer Controller in particular - you can also consider posting on AWS re:Post, contacting AWS Support, or using ZMQ support options.

andreybutenko avatar May 21 '25 21:05 andreybutenko

Right, I'm using TCP in ZMQ, and since the NLB supports TCP, that is exactly why I use it to expose an IP to clients outside the k8s cluster.

My use case is like this: I have several clients (outside the k8s cluster) talking to a broker (inside the k8s cluster), and the broker forwards the traffic to several servers (inside the k8s cluster), calling some functions inside each server.

Basically, the communication is: client <--------> (external-ip-service) broker (internal-ip-service) <--------> server. I have two services deployed for the broker: the external-ip-service exposes an IP to the outside clients (this is the one using the NLB), and the internal-ip-service exposes a ClusterIP to the internal servers.

The internal ClusterIP traffic doesn't have any issue, and when I use NodePort + the EC2 public IP for the external traffic, everything is fine. But when I switch to the NLB, the broker gives the error messages posted above. That's why I started to question whether the NLB supports ZMQ.
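At the TCP level, the data path described above can be sketched with a one-shot relay — this is purely illustrative (the real broker is a ZMQ ROUTER/DEALER proxy, not a byte relay), but it shows the client <-> broker <-> server shape the NLB sits in front of:

```python
import socket
import threading

# Tiny echo "server" standing in for an in-cluster worker (illustrative only).
def echo_server(sock):
    conn, _ = sock.accept()
    conn.sendall(conn.recv(1024).upper())
    conn.close()

# One-shot "broker": accepts a client, forwards its bytes to the server,
# and relays the reply back.
def broker(front_sock, server_addr):
    client, _ = front_sock.accept()
    with socket.create_connection(server_addr) as backend:
        backend.sendall(client.recv(1024))   # client -> broker -> server
        client.sendall(backend.recv(1024))   # server -> broker -> client
    client.close()

def listener():
    s = socket.socket()
    s.bind(("127.0.0.1", 0))   # ephemeral demo port
    s.listen(1)
    return s

srv, frt = listener(), listener()
threading.Thread(target=echo_server, args=(srv,), daemon=True).start()
threading.Thread(target=broker, args=(frt, srv.getsockname()), daemon=True).start()

with socket.create_connection(frt.getsockname()) as c:  # the "external" client
    c.sendall(b"ping")
    reply = c.recv(1024)
print(reply)
```

In the real deployment, the NLB replaces the `frt` listener's address from the client's point of view, which is why any NLB target-registration problem surfaces as a failed or hung handshake on this front leg only.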

@yutianqin Could you post the nlb configuration here?

bozhang-hpc avatar May 22 '25 02:05 bozhang-hpc

Sorry for the delay. Before applying the load balancer Service, we followed https://docs.aws.amazon.com/eks/latest/userguide/lbc-helm.html to install the AWS Load Balancer Controller.

I made sure AWSLoadBalancerControllerIAMPolicy is created and attached to the service account 'aws-load-balancer-controller' in our EKS cluster (Kubernetes version 1.32). Then I installed the Helm chart (chart version 1.13.0, AWS Load Balancer Controller v2.13.0):

helm install aws-load-balancer-controller eks/aws-load-balancer-controller \
  -n kube-system \
  --set clusterName=<our-cluster-name> \
  --set serviceAccount.create=false \
  --set serviceAccount.name=aws-load-balancer-controller \
  --set region=us-west-2 \
  --set vpcId=<our-vpc-id> \
  --version 1.13.0

And this is the manifest we are using, which I think we had shared earlier in this thread:

apiVersion: v1
kind: Service
metadata:
  name: rexec-broker-external-ip
  labels:
    app: rexec-broker
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  selector:
    app: rexec-broker
  ports:
    - name: "client-port"
      port: 5559
      targetPort: 5559
    - name: "control-port-external"
      port: 5561
      targetPort: 5561
  type: LoadBalancer
  loadBalancerClass: service.k8s.aws/nlb

yutianqin avatar May 22 '25 20:05 yutianqin

Thanks for posting the yaml, it was helpful to debug. Our documentation could be a bit clearer here. To utilize instance targets (service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance), the service needs to expose a NodePort. One suggestion I have is to switch to using IP targets (service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip), and everything should work just fine. Please let me know what you think :). I'll get some doc updates together to clarify the requirements of using instance targets.
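Applied to the manifest posted earlier in the thread, the suggested switch to IP targets is a one-line annotation change:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: rexec-broker-external-ip
  labels:
    app: rexec-broker
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: external
    # Register Pod IPs directly instead of instance NodePorts:
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
spec:
  selector:
    app: rexec-broker
  ports:
    - name: "client-port"
      port: 5559
      targetPort: 5559
    - name: "control-port-external"
      port: 5561
      targetPort: 5561
  type: LoadBalancer
  loadBalancerClass: service.k8s.aws/nlb
```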

zac-nixon avatar May 29 '25 16:05 zac-nixon

I tried service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip before, but it didn't work.

My socket monitor didn't print anything out; it just hung there, so I assume it didn't even receive or recognize the traffic.

BTW, to use service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: instance, how can I expose a NodePort for the service?
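For context: a `type: LoadBalancer` Service allocates NodePorts automatically unless `spec.allocateLoadBalancerNodePorts` is set to `false`, and a specific one can be pinned via the `nodePort` field on a port entry. A sketch against the earlier manifest (the 30559 value is an arbitrary example from the 30000-32767 NodePort range):

```yaml
spec:
  type: LoadBalancer
  allocateLoadBalancerNodePorts: true   # default; needed for instance targets
  ports:
    - name: "client-port"
      port: 5559
      targetPort: 5559
      nodePort: 30559                   # optional: pin a fixed NodePort (example value)
```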

bozhang-hpc avatar May 29 '25 22:05 bozhang-hpc

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot avatar Aug 27 '25 22:08 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot avatar Sep 26 '25 23:09 k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-triage-robot avatar Oct 26 '25 23:10 k8s-triage-robot

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot avatar Oct 26 '25 23:10 k8s-ci-robot