agones-ping-udp-service brings up incompatible load balancer in EKS

Open structurefall opened this issue 4 years ago • 22 comments

What happened:

When installing Agones in an EKS cluster, the agones-ping-udp-service fails to create with the following error:

Warning CreatingLoadBalancerFailed 1s (x4 over 37s) service-controller Error creating load balancer (will retry): failed to ensure load balancer for service agones-system/agones-ping-udp-service: Only TCP LoadBalancer is supported for AWS ELB

What you expected to happen: The service should have created successfully.

How to reproduce it (as minimally and precisely as possible): Build an EKS cluster, follow installation instructions for Agones.

Anything else we need to know?: Yes, we already know the fix! The service needs the following annotation:

  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
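For illustration, here is a minimal sketch of what the annotated Service could look like. The name, namespace, selector, and port values are taken from the kubectl describe output posted later in this thread. This is not the full manifest shipped with Agones (labels are omitted), and whether the annotation is sufficient depends on the Kubernetes/EKS version, as discussed further down.

```yaml
# Sketch only: a trimmed-down agones-ping-udp-service with the NLB annotation.
# Field values mirror the kubectl describe output in this thread; labels omitted.
apiVersion: v1
kind: Service
metadata:
  name: agones-ping-udp-service
  namespace: agones-system
  annotations:
    # Ask the AWS cloud provider for a Network Load Balancer instead of a Classic ELB.
    service.beta.kubernetes.io/aws-load-balancer-type: nlb
spec:
  type: LoadBalancer
  selector:
    agones.dev/role: ping
  ports:
    - name: udp
      protocol: UDP
      port: 50000
      targetPort: 8080
```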

Environment:

  • Agones version: 1.0.0
  • Kubernetes version (use kubectl version): 1.12
  • Cloud provider or hardware configuration: AWS/EKS
  • Install method (yaml/helm): YAML, but I checked the Helm chart and it would have the same issue
  • Troubleshooting guide log(s): n/a
  • Others:

structurefall avatar Oct 25 '19 00:10 structurefall

I figure we could set this annotation by default. I don't think it's going to affect other cloud platform's load balancers?

markmandel avatar Oct 25 '19 00:10 markmandel

I can check if it would work. /assign alekser

aLekSer avatar Oct 30 '19 17:10 aLekSer

I tried the fix you suggested, and it did not work on Kubernetes 1.12. See https://github.com/kubernetes/kubernetes/issues/79523. Per the comments there, I think we should wait until 1.17 for bug triage.

aLekSer avatar Nov 01 '19 11:11 aLekSer

kubectl describe agones-ping-udp-service output:

Name:         agones-ping-udp-service
Namespace:    agones-system
Labels:       app=agones
              chart=agones-1.1.0
              component=ping
              heritage=Tiller
              release=agones-manual
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"service.beta.kubernetes.io/aws-load-balancer-type":"nlb"},"labels":{"app":...
              service.beta.kubernetes.io/aws-load-balancer-type: nlb
API Version:  v1
Kind:         Service
Metadata:
  Creation Timestamp:  2019-11-01T11:49:47Z
  Resource Version:    4196
  Self Link:           /api/v1/namespaces/agones-system/services/agones-ping-udp-service
  UID:                 b44a207f-fc9d-11e9-b526-0ad256200a94
Spec:
  Cluster IP:               10.100.25.80
  External Traffic Policy:  Cluster
  Ports:
    Name:         udp
    Node Port:    30518
    Port:         50000
    Protocol:     UDP
    Target Port:  8080
  Selector:
    agones.dev/role:  ping
  Session Affinity:   None
  Type:               LoadBalancer
Status:
  Load Balancer:
Events:
  Type     Reason                      Age                From                Message
  ----     ------                      ----               ----                -------
  Normal   EnsuringLoadBalancer        51s (x2 over 82s)  service-controller  Ensuring load balancer
  Warning  CreatingLoadBalancerFailed  51s (x2 over 82s)  service-controller  Error creating load balancer (will retry): failed to ensure load balancer for service agones-system/agones-ping-udp-service: Only TCP LoadBalancer is supported for AWS ELB

aLekSer avatar Nov 01 '19 12:11 aLekSer

@aLekSer based on your output, it looks like the service is still trying to build without the annotation. You need to completely destroy the service and rebuild it for that to work. When I tested this it was in fact on 1.12.

You're getting the same error message that indicates that the load balancer was built sans annotation: "Only TCP LoadBalancer is supported for AWS ELB." There's a brief AWS doc on the annotation here.

structurefall avatar Nov 04 '19 19:11 structurefall

I will recheck soon whether this is still relevant.

aLekSer avatar Jan 23 '20 15:01 aLekSer

Tested this on the most recent Terraform EKS config with udp_expose = "true" in examples/terraform-submodules/eks/module.tf:

  Type     Reason                      Age                    From                Message
  ----     ------                      ----                   ----                -------
  Normal   EnsuringLoadBalancer        4m31s (x7 over 9m46s)  service-controller  Ensuring load balancer                                                                                                 
  Warning  CreatingLoadBalancerFailed  4m31s (x7 over 9m46s)  service-controller  Error creating load balancer (will retry): failed to ensure load balancer for service agones-system/agones-ping-udp-service: Only TCP LoadBalancer is supported for AWS ELB
BY-IT00060:eks alexander.apalikov$ kubectl get services --namespace agones-system
NAME                        TYPE           CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)            AGE                                             
agones-allocator            LoadBalancer   172.20.90.48     abf6453fe41e311eaa72b02706d23ee8-806979722.us-west-2.elb.amazonaws.com    443:31237/TCP      39m                                             
agones-controller-service   ClusterIP      172.20.120.21    <none>                                                                    443/TCP,8080/TCP   39m                                             
agones-ping-http-service    LoadBalancer   172.20.172.167   abf5f999b41e311eaa72b02706d23ee8-1635672743.us-west-2.elb.amazonaws.com   80:30456/TCP       39m                                             
agones-ping-udp-service     LoadBalancer   172.20.196.38    <pending>                                                                 50000:30098/UDP    10m      

I will try adding the annotations to helm.tf. We should wait for this PR to be merged: https://github.com/kubernetes/kubernetes/pull/87549/files. It seems that the Kubernetes code itself does not allow this to happen.

aLekSer avatar Jan 28 '20 16:01 aLekSer

It seems that the fix will only land in version 1.19, as per this comment: https://github.com/kubernetes/kubernetes/pull/87549#issuecomment-595440964

aLekSer avatar Mar 12 '20 19:03 aLekSer

1.19 is a long way away (especially on EKS). Is there anything we can do in the meantime? Is there a documentation change we can make to explain to EKS users how to fix this manually?

roberthbailey avatar Mar 26 '20 16:03 roberthbailey

Yes, I am also looking into a documentation update.

aLekSer avatar Apr 01 '20 13:04 aLekSer

The upstream fix for this has landed in k8s master, so will be in 1.19, which should be available in EKS sometime in 2021.

TBBle avatar Jun 25 '20 05:06 TBBle

I didn't notice until recently that AWS EKS has backported the NLB UDP support from 1.19 as far back as 1.15, per the EKS Platform versions page:

  • 1.15.11.eks.4 or later
  • 1.16.13.eks.3 or later
  • 1.17.9.eks.2 or later

1.18 doesn't mention it, but the above backports (August 12th) predate 1.18 on EKS (October 13th), so it should have been included there too. My team expects to deploy Agones on an EKS 1.18 platform sometime next month, so I'll try and remember to check that; we disable the UDP-ping service by default, so I'll ask the person doing the deployment to try leaving it on with an NLB (or ideally NLB-IP) annotation.

EKS support for 1.14 ends on December 8th, so I believe this issue is in a good state to close, with the caveat that 1.14 doesn't support it, and never will. However, I have not tested the fix myself yet.

TBBle avatar Nov 18 '20 09:11 TBBle

Is this solved? I think it might be, but I'm not sure.

Can someone confirm, so we can close this issue?

markmandel avatar Apr 22 '21 16:04 markmandel

I think you still need to set the annotation for the agones-ping Service service.beta.kubernetes.io/aws-load-balancer-type: nlb in the installation sources if you want it to work out of the box. We use kustomize and have an overlay for it.
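A kustomize overlay of the kind mentioned above might look roughly like the following sketch. The file name install.yaml is an assumption (a local copy of the Agones install manifest), and this is not the commenter's actual overlay. The inline strategic-merge patch only adds the annotation and leaves the rest of the Service untouched.

```yaml
# kustomization.yaml (hypothetical overlay, not the commenter's actual file)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - install.yaml   # assumed: local copy of the Agones install manifest
patches:
  # Strategic-merge patch: matched to the Service by kind, name, and namespace.
  - patch: |-
      apiVersion: v1
      kind: Service
      metadata:
        name: agones-ping-udp-service
        namespace: agones-system
        annotations:
          service.beta.kubernetes.io/aws-load-balancer-type: nlb
```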

tenevdev avatar May 31 '21 10:05 tenevdev

I think that would be a user-issue though? I wouldn't expect the Agones source to contain annotations for any/every kind of platform-specific environment.

And deploying things to AWS, I had to use such annotations everywhere (since I wanted NLB everywhere) so this wasn't a shock or a challenge for me.

The error could be nicer: "Use NLB mode" would be better than "ELB can't do this", but that's a k8s/AWS problem, not an Agones problem.

Particularly because for AWS, we now have both nlb and nlb-ip types, and the latter is better but requires a (not in-the-box) AWS Load Balancer Controller (due to the migration of cloud-specific controllers out of kubernetes core), so including either annotation is going to be wrong for a significant group of AWS users, and nonsensical for non-AWS users.

I would have to check this, but I think for load balancers, AWS are trying to introduce different annotations, which adds 'external' to the list of possible types, in order to explicitly hand-off to the afore-mentioned AWS Load Balancer Controller.

If there's AWS-specific instructions around, it's worth mentioning the options there, I guess, with appropriate links

TBBle avatar May 31 '21 13:05 TBBle

If there are docs to be added to: https://agones.dev/site/docs/installation/creating-cluster/eks/ - that is always appreciated!

markmandel avatar Jun 01 '21 20:06 markmandel

@markmandel Yes, the following annotations need to be added if we have to replace Classic Load Balancers with Network Load Balancers. NLB supports UDP s

I am currently testing the annotations on Agones Helm chart and the following works on EKS 1.19 and Agones 1.15. It creates two NLBs, one for HTTP and other one for UDP ping.

http:
  expose: true
  response: ok
  port: 80
  serviceType: LoadBalancer
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "false"
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-access-log-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name: ${s3_bucket}
udp:
  expose: true
  rateLimit: 20
  port: 50000
  serviceType: LoadBalancer
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-internal: "false"
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-access-log-enabled: "true"
    service.beta.kubernetes.io/aws-load-balancer-access-log-s3-bucket-name: ${s3_bucket}

vara-bonthu avatar Jun 20 '21 22:06 vara-bonthu

@markmandel Yes, the following annotations need to be added if we have to replace Classic Load Balancers with Network Load Balancers. NLB supports UDP s

Can someone who clearly knows EKS way better than I do file a PR for the docs? That would be aces.

Sounds like you've got it worked out @vara-bonthu ?

markmandel avatar Jun 21 '21 21:06 markmandel

Setting service.beta.kubernetes.io/aws-load-balancer-type does bring up a valid NLB with a target group, but the nodes in the target group fail health checks: NLBs do not support UDP health checks, and a TCP health check against the UDP traffic port fails. LoadBalancer services with mixed UDP and TCP ports are not currently supported, but I was able to get this working by doing some manual setup of the Agones ping service and setting the following annotations on the UDP ping service:

      "service.beta.kubernetes.io/aws-load-balancer-type"                 = "nlb"
      "service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol" = "HTTP"
      "service.beta.kubernetes.io/aws-load-balancer-healthcheck-path"     = "/"
      "service.beta.kubernetes.io/aws-load-balancer-healthcheck-port"     = {NodePortForHTTPPingService}
      "service.beta.kubernetes.io/aws-load-balancer-target-node-labels"   = "agones.dev/agones-system=true"

This will be cleaner in the future when MixedProtocolLBService is supported in EKS.
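For readers working with plain YAML rather than the map syntax above, the same annotations could be written as a Service metadata fragment, sketched below. The health check port value is a placeholder, not a real value: it stands for the node port assigned to agones-ping-http-service in your own cluster, which is only known after that service has been created.

```yaml
# Sketch: annotations for agones-ping-udp-service, transcribed from the comment above.
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    # Health-check the HTTP ping service instead of the UDP traffic port.
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol: "HTTP"
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-path: "/"
    # Placeholder: replace with the node port of agones-ping-http-service.
    service.beta.kubernetes.io/aws-load-balancer-healthcheck-port: "<http-ping-node-port>"
    service.beta.kubernetes.io/aws-load-balancer-target-node-labels: "agones.dev/agones-system=true"
```

Annotation values must be strings, so the port is quoted once the placeholder is filled in.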

ajjohnston avatar May 02 '22 14:05 ajjohnston

Hello, I installed Agones on EKS last week, and agones-ping-udp-service still says "pending". I followed the instructions on the Agones EKS site to the letter, using kubectl apply with the .yaml file since I don't like using helm. Can someone update the documentation or post the steps here, please?

agones-ping-udp-service            LoadBalancer   10.xx.xx.xx     <pending> 

ChrisFToptal avatar Aug 29 '22 15:08 ChrisFToptal

My understanding is that right now you need to use helm to set the additional annotations to configure the NLB to use UDP. You don't have to use helm to do the install; you can use it to render a customized yaml file that contains the additional annotations (this is described on the Install Agones using YAML page). Alternatively, you could edit your yaml file by hand to add the annotations shown above.
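As a rough sketch of the rendering approach described above, a small values file could carry the annotation. The file name values-eks.yaml is hypothetical. The agones.ping.udp.annotations key is the one named elsewhere in this thread. The helm command in the comment assumes the agones chart repository has already been added locally and borrows the release name and namespace that appear earlier in this thread.

```yaml
# values-eks.yaml (hypothetical file name)
# Render a customized install manifest with, for example:
#   helm template agones-manual --namespace agones-system agones/agones \
#     -f values-eks.yaml > install.yaml
agones:
  ping:
    udp:
      annotations:
        # Request a Network Load Balancer, which can carry UDP traffic.
        service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
```

Using a values file avoids having to escape the dots in the annotation key, which would be needed when passing it through --set on the command line.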

roberthbailey avatar Aug 30 '22 05:08 roberthbailey

@roberthbailey Thank you for the reply and the pointers. We never use helm; we would rather the right config went into the official documentation for the kustomize/YAML install. [update] From what I understand, I can create the .yaml/kustomize file from the helm output, so I downloaded the chart locally with:

helm pull --untar https://agones.dev/chart/stable/agones-1.25.0.tgz

So now I guess I can run the other part. What options should I add to the next helm command to make the NLB handle UDP?

ChrisFToptal avatar Aug 30 '22 08:08 ChrisFToptal

@markmandel I'm seeing this issue in OpenShift-on-AWS too (probably similar underlying parts as EKS) and can open a PR. However, I just found https://github.com/googleforgames/agones/commit/3c38876382d505244af68489211d8916dce07596 which might be better for me to use than just changing the Helm charts and YAML install files. What do you think?

I'm considering using pkg/cloudproduct since the NLB health checks must be aware of the Service nodePort, and this isn't available at install time using Helm/YAML.

bostrt avatar Nov 17 '22 19:11 bostrt

I'm considering using pkg/cloudproduct since the NLB health checks must be aware of the Service nodePort, and this isn't available at install time using Helm/YAML.

After typing this, I realize it might be overkill if MixedProtocolLBService is going to be supported in EKS (and other k8s integrations for AWS). Maybe just updating the YAML is best so the UDP ping service at least starts.

bostrt avatar Nov 18 '22 14:11 bostrt

Yeah, also that's a totally separate thing that mostly affects Pod creation for GameServers (although, very interesting! 😄 )

Question though: Is this a documentation issue?

If we added some kind of AWS installation docs that said what agones.ping.udp.annotations and agones.ping.http.annotations should be on installation, would that solve the issue?

markmandel avatar Nov 18 '22 20:11 markmandel

This issue is marked as Stale due to inactivity for more than 30 days. To avoid being marked as 'stale' please add 'awaiting-maintainer' label or add a comment. Thank you for your contributions

github-actions[bot] avatar May 15 '23 10:05 github-actions[bot]

This issue is marked as obsolete due to inactivity for last 60 days. To avoid issue getting closed in next 30 days, please add a comment or add 'awaiting-maintainer' label. Thank you for your contributions

github-actions[bot] avatar Jun 15 '23 02:06 github-actions[bot]

@author, We are closing this as there was no activity in this issue for last 90 days. Please reopen if you’d like to discuss anything further.

github-actions[bot] avatar Aug 01 '23 01:08 github-actions[bot]