GRPC GOAWAY

Open rcng6514 opened this issue 5 months ago • 14 comments

What happened:

We have an application that uses gRPC streams, running on GKE behind ingress-nginx. Our use case needs a long-lived gRPC stream between the gRPC server (on GKE) and a client, with the client sending data every second for an indefinite period. To achieve this, the Java gRPC server implementation never calls onCompleted, so the stream stays open indefinitely.

When the client calls the gRPC method, data transfer starts successfully and the stream runs fine for a while. After a few minutes, at irregular intervals, the connection is terminated automatically with the following error:

UNAVAILABLE: Connection closed after GOAWAY. HTTP/2 error code: NO_ERROR

The timing is not fixed, but the error typically occurs after around 5 minutes of successful data transfer between the client and the server (GKE). We have tried various properties and timeouts to keep the streams open longer (the annotations we attempted are listed below), but we have not found anything that reliably fixes it.

Below is the Ingress configuration, including the annotations we are using:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/client-body-timeout: "true"
    nginx.ingress.kubernetes.io/grpc-backend: "true"
    nginx.ingress.kubernetes.io/limit-connections: "1000"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.ingress.kubernetes.io/server-snippet: keepalive_requests 10000; http2_max_requests
      500000; keepalive_timeout 3600s; grpc_read_timeout 3600s; grpc_send_timeout
      3600s; client_body_timeout 3600s;
    nginx.ingress.kubernetes.io/service-upstream: "true"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/upstream-vhost: example.example-app.svc.cluster.local
    example.io/contact: [email protected]
  creationTimestamp: "2024-02-01T18:13:19Z"
  generation: 3
  labels:
    application: example
  name: example
  namespace: example-app
  resourceVersion: "862855064"
  uid: fcef4054-e9db-4276-9ff0-e6e8b2d84301
spec:
  rules:
  - host: example.com
    http:
      paths:
      - backend:
          service:
            name: example-grpc
            port:
              number: 8980
        path: /
        pathType: Prefix
  tls:
  - hosts:
    - example.com
status:
  loadBalancer:
    ingress:
    - ip: 1.2.3.4

What you expected to happen:

Either the annotations should be respected, or we are misunderstanding how to meet the requirement described above.

Not sure, we've exhausted all avenues

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

ingress-nginx-controller-7f9bf47c9f-kl4sp:/etc/nginx$ nginx -version
nginx version: nginx/1.21.6

Kubernetes version (use kubectl version): v1.27.9-gke.1092000

Environment:

  • Cloud provider or hardware configuration: GKE

  • OS (e.g. from /etc/os-release): cos_containerd

  • Kernel (e.g. uname -a):

  • How was the ingress-nginx-controller installed: $ helm template --values values.yaml --namespace example --version 4.9.0 ingress-nginx ingress-nginx/ingress-nginx > manifests.yaml

We then reference manifests.yaml as a resource in a Kustomization, with environment-specific patches for naming, annotations, and labels.

  • Current State of the controller:
    • kubectl describe ingressclasses
Name:         nginx
Labels:       app.kubernetes.io/component=controller
              app.kubernetes.io/instance=ingress-nginx
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=ingress-nginx
              app.kubernetes.io/part-of=ingress-nginx
              app.kubernetes.io/version=1.9.5
              helm.sh/chart=ingress-nginx-4.9.0
              kustomize.toolkit.fluxcd.io/name=gke-cluster-services
              kustomize.toolkit.fluxcd.io/namespace=example
Annotations:  ingressclass.kubernetes.io/is-default-class: true
              example.io/contact: [email protected]
Controller:   k8s.io/ingress-nginx
Events:       <none>
  • Current state of ingress object, if applicable:
  • Others:
    • Any other related information like ;
      • copy/paste of the snippet (if applicable)
      • kubectl describe ... of any custom configmap(s) created and in use
apiVersion: v1
data:
  allow-snippet-annotations: "true"
  annotation-value-word-blocklist: load_module,lua_package,_by_lua,location,root,proxy_pass,serviceaccount,{,},',"
  keep-alive-requests: "1000"
  strict-validate-path-type: "true"
kind: ConfigMap
metadata:
  annotations:
    example.io/contact: '[email protected]'
  creationTimestamp: "2024-02-29T16:01:58Z"
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.9.5
    helm.sh/chart: ingress-nginx-4.9.0
    kustomize.toolkit.fluxcd.io/name: gke-cluster-services
    kustomize.toolkit.fluxcd.io/namespace: example
  name: ingress-nginx-controller
  namespace: example
  resourceVersion: "778846202"
  uid: 622f5cf8-dd98-4d0d-bd9c-79fc9bbd0e4c
- Any other related information that may help

How to reproduce this issue:

Vanilla ingress-nginx install in k8s, plus a simple gRPC app with a long-lived connection.

This happens across multiple applications on the cluster and across multiple environments, so it is not specific to one instance. We have read and tried the following: https://kubernetes.github.io/ingress-nginx/examples/grpc/#notes-on-using-responserequest-streams However, GOAWAYs continue to occur. After removing ingress-nginx and routing via NodePort, the application held the connection open for 24+ hrs.

Anything else we need to know:

rcng6514 avatar Mar 13 '24 21:03 rcng6514

This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Mar 13 '24 21:03 k8s-ci-robot

/remove-kind bug

  • Where did you get annotations like client-body-timeout & grpc-backend? This controller does NOT have anything like that: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/
  • Remove all the annotations except backend-protocol and the snippet
  • Look at the template of a new bug report and edit your issue to provide all that information
  • Write real, practical, step-by-step instructions, including an example app image URL, that readers can copy/paste from to reproduce on a minikube or kind cluster
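
As a rough sketch of the second bullet, the annotation block from the original report would shrink to something like the following. The directives and values are copied from the issue as posted, only laid out one per line. Dropping kubernetes.io/ingress.class assumes the nginx IngressClass is the cluster default, as the describe output in the report indicates.

```yaml
metadata:
  annotations:
    # Kept: tells the controller to proxy to the backend over gRPC.
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    # Kept: the same directives as the server-snippet in the report.
    nginx.ingress.kubernetes.io/server-snippet: |
      keepalive_requests 10000;
      http2_max_requests 500000;
      keepalive_timeout 3600s;
      grpc_read_timeout 3600s;
      grpc_send_timeout 3600s;
      client_body_timeout 3600s;
    # Removed: every other nginx.ingress.kubernetes.io annotation from the
    # report (client-body-timeout, grpc-backend, limit-connections,
    # proxy-*-timeout, service-upstream, ssl-redirect, upstream-vhost).
```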

/triage needs-information /kind support

longwuyuan avatar Mar 14 '24 03:03 longwuyuan

What type of GCP load balancer are you using? Can you try port forwarding to the ingress service and see if the issue persists?

strongjz avatar Mar 14 '24 15:03 strongjz

What type of GCP load balancer are you using? Can you try port forwarding to the ingress service and see if the issue persists?

Client -> TCP L4 -> NGINX Ingress Controller -> App

With:

Client -> TCP L4 -> App

Connection holds open for 24 hrs

rcng6514 avatar Mar 14 '24 15:03 rcng6514

@rcng6514 the comment from @strongjz is seeking info on

client --> [port-forward-to-svc-created-by-controller] --> app

longwuyuan avatar Mar 14 '24 16:03 longwuyuan

Morning, tested with TCP L4 LB removed and port forwarded to ingress-controller k8s svc. We still observe the same behaviour and connections are issued a GOAWAY after 5-10 minutes.

  • Write real practical step by step instruction, including an example app image url, that readers can copy/paste from and reproduce on a minikube or kind cluster

@longwuyuan this isn't straightforward to achieve, as the app contains IP (intellectual property) that we'd need to strip from the image, which will take considerable time. We'll start on this, but we were hoping to at least begin the conversation in the meantime.

rcng6514 avatar Mar 15 '24 10:03 rcng6514

@rcng6514, thanks. Would you know if there is a chart on artifacthub.io or an image on hub.docker.com that can be used to reproduce this?

longwuyuan avatar Mar 15 '24 11:03 longwuyuan

This is stale, but we won't close it automatically; just bear in mind the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or request to prioritize this, please reach #ingress-nginx-dev on Kubernetes Slack.

github-actions[bot] avatar Apr 15 '24 02:04 github-actions[bot]

  • Install ingress-nginx v1.9.5

kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.9.5/deploy/static/provider/cloud/deploy.yaml

  • Apply below manifest:
---
apiVersion: v1
kind: Namespace
metadata:
  name: grpc
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grpc
  namespace: grpc
  labels:
    app: grpc
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grpc
  template:
    metadata:
      labels:
        app: grpc
    spec:
      containers:
      - name: grpc
        image: docker.io/rcng1514/server
        ports:
        - containerPort: 8443
---
apiVersion: v1
kind: Service
metadata:
  name: grpc
  namespace: grpc
spec:
  selector:
    app: grpc
  ports:
    - protocol: TCP
      port: 8443
      targetPort: 8443
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grpc
  namespace: grpc
  annotations:
    # use the shared ingress-nginx
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
spec:
  ingressClassName: nginx
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grpc
            port:
              number: 8443
  tls:
  - hosts:
    - example.com
    secretName: grpc
---
apiVersion: v1
kind: Secret
metadata:
  name: grpc
  namespace: grpc
type: kubernetes.io/tls
data:
  tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURzVENDQXBtZ0F3SUJBZ0lVZFg2RlVLem5vUTZXT0dndmNCOW9jVm1xSHVvd0RRWUpLb1pJaHZjTkFRRUwKQlFBd2ZERUxNQWtHQTFVRUJoTUNWVk14RVRBUEJnTlZCQWdNQ0U1bGR5QlpiM0pyTVJFd0R3WURWUVFIREFoTwpaWGNnV1c5eWF6RVlNQllHQTFVRUNnd1BSWGhoYlhCc1pTQkRiMjF3WVc1NU1Rc3dDUVlEVlFRTERBSkpWREVnCk1CNEdDU3FHU0liM0RRRUpBUllSWVdSdGFXNUFaWGhoYlhCc1pTNWpiMjB3SGhjTk1qUXdOREF6TVRjeE5UVXkKV2hjTk1qVXdOREF6TVRjeE5UVXlXakI4TVFzd0NRWURWUVFHRXdKVlV6RVJNQThHQTFVRUNBd0lUbVYzSUZsdgpjbXN4RVRBUEJnTlZCQWNNQ0U1bGR5QlpiM0pyTVJnd0ZnWURWUVFLREE5RmVHRnRjR3hsSUVOdmJYQmhibmt4CkN6QUpCZ05WQkFzTUFrbFVNU0F3SGdZSktvWklodmNOQVFrQkZoRmhaRzFwYmtCbGVHRnRjR3hsTG1OdmJUQ0MKQVNJd0RRWUpLb1pJaHZjTkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFMbWhELzFqYzlTTG1oVjhtdjE2RDN6aApaTmxYdzlZdndIOWJIaitpV3Y5NDFsbTZqK0NVM3dNT3FpSloyZjJ6ZGtobk5uK3RmUTJhSXFNRlgrdm9zdkhZClNRV2lPMWRTc2EzQTJSZGJQd0V5QzV3bHh1ZUVtZE5vWWZtdHlzSkZSWk9nSkpYU01nelhOMGV3R2FJc1FEazgKZ29vZzV2cFN1OFdJbVNUMDJsUlZRV3FtZklzMSs1WFRhVjB0TXlmWFRFZVoxQXJ0cFZIdk5iekhLS3R4ZFZNQQp5dGx6K3U5Y1JZVzNGeTVoQS91VjFUTXdERzYrWWFVR0FJRzJidTk4TVg3Sk9iVWFtRUJDZnZTM2VWOWQzcXlpCnVualRlcnVlTi9KRmxhc2dpeVc2WmZyOWtobXBsUFl1d0Q5NkUyT2NHK2lzL0FUU1FEV2xJL2ZNYnFsWWJlY0MKQXdFQUFhTXJNQ2t3SndZRFZSMFJCQ0F3SG9JTFpYaGhiWEJzWlM1amIyMkNEM2QzZHk1bGVHRnRjR3hsTG1OdgpiVEFOQmdrcWhraUc5dzBCQVFzRkFBT0NBUUVBaVAwMTI0RFhvWnZnNmxsYW9GSEErWHdwZW1lbGFhaHpyTlVTCk1EbU10MWJ1U2ZKNkNmNkVTTlV1K1pISEI0OFFWd0pKWGxLaTJqVS9acHVvWDlqK0h6TmhnWHhNbEFJc2gyeUMKaTJubUFDOHcrU0hWOTgrRFJESW9YVHNDamRxSWRnSCtWelhzZjFWSkRmeUlhc1JsZGtGNmJDVUdsM0RjUGpkKwpId0VIN0NCZUo5d2lkQmxPRUdveWFDQW12WTJtd1huK216TnRSUXpCYTlWSEo2S1dvWmhCYjN3SXFnSFRTZ21FCnVVRHM2Sm4rcmVlU2FGajVYQTVuMVBKUjgvMXFKMEordk5rZ3IwZ3ZKa1gxT295OVY4YzJDNStLTkU3T3Q2NGQKekJJTTZrY29oMGZXQmIzdWZ1cUwrMU1qNU5HWXVPRFdxdVhSek15TkQwaE9HQWpwRVE9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
  tls.key: LS0tLS1CRUdJTiBQUklWQVRFIEtFWS0tLS0tCk1JSUV2UUlCQURBTkJna3Foa2lHOXcwQkFRRUZBQVNDQktjd2dnU2pBZ0VBQW9JQkFRQzVvUS85WTNQVWk1b1YKZkpyOWVnOTg0V1RaVjhQV0w4Qi9XeDQvb2xyL2VOWlp1by9nbE44RERxb2lXZG45czNaSVp6Wi9yWDBObWlLagpCVi9yNkxMeDJFa0ZvanRYVXJHdHdOa1hXejhCTWd1Y0pjYm5oSm5UYUdINXJjckNSVVdUb0NTVjBqSU0xemRICnNCbWlMRUE1UElLS0lPYjZVcnZGaUprazlOcFVWVUZxcG55TE5mdVYwMmxkTFRNbjEweEhtZFFLN2FWUjd6VzgKeHlpcmNYVlRBTXJaYy9ydlhFV0Z0eGN1WVFQN2xkVXpNQXh1dm1HbEJnQ0J0bTd2ZkRGK3lUbTFHcGhBUW43MAp0M2xmWGQ2c29ycDQwM3E3bmpmeVJaV3JJSXNsdW1YNi9aSVpxWlQyTHNBL2VoTmpuQnZvclB3RTBrQTFwU1AzCnpHNnBXRzNuQWdNQkFBRUNnZ0VBTWkweU9Fa1F2MHc1QzBQU1ZXQVFIYTZEWnlpTkhERnVORDY2RDNOZ2E1d0wKUE5mc0drWERmbjBSU2hYRmtnbFhtTHlsZzUrdXBPV2NKVHJIc2VvRnJNL005VVBrREhlaTVaZXlWdGpvVC9kcQpJZndvSnQ2MkFlbytTWkpMczNXc0YvcDZ5VEMzTExka0R2R3dEQ0V2L3dpM05JVXVTazNneWNWaHVCYWppWlhICnplSHZGM0dVRFlFcGNuMzVXcG9FV3hyUkVUSjFXUVN4NFVveVlZeUptSHBDUlNYSklna05jTHU1Y1dmKzY4c0YKME94K05JajJqQ3N3SjNScS9PaGlEMXRMcTdRT1pweDAxM1NLSUIrT0YrNjZTL3F4eDIzeTh2Vm5nRkZQWEVMNwo3YkJzcXA1VXZwVy9XK2RPWVhrNWp6QXl1Ty9uMGZNU3dqNi9CeC9KMlFLQmdRRHNIQU5NTmhkZyt5N1kwa285CkdmSW5MeWdXVFNKNjUzQVNGU3pNTm10eVYwQlh5RGNaK3pnQXpPOWw2eUpnNmJRQ1dQRWc4amZ4R2dFSnBxUncKS3JVTTdhTFREUUFWcHBmRDRaQklmY1hzdmJwc3EwUmptMW9mN0hVKzF1MzJHY3J1YVhMYWxpekhuMUg0UzYyZAowUXZjQVIyYUtEQW9DaWR4SUZuNnhQMTdGUUtCZ1FESlJHR0p3N1FmSG1zdUpvU2dobGJOaVBGdGtSNHQwYzV5CnNBYmRQNVd5TjJTc3RPVVpYNEtaRDZTUE50TXNwWTdaK2tkOHlvZUZzb3Y1d0VqRnpFbDkyV1puZXZvWVVWZHgKWStvVlpuWC9GMUNxZTAzR2NiT1QvQ2ZLU0QzNWFrdXcxN20wMnFDVDNtZnVTOFJWYkJKV1d2K1loelE5dnFJSQpYMlVqclJ5VUN3S0JnQUx3bGxuc2tuM3lvckt3YTV3M0pueTJhWmxkZklCclFVbjRXWVp4WndVVmNRZW14b2pjClIrWTZwd0J0M1ErMzJUWHVSWkpUY2I3ZXhBU0t2cUZtNXJveWUwU0ZkT3JRR0RPb0sxTzd2U3NsY1p6SXhTRTQKWGZibnlzM3RmeWtCU1RXT3VvOWVMMUNNKzBoTUtPMCtIUmV3Szk0dmdlbjl0bUFDTnh5WU4wL0JBb0dCQUxKRApESmoyYTB5OHBuV2p6QWhacy93cmRKcDAwK1FGVmZNaWtaSFl4WCtwckZPRGpQN2lKMHZtSFB4enRLcHdvSXZVCkx3a0tZT283NzlwdlFvVmVvU0VFTXIwb29PWjA5UndMUU1OZmt0Y3pFVkZPRU43WXloTWlYU08reEpWcVhrdnQKWmlBWEcrNmNLRFZaaWpXV21NOC9uZTY4b2JxbVkrRkNqT
lFDZWJOdEFvR0FWdFM1SkY3VkhvSHB5RWJjSWY1UgpiR25Ud0RxTnFrMjJwRjR3YkdrYlVXQXFVTWNjc011WFcxZ3pKOFUvSm1lbFJSKzBJb0x5TWdaOXBBaFdLd3hjCmQySXdJSXhXTDI4RlNNREJwZ0VWQmNyZk1vMnBrZHAwZEpzaTBEbm11Q25ocE9LVktycVptcE1IWjRmVVRjRHgKUHpEajB0K0hvRnI4VVR0VEVyZEpSTmc9Ci0tLS0tRU5EIFBSSVZBVEUgS0VZLS0tLS0K
  • On the client, override /etc/hosts so that the hostname example.com resolves to the ingress IP
  • Run the client image from the host with a connection to the Ingress NGINX LB: docker run --network=host -e ENDPOINT=example.com --rm -it rcng1514/client:java

I'm observing GOAWAY after ~5-10 minutes, or sooner depending on NGINX load

rcng6514 avatar Apr 16 '24 13:04 rcng6514

...
Hello Client recieved at time 2024-04-05 08:10:04+00:00
Hello Client recieved at time 2024-04-05 08:10:09+00:00
Exception in thread "main" io.grpc.StatusRuntimeException: UNAVAILABLE: Connection closed after GOAWAY. HTTP/2 error code: NO_ERROR
        at io.grpc.Status.asRuntimeException(Status.java:535)
        at io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:648)
        at io.grpc.examples.helloworld.TestClient.main(TestClient.java:21)

rcng6514 avatar Apr 16 '24 13:04 rcng6514

@rcng6514 thanks

  • None of those timeouts are going to work, because it currently seems they apply to HTTP only and not to gRPC
  • Try setting the grpc_*_timeout directives using configuration-snippet: https://nginx.org/en/docs/http/ngx_http_grpc_module.html
  • Even without timeouts, I think your client should send a keepalive
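
A minimal sketch of the second bullet, assuming snippet annotations stay enabled (allow-snippet-annotations is "true" in the ConfigMap posted above). grpc_read_timeout and grpc_send_timeout are directives from the linked ngx_http_grpc_module page; the 3600s values are carried over from the earlier server-snippet and are illustrative only. Whether this stops the GOAWAY was not confirmed in this thread.

```yaml
metadata:
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
    # grpc_read_timeout: timeout between two successive reads from the backend.
    # grpc_send_timeout: timeout between two successive writes to the backend.
    nginx.ingress.kubernetes.io/configuration-snippet: |
      grpc_read_timeout 3600s;
      grpc_send_timeout 3600s;
```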

longwuyuan avatar Apr 16 '24 18:04 longwuyuan

This is stale, but we won't close it automatically; just bear in mind the maintainers may be busy with other tasks and will reach your issue ASAP. If you have any question or request to prioritize this, please reach #ingress-nginx-dev on Kubernetes Slack.

github-actions[bot] avatar May 17 '24 01:05 github-actions[bot]