ingress-nginx

(110: Operation timed out) while reading response header from upstream

Open Basitkhan23 opened this issue 2 years ago • 7 comments

Hi there! I am getting this error in the logs: [error] 117#117: *16706 upstream timed out (110: Operation timed out) while reading response header from upstream. I have tried every possible way to find where this exact 60s timeout is coming from.

So I am using these settings in my ConfigMap (for the nginx ingress controller). I have checked by removing and adding them one by one, but the timeout does not change.

client-header-timeout: "7200"
keep-alive: "300"
keep-alive-requests: "100000"
keepalive-timeout: "300"
proxy-connect-timeout: "7200"
proxy-read-timeout: "7200"
proxy-send-timeout: "7200"
upstream-keepalive-requests: "100000"
upstream-keepalive-timeout: "7200"
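For reference, here is roughly how these sit in the controller ConfigMap (a sketch; the name ingress-nginx-controller and the namespace ingress-nginx are the chart defaults and may differ in your setup):

apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # default name from the Helm chart; yours may differ
  namespace: ingress-nginx
data:
  # all values are strings; the remaining keys from the list above go in this same map
  proxy-connect-timeout: "7200"
  proxy-read-timeout: "7200"
  proxy-send-timeout: "7200"
  upstream-keepalive-timeout: "7200"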

And I have also tried adding these annotations to my Ingress resource/rule for that microservice.

nginx.ingress.kubernetes.io/client-body-timeout: "7200"
nginx.ingress.kubernetes.io/client-header-timeout: "7200"
nginx.ingress.kubernetes.io/client-max-body-size: 5000m
nginx.ingress.kubernetes.io/keep-alive: "300"
nginx.ingress.kubernetes.io/keepalive-timeout: "300"
nginx.ingress.kubernetes.io/large-client-header-buffers: 64 128k
nginx.ingress.kubernetes.io/proxy-body-size: 5000m
nginx.ingress.kubernetes.io/proxy-buffer-size: 16k
nginx.ingress.kubernetes.io/proxy-connect-timeout: "7200"
nginx.ingress.kubernetes.io/proxy-next-upstream-timeout: "7200"
nginx.ingress.kubernetes.io/proxy-read-timeout: "7200"
nginx.ingress.kubernetes.io/proxy-send-timeout: "7200"
nginx.ingress.kubernetes.io/rewrite-target: /
nginx.ingress.kubernetes.io/send_timeout: "7200"
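As a sketch, the metadata section of the Ingress looks roughly like this (the resource name is a placeholder; the full annotation list above goes in the same annotations map):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-microservice   # placeholder name
  annotations:
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "7200"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "7200"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "7200"
    nginx.ingress.kubernetes.io/rewrite-target: /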

Nginx ingress controller version:

Release:       v1.0.5
Build:         7ce96cbcf668f94a0d1ee0a674e96002948bff6f
Repository:    https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.9

We are using an Azure Kubernetes cluster, and CoreDNS is resolving all the APIs/URLs from an Active Directory deployed on an Azure Windows virtual machine.

The backend API is in .NET Core (SDK 6.0.400 and ASP.NET Core Runtime 6.0.8); all the keepalive and request-timeout settings defined in the code have already been tested.

Basitkhan23 avatar Sep 18 '22 22:09 Basitkhan23

@Basitkhan23: This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Sep 18 '22 22:09 k8s-ci-robot

Hi @Basitkhan23 Can you check nginx.conf inside the pod to see if these are actually getting set for your location?
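For example, something like this should show whether the directives made it into the rendered config (the pod name is a placeholder):

kubectl -n ingress-nginx exec <controller-pod> -- grep -E 'proxy_(read|send|connect)_timeout' /etc/nginx/nginx.conf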

bmv126 avatar Sep 18 '22 22:09 bmv126

Hi @Basitkhan23 Can you check nginx.conf inside the pod to see if these are actually getting set for your location?

Thank you for your time. Yes, I can see these settings are reflected inside the nginx.conf file.

(screenshot: http section)

(screenshot: upstream section)

(screenshot: server section)

Basitkhan23 avatar Sep 18 '22 23:09 Basitkhan23

The backend API runs over the gRPC protocol, and per the documentation, "proxy_read_timeout" is used for both gRPC and HTTP.

But I have also tried this (https://github.com/kubernetes/ingress-nginx/tree/main/docs/examples/grpc#notes-on-using-responserequest-streams), and that didn't solve the problem either.

Basitkhan23 avatar Sep 19 '22 02:09 Basitkhan23

/kind support

Hi @Basitkhan23,

I think you could use a sample hello-world gRPC app and set the backend protocol to GRPC, while making sure that no annotations or ConfigMap options are used beyond the defaults that ship out of the box. In this kind of environment, if you see timeouts with the hello-world app, then you can be sure there is a default config on the controller that needs to be discussed or fixed.

Otherwise, if there is no timeout with the hello-world app, then you can start introducing your annotations and ConfigMap options one by one, and find out which one causes the timeout.
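For example, a bare-bones Ingress for such a test could look roughly like this (host, service name, and port are placeholders; the only non-default annotation is the backend protocol, and the TLS section usually needed for gRPC is omitted here):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-grpc              # placeholder
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
spec:
  ingressClassName: nginx
  rules:
  - host: grpc.example.com      # placeholder
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: hello-grpc    # placeholder
            port:
              number: 50051     # placeholder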

There are no answers to the questions asked in the issue template, so everyone here has to base their comments on guesses. Since there is no bug-related data here, it may be better to take this discussion to the Kubernetes Slack, where there are more people with related info and skills.

longwuyuan avatar Sep 19 '22 03:09 longwuyuan

Hello again. As suggested, I have checked this timeout after excluding all the annotations and values I had set in the ConfigMap, and I am getting the same 60s timeout.

Then I added these three settings to the ConfigMap of the ingress controller, which set the corresponding directives in nginx.conf, but I got the same 60s timeout:

proxy-connect-timeout: "7200"
proxy-read-timeout: "7200"
proxy-send-timeout: "7200"

Then I moved on to adding these annotations, which set the directives in the specific service's location block, but again the same 60s timeout:

nginx.ingress.kubernetes.io/proxy-read-timeout: "7200"
nginx.ingress.kubernetes.io/proxy-send-timeout: "7200"

Then I moved on to reducing the time defined for these settings. I tried 30s, but again got the same 60s timeout. If I set this to 30s, shouldn't the request time out in 30s with a connection-timeout error? That is not happening in our case.

I am changing the values of these specific settings because in my case they do affect the application timeout. I have a local nginx running with a sample API behind it: if I set 30s, requests time out in 30s, and if I set 100s, they time out in 100s, and it works when I set these in the location/server block.
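For illustration, the local test is roughly this (plain nginx; the upstream address is a placeholder for the sample API):

server {
    listen 80;
    location / {
        proxy_read_timeout 30s;            # request times out after ~30s, as expected
        proxy_send_timeout 30s;
        proxy_pass http://127.0.0.1:8080;  # placeholder for the sample API
    }
}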

Basitkhan23 avatar Sep 20 '22 17:09 Basitkhan23

  • I am on controller v1.3.1 and I have this sample workload deployed using image nginx:alpine
% k get ns
NAME                 STATUS   AGE
default              Active   13d
enjoydevops          Active   8d
ingress-nginx        Active   12d
kube-node-lease      Active   13d
kube-public          Active   13d
kube-system          Active   13d
local-path-storage   Active   13d
metallb-system       Active   12d
[~] 
% create.www.dev.mydomain.com.sh 
namespace/mydomain created
deployment.apps/www created
service/www exposed
ingress.networking.k8s.io/www created
[~] 
% k get ns
NAME                 STATUS   AGE
default              Active   13d
enjoydevops          Active   8d
ingress-nginx        Active   12d
kube-node-lease      Active   13d
kube-public          Active   13d
kube-system          Active   13d
local-path-storage   Active   13d
metallb-system       Active   12d
mydomain             Active   7s
[~] 
% k -n mydomain get all,ing
NAME                       READY   STATUS    RESTARTS   AGE
pod/www-7b5cc4d966-6hxmh   1/1     Running   0          30s

NAME          TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
service/www   ClusterIP   10.96.142.92   <none>        80/TCP    30s

NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/www   1/1     1            1           31s

NAME                             DESIRED   CURRENT   READY   AGE
replicaset.apps/www-7b5cc4d966   1         1         1       31s

NAME                            CLASS   HOSTS                  ADDRESS   PORTS   AGE
ingress.networking.k8s.io/www   nginx   www.dev.mydomain.com             80      30s
% helm ls -A            
NAME            NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                   APP VERSION
ingress-nginx   ingress-nginx   1               2022-09-08 15:57:55.587871553 +0530 IST deployed        ingress-nginx-4.2.5     1.3.1      
[~] 
% k -n ingress-nginx logs ingress-nginx-controller-7bf78659d-tctjg 
-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.3.1
  Build:         92534fa2ae799b502882c8684db13a25cde68155
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.19.10

-------------------------------------------------------------------------------

W0919 02:47:17.638233      13 client_config.go:617] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
I0919 02:47:17.640284      13 main.go:209] "Creating API client" host="https://10.96.0.1:443"
W0919 02:47:48.810541      13 main.go:250] Initial connection to the Kubernetes API server was retried 1 times.
I0919 02:47:48.810571      13 main.go:253] "Running in Kubernetes cluster" major="1" minor="25" git="v1.25.0" state="clean" commit="a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2" platform="linux/amd64"
I0919 02:47:48.912851      13 main.go:104] "SSL fake certificate created" file="/etc/ingress-controller/ssl/default-fake-certificate.pem"
I0919 02:47:48.929059      13 ssl.go:533] "loading tls certificate" path="/usr/local/certificates/cert" key="/usr/local/certificates/key"
I0919 02:47:48.943759      13 nginx.go:260] "Starting NGINX Ingress controller"
I0919 02:47:48.948179      13 event.go:285] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"ingress-nginx", Name:"ingress-nginx-controller", UID:"9939f9e4-6bef-4fa0-97f7-0b00c916250f", APIVersion:"v1", ResourceVersion:"618274", FieldPath:""}): type: 'Normal' reason: 'CREATE' ConfigMap ingress-nginx/ingress-nginx-controller
I0919 02:47:50.051646      13 store.go:430] "Found valid IngressClass" ingress="enjoydevops/www" ingressclass="nginx"
I0919 02:47:50.051849      13 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"enjoydevops", Name:"www", UID:"3a63a651-5198-4d1f-ad31-76fc99071063", APIVersion:"networking.k8s.io/v1", ResourceVersion:"618485", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0919 02:47:50.052962      13 backend_ssl.go:67] "Adding secret to local store" name="enjoydevops/enjoydevops"
I0919 02:47:50.149035      13 nginx.go:303] "Starting NGINX process"
I0919 02:47:50.149099      13 leaderelection.go:248] attempting to acquire leader lease ingress-nginx/ingress-controller-leader...
I0919 02:47:50.149498      13 nginx.go:323] "Starting validation webhook" address=":8443" certPath="/usr/local/certificates/cert" keyPath="/usr/local/certificates/key"
I0919 02:47:50.150425      13 controller.go:168] "Configuration changes detected, backend reload required"
I0919 02:47:50.155489      13 leaderelection.go:258] successfully acquired lease ingress-nginx/ingress-controller-leader
I0919 02:47:50.155561      13 status.go:84] "New leader elected" identity="ingress-nginx-controller-7bf78659d-tctjg"
I0919 02:47:50.272907      13 controller.go:185] "Backend successfully reloaded"
I0919 02:47:50.273005      13 controller.go:196] "Initial sync, sleeping for 1 second"
I0919 02:47:50.273065      13 event.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-7bf78659d-tctjg", UID:"21eb9aa5-4640-4c1a-a820-5441d282a25a", APIVersion:"v1", ResourceVersion:"671148", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
172.18.0.1 - - [21/Sep/2022:01:08:21 +0000] "HEAD / HTTP/1.1" 308 0 "-" "curl/7.81.0" 88 0.000 [enjoydevops-www-80] [] - - - - 0486193e53521d600bc5018454d3435e
172.18.0.1 - - [21/Sep/2022:01:08:32 +0000] "GET / HTTP/1.1" 308 164 "-" "curl/7.81.0" 87 0.000 [enjoydevops-www-80] [] - - - - 4e29fc6306c77eed7af54ea7c42e007e
172.18.0.1 - - [21/Sep/2022:01:08:36 +0000] "GET / HTTP/1.1" 308 164 "-" "curl/7.81.0" 87 0.000 [enjoydevops-www-80] [] - - - - 246ed94cddcde4cd0f8ec0d26dca374d
W0921 09:53:58.297129      13 controller.go:1112] Service "mydomain/www" does not have any active Endpoint.
I0921 09:53:58.355004      13 admission.go:149] processed ingress via admission controller {testedIngressLength:2 testedIngressTime:0.058s renderingIngressLength:2 renderingIngressTime:0s admissionTime:22.2kBs testedConfigurationSize:0.058}
I0921 09:53:58.355425      13 main.go:100] "successfully validated configuration, accepting" ingress="mydomain/www"
I0921 09:53:58.365637      13 store.go:430] "Found valid IngressClass" ingress="mydomain/www" ingressclass="nginx"
I0921 09:53:58.366174      13 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"mydomain", Name:"www", UID:"8e808615-11b9-4093-aa79-02649ceb5c47", APIVersion:"networking.k8s.io/v1", ResourceVersion:"798568", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync
I0921 09:54:01.509223      13 controller.go:168] "Configuration changes detected, backend reload required"
I0921 09:54:01.622974      13 controller.go:185] "Backend successfully reloaded"
I0921 09:54:01.623626      13 event.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-7bf78659d-tctjg", UID:"21eb9aa5-4640-4c1a-a820-5441d282a25a", APIVersion:"v1", ResourceVersion:"671148", FieldPath:""}): type: 'Normal' reason: 'RELOAD' NGINX reload triggered due to a change in configuration
I0921 09:54:55.817232      13 status.go:299] "updating Ingress status" namespace="mydomain" ingress="www" currentValue=[] newValue=[{IP:172.18.0.2 Hostname: Ports:[]}]
I0921 09:54:55.827042      13 event.go:285] Event(v1.ObjectReference{Kind:"Ingress", Namespace:"mydomain", Name:"www", UID:"8e808615-11b9-4093-aa79-02649ceb5c47", APIVersion:"networking.k8s.io/v1", ResourceVersion:"798654", FieldPath:""}): type: 'Normal' reason: 'Sync' Scheduled for sync

So out of the box, I don't see any timeout message in logs for a simple app.

So the next step for me is to replicate your config, but the data above convinces me that, for a simple use case, there is no controller problem to be solved.

Hence the help needed from you is to describe the problem in terms of:

  • a generic application, not your custom app (troubleshooting your own app is out of scope)
  • the set of ConfigMap settings
  • the set of annotations
  • an exact, complete example curl request, including any custom headers

Someone can then copy/paste from your description, or better yet, you can write a step-by-step instruction document, reducing the time a developer spends reproducing the problem, in case there is a bug to be solved here.

There are several ConfigMap settings and several annotations, so it is hard for someone other than you to replicate your environment closely enough to be useful and valid.

Also, there are very few resources for tracking support issues on GitHub. You would have more people with the knowledge and skills to look at this problem in the Kubernetes Slack.

longwuyuan avatar Sep 21 '22 10:09 longwuyuan

Hello.

Thanks a lot for helping.

Found the problem. Maybe I missed something, but it seems these settings

proxy-read-timeout: "7200"
proxy-send-timeout: "7200"

do not affect the timeouts for backend gRPC communication. I had to add a "server-snippet" with:

grpc_read_timeout 120s;
grpc_send_timeout 120s;
client_body_timeout 120s;
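Expressed as the annotation (a sketch with the values from my case):

nginx.ingress.kubernetes.io/server-snippet: |
  grpc_read_timeout 120s;
  grpc_send_timeout 120s;
  client_body_timeout 120s;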

I am closing this issue. Thanks once again.

Basitkhan23 avatar Sep 26 '22 12:09 Basitkhan23