
Config keep-alive does not reflect the real config of nginx: keepalive_timeout

Open awx-fuyuanchu opened this issue 2 years ago • 4 comments

What happened:

Requests to ingress encountered connection refused errors occasionally.

Per the current settings of the nginx ingress, keepalive_timeout is 75s, so nginx closes a connection once it has been idle for 75 seconds. Occasionally, however, the client sends a request at the same moment the server is closing the connection, which produces the connection refused error.

The keepalive_timeout directive accepts an optional second parameter that sets the value advertised in the Keep-Alive response header, so the client can close the connection before the server does.
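A minimal nginx sketch of the two-parameter form (the values below are illustrative, not the controller's defaults):

```nginx
# keepalive_timeout <server-side idle timeout> [<Keep-Alive header value>];
# nginx keeps idle connections open for 75s but advertises 60s via a
# "Keep-Alive: timeout=60" response header, so well-behaved clients close
# the connection before the server does.
keepalive_timeout 75s 60;
```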

What you expected to happen:

We should be able to set the server-side and the client-side keepalive timeouts independently.
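For illustration only, the feature could take the shape of a second ConfigMap key; the keep-alive-header name below is invented for this sketch, not an existing option:

```yaml
# Hypothetical sketch of the requested feature. "keep-alive" exists today
# and sets the server-side idle timeout; "keep-alive-header" is an invented
# name for the value nginx would advertise in the Keep-Alive response header.
keep-alive: "300"
keep-alive-header: "180"
```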

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

-------------------------------------------------------------------------------
NGINX Ingress controller
  Release:       v1.2.1
  Build:         08848d69e0c83992c89da18e70ea708752f21d7a
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.19.10

-------------------------------------------------------------------------------

Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:59:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.12-gke.2300", GitCommit:"e55564cf3a1384026a54920174977659c8c56a50", GitTreeState:"clean", BuildDate:"2022-08-16T09:24:51Z", GoVersion:"go1.16.15b7", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: GCP
  • OS (e.g. from /etc/os-release):
NAME="Container-Optimized OS"
ID=cos
PRETTY_NAME="Container-Optimized OS from Google"
HOME_URL="https://cloud.google.com/container-optimized-os/docs"
BUG_REPORT_URL="https://cloud.google.com/container-optimized-os/docs/resources/support-policy#contact_us"
GOOGLE_METRICS_PRODUCT_ID=26
KERNEL_COMMIT_ID=8339df9b3f28bad265f645cad9d96530c90d8675
GOOGLE_CRASH_ID=Lakitu
VERSION=93
VERSION_ID=93
BUILD_ID=16623.227.10
  • Kernel (e.g. uname -a): Linux n2-node-pool-db1b7687-992t 5.10.133+ #1 SMP Fri Jul 29 08:49:27 UTC 2022 x86_64 Intel(R) Xeon(R) CPU @ 2.80GHz GenuineIntel GNU/Linux

  • Install tools: terraform

  • Basic cluster related info:

    • kubectl version
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:59:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.12-gke.2300", GitCommit:"e55564cf3a1384026a54920174977659c8c56a50", GitTreeState:"clean", BuildDate:"2022-08-16T09:24:51Z", GoVersion:"go1.16.15b7", Compiler:"gc", Platform:"linux/amd64"}
  • kubectl get nodes -o wide

  • How was the ingress-nginx-controller installed:

    • If helm was used then please show output of helm ls -A | grep -i ingress
> helm ls -A | grep -i ingress
W1107 18:41:55.376720   14364 gcp.go:119] WARNING: the gcp auth plugin is deprecated in v1.22+, unavailable in v1.26+; use gcloud instead.
To learn more, consult https://cloud.google.com/blog/products/containers-kubernetes/kubectl-auth-changes-in-gke
ingress-nginx                  	infra    	18      	2022-09-14 02:12:09.778668527 +0000 UTC	deployed	ingress-nginx-4.1.4            	1.2.1
  • If helm was used then please show output of helm -n <ingresscontrollernamepspace> get values <helmreleasename>
> helm -n infra get values ingress-nginx
USER-SUPPLIED VALUES:
controller:
  autoscaling:
    enabled: true
    minReplicas: 3
    targetCPUUtilizationPercentage: 75
    targetMemoryUtilizationPercentage: 75
  config:
    enable-underscores-in-headers: "true"
    log-format-escape-json: "true"
    log-format-upstream: '{"method": "$request_method", "x_forward_for": "$http_x_forwarded_for",
      "user_agent": "$http_user_agent", "server_port": "$server_port", "ingress_host":
      "$host", "uri": "$uri", "scheme": "$scheme", "response_status": "$status", "upstream_status":
      "$upstream_status", "remote_addr": "$remote_addr", "upstream_addr": "$upstream_addr",
      "server_addr": "$server_addr", "request_length": "$request_length", "response_length":
      "$upstream_response_length", "request_time": "$request_time", "upstream_response_time":
      "$upstream_response_time", "upstream_connect_time": "$upstream_connect_time",
      "upstream_header_time": "$upstream_header_time", "request_id": "$request_id",
      "protocol": "$server_protocol", "time": "$time_iso8601", "http_referrer": "$http_referer",
      "remote_user": "$remote_user", "request_query": "$args", "body_bytes_sent":
      "$body_bytes_sent", "upstream_bytes_sent": "$upstream_bytes_sent", "service_name":
      "$service_name", "ingress_name": "$ingress_name", "ingress_namespace": "$namespace",
      "traceId": "$http_x_b3_traceid"}'
    max-worker-connections: "65536"
    proxy-body-size: 20m
    ssl-redirect: "false"
    worker-cpu-affinity: auto
  electionID: ingress-controller-leader-nginx
  image:
    pullPolicy: Always
  ingressClass: nginx
  ingressClassResource:
    controllerValue: k8s.io/nginx
    name: nginx
  livenessProbe:
    failureThreshold: 2
  metrics:
    enabled: true
  podAnnotations:
    prometheus.io/port: "10254"
    prometheus.io/scrape: "true"
  publishService:
    enabled: true
  readinessProbe:
    failureThreshold: 2
    successThreshold: 2
  replicaCount: 3
  resources:
    limits:
      cpu: 1
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 250Mi
  service:
    annotations:
      cloud.google.com/load-balancer-type: internal
      networking.gke.io/internal-load-balancer-allow-global-access: "true"
    nodePorts:
      http: 32080
      https: 32443
podSecurityPolicy:
  enabled: true
rbac:
  create: true

How to reproduce this issue: Hard to reproduce

Anything else we need to know:

awx-fuyuanchu avatar Nov 07 '22 10:11 awx-fuyuanchu

@FuyuanChu: This issue is currently awaiting triage.

If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot avatar Nov 07 '22 10:11 k8s-ci-robot

/remove-kind bug

This looks like a feature request.

longwuyuan avatar Nov 14 '22 13:11 longwuyuan

We could create a custom nginx.conf template as a workaround for this scenario, but it would be better to have this feature implemented natively.
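A sketch of that custom-template workaround using the Helm chart's extraVolumes/extraVolumeMounts values, assuming a ConfigMap named nginx-template that holds a modified nginx.tmpl (the mount path follows the controller's custom-template documentation):

```yaml
controller:
  extraVolumeMounts:
    # Overlay the controller's template directory with the modified nginx.tmpl
    - name: nginx-template
      mountPath: /etc/nginx/template
  extraVolumes:
    - name: nginx-template
      configMap:
        name: nginx-template  # assumed ConfigMap containing nginx.tmpl
```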

awx-fuyuanchu avatar Nov 28 '22 01:11 awx-fuyuanchu

Hi @FuyuanChu,

We faced the same issue recently: even an explicitly specified keep-alive ConfigMap option doesn't make Ingress NGINX Controller respond with a Keep-Alive header (though the response still contains a Connection: keep-alive header). To work around it we used the server-snippet ConfigMap option, since per https://nginx.org/en/docs/http/ngx_http_core_module.html#keepalive_timeout the directive's context is http, server, and location. The http-snippet ConfigMap option doesn't work here because Ingress NGINX Controller inserts its own keepalive_timeout directive into the http section, so adding another one via http-snippet leads to a conflict. Our configuration:

keep-alive: "300"
server-snippet: "keepalive_timeout 300s 180;"

Note that we intentionally set the value sent in the Keep-Alive HTTP response header (180) lower than the keep-alive timeout used on the NGINX side (300s), to ensure the client closes the connection before NGINX does. Maybe this is the reason Ingress NGINX Controller doesn't generate the full keepalive_timeout directive itself: it cannot deduce the value to use in the Keep-Alive response header (i.e. it would need a new ConfigMap option, IMHO).
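Putting the workaround together, a sketch of the controller ConfigMap (the metadata name is an assumption; match it to your release, in the namespace from the report above):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller   # assumed release/ConfigMap name
  namespace: infra
data:
  keep-alive: "300"
  server-snippet: |
    # 300s server-side idle timeout; 180 advertised to clients via the
    # Keep-Alive response header so they close the connection first.
    keepalive_timeout 300s 180;
```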

Thank you.

mabrarov avatar Feb 21 '24 08:02 mabrarov