ingress-nginx
Config keep-alive does not reflect the real config of nginx: keepalive_timeout
What happened:
Requests to the ingress occasionally fail with connection refused errors.
Per the current settings of the NGINX ingress, keepalive_timeout is 75s, so NGINX closes a connection once it has been idle for 75 seconds. Occasionally, though, the client sends a request at the same moment the connection is being closed. The keepalive_timeout directive supports setting a different value for the Keep-Alive response header, so the client could close the connection before the server does.
What you expected to happen:
We should be able to set the server-side and the client-side keepalive timeouts independently.
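To illustrate the request, here is a minimal sketch of what such a pair of options could look like. keep-alive is the existing ConfigMap key (it maps to the first argument of nginx's keepalive_timeout directive, whose syntax is keepalive_timeout timeout [header_timeout];); the keep-alive-header key below is hypothetical and only shows the shape of the feature being asked for:

# ConfigMap data sketch. keep-alive exists today; keep-alive-header is a
# hypothetical new option proposed by this issue, not a real key.
# Together they would render to: keepalive_timeout 75s 60;
# i.e. nginx keeps idle connections open for 75s but advertises
# "Keep-Alive: timeout=60", so a compliant client closes the connection
# roughly 15 seconds before the server would.
keep-alive: "75"
keep-alive-header: "60"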
NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: v1.2.1
Build: 08848d69e0c83992c89da18e70ea708752f21d7a
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.10
-------------------------------------------------------------------------------
Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:59:11Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.12-gke.2300", GitCommit:"e55564cf3a1384026a54920174977659c8c56a50", GitTreeState:"clean", BuildDate:"2022-08-16T09:24:51Z", GoVersion:"go1.16.15b7", Compiler:"gc", Platform:"linux/amd64"}
Environment:
- Cloud provider or hardware configuration: GCP
- OS (e.g. from /etc/os-release):
NAME="Container-Optimized OS"
ID=cos
PRETTY_NAME="Container-Optimized OS from Google"
HOME_URL="https://cloud.google.com/container-optimized-os/docs"
BUG_REPORT_URL="https://cloud.google.com/container-optimized-os/docs/resources/support-policy#contact_us"
GOOGLE_METRICS_PRODUCT_ID=26
KERNEL_COMMIT_ID=8339df9b3f28bad265f645cad9d96530c90d8675
GOOGLE_CRASH_ID=Lakitu
VERSION=93
VERSION_ID=93
BUILD_ID=16623.227.10
- Kernel (e.g. uname -a): Linux n2-node-pool-db1b7687-992t 5.10.133+ #1 SMP Fri Jul 29 08:49:27 UTC 2022 x86_64 Intel(R) Xeon(R) CPU @ 2.80GHz GenuineIntel GNU/Linux
- Install tools: terraform
- Basic cluster related info:
  - kubectl version: same output as shown above
How was the ingress-nginx-controller installed:
- If helm was used then please show output of helm ls -A | grep -i ingress:
> helm ls -A | grep -i ingress
ingress-nginx infra 18 2022-09-14 02:12:09.778668527 +0000 UTC deployed ingress-nginx-4.1.4 1.2.1
- If helm was used then please show output of helm -n <ingresscontrollernamespace> get values <helmreleasename>:
> helm -n infra get values ingress-nginx
USER-SUPPLIED VALUES:
controller:
  autoscaling:
    enabled: true
    minReplicas: 3
    targetCPUUtilizationPercentage: 75
    targetMemoryUtilizationPercentage: 75
  config:
    enable-underscores-in-headers: "true"
    log-format-escape-json: "true"
    log-format-upstream: '{"method": "$request_method", "x_forward_for": "$http_x_forwarded_for",
      "user_agent": "$http_user_agent", "server_port": "$server_port", "ingress_host":
      "$host", "uri": "$uri", "scheme": "$scheme", "response_status": "$status", "upstream_status":
      "$upstream_status", "remote_addr": "$remote_addr", "upstream_addr": "$upstream_addr",
      "server_addr": "$server_addr", "request_length": "$request_length", "response_length":
      "$upstream_response_length", "request_time": "$request_time", "upstream_response_time":
      "$upstream_response_time", "upstream_connect_time": "$upstream_connect_time",
      "upstream_header_time": "$upstream_header_time", "request_id": "$request_id",
      "protocol": "$server_protocol", "time": "$time_iso8601", "http_referrer": "$http_referer",
      "remote_user": "$remote_user", "request_query": "$args", "body_bytes_sent":
      "$body_bytes_sent", "upstream_bytes_sent": "$upstream_bytes_sent", "service_name":
      "$service_name", "ingress_name": "$ingress_name", "ingress_namespace": "$namespace",
      "traceId": "$http_x_b3_traceid"}'
    max-worker-connections: "65536"
    proxy-body-size: 20m
    ssl-redirect: "false"
    worker-cpu-affinity: auto
  electionID: ingress-controller-leader-nginx
  image:
    pullPolicy: Always
  ingressClass: nginx
  ingressClassResource:
    controllerValue: k8s.io/nginx
    name: nginx
  livenessProbe:
    failureThreshold: 2
  metrics:
    enabled: true
  podAnnotations:
    prometheus.io/port: "10254"
    prometheus.io/scrape: "true"
  publishService:
    enabled: true
  readinessProbe:
    failureThreshold: 2
    successThreshold: 2
  replicaCount: 3
  resources:
    limits:
      cpu: 1
      memory: 1Gi
    requests:
      cpu: 100m
      memory: 250Mi
  service:
    annotations:
      cloud.google.com/load-balancer-type: internal
      networking.gke.io/internal-load-balancer-allow-global-access: "true"
    nodePorts:
      http: 32080
      https: 32443
podSecurityPolicy:
  enabled: true
rbac:
  create: true
How to reproduce this issue: Hard to reproduce
Anything else we need to know:
@FuyuanChu: This issue is currently awaiting triage.
If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
The triage/accepted label can be added by org members by writing /triage accepted in a comment.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/remove-kind bug
This looks like a feature request.
As a workaround we could create a custom template of nginx.conf for this scenario, but it would be better to have this feature implemented natively.
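One rough sketch of that custom-template route, via the helm chart's extraVolumes/extraVolumeMounts values; the nginx-template ConfigMap name is an assumption, and the nginx.tmpl inside it would have to be copied from the matching controller release and edited by hand:

# Helm values sketch: overlay a modified nginx.tmpl on the controller's
# default template. The ConfigMap name "nginx-template" is illustrative.
controller:
  extraVolumes:
    - name: nginx-template-volume
      configMap:
        name: nginx-template
        items:
          - key: nginx.tmpl
            path: nginx.tmpl
  extraVolumeMounts:
    - name: nginx-template-volume
      mountPath: /etc/nginx/template
      readOnly: true

The obvious drawback is that the template copy has to be refreshed and re-edited on every controller upgrade, which is why a native option would be preferable.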
Hi @FuyuanChu,
We faced the same issue recently: even an explicitly specified keep-alive ConfigMap option does not make the Ingress NGINX Controller respond with a Keep-Alive header (the response still contains a Connection: keep-alive header, though). To work around this we used the server-snippet ConfigMap option like this (per https://nginx.org/en/docs/http/ngx_http_core_module.html#keepalive_timeout the directive's context is http, server, location; the http-snippet ConfigMap option does not work because the Ingress NGINX Controller inserts a keepalive_timeout directive into that section itself, so using http-snippet leads to a duplicate-directive conflict):
keep-alive: "300"
server-snippet: "keepalive_timeout 300s 180;"
Note that we intentionally set the value in the Keep-Alive HTTP response header (180) lower than the connection keep-alive interval used on the NGINX side (300s), to ensure that the client closes the connection before NGINX does. Maybe this is the reason the Ingress NGINX Controller doesn't generate the full keepalive_timeout NGINX directive itself: it cannot deduce the value to use in the Keep-Alive response header (i.e. it should be a new ConfigMap option, IMHO).
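For anyone applying the same workaround as a standalone manifest, the full ConfigMap would look roughly like this; the metadata values are assumptions taken from the helm release shown earlier in this issue, so adjust them to your install:

# Sketch of the workaround as a complete ConfigMap. The name
# "ingress-nginx-controller" and namespace "infra" are assumed from the
# helm output above, not taken from this commenter's cluster.
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: infra
data:
  keep-alive: "300"
  # First argument keeps the server-side idle timeout at 300s; the second
  # sets the Keep-Alive response header (timeout=180) so clients close first.
  server-snippet: |
    keepalive_timeout 300s 180;

After the controller reloads, responses should carry Keep-Alive: timeout=180 alongside Connection: keep-alive.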
Thank you.