kubernetes-ingress
A rollout restart or scale-down of HAProxy causes 503 connection timeout errors
I'm using Helm to run the HAProxy ingress controller with autoscaling enabled (chart version 1.21.1). Whenever an HAProxy pod terminates (because of a scale-down event or a rollout restart), I start seeing 503 backend connection timeout errors for a few seconds.

I tried adding the following example config for graceful shutdown, but that did not resolve the issue:
## Example preStop for graceful shutdown
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "kill -USR1 $(pidof haproxy); while killall -0 haproxy; do sleep 1; done"]
Here are some logs from an HAProxy pod that is being scaled down:
Date,Message
"2022-06-17T00:41:48.795Z","[s6-init] making user provided files available at /var/run/s6/etc...exited 0."
"2022-06-17T00:41:48.795Z","[s6-init] ensuring user provided files have correct perms...exited 0."
"2022-06-17T00:41:48.795Z","[fix-attrs.d] applying ownership & permissions fixes..."
"2022-06-17T00:41:48.795Z","[fix-attrs.d] done."
"2022-06-17T00:41:48.795Z","[cont-init.d] executing container initialization scripts..."
"2022-06-17T00:41:48.795Z","[cont-init.d] 01-aux-cfg: executing..."
"2022-06-17T00:41:48.795Z","[cont-init.d] 01-aux-cfg: exited 0."
"2022-06-17T00:41:48.795Z","[cont-init.d] done."
"2022-06-17T00:41:48.795Z","[services.d] starting services"
"2022-06-17T00:41:48.795Z","[services.d] done."
"2022-06-17T00:41:48.795Z","[WARNING] (212) : config : missing timeouts for frontend 'https'."
"2022-06-17T00:41:48.795Z","| While not properly invalid
"2022-06-17T00:41:48.795Z","| with such a configuration. To fix this
"2022-06-17T00:41:48.795Z","| timeouts are set to a non-zero value: 'client'
"2022-06-17T00:41:48.795Z","[WARNING] (212) : config : missing timeouts for frontend 'http'."
"2022-06-17T00:41:48.795Z","| While not properly invalid
"2022-06-17T00:41:48.795Z","| with such a configuration. To fix this
"2022-06-17T00:41:48.795Z","| timeouts are set to a non-zero value: 'client'
"2022-06-17T00:41:48.795Z","[WARNING] (212) : config : missing timeouts for frontend 'healthz'."
"2022-06-17T00:41:48.795Z","| While not properly invalid
"2022-06-17T00:41:48.795Z","| with such a configuration. To fix this
"2022-06-17T00:41:48.795Z","| timeouts are set to a non-zero value: 'client'
"2022-06-17T00:41:48.795Z","[WARNING] (212) : config : missing timeouts for frontend 'stats'."
"2022-06-17T00:41:48.795Z","| While not properly invalid
"2022-06-17T00:41:48.795Z","| with such a configuration. To fix this
"2022-06-17T00:41:48.795Z","| timeouts are set to a non-zero value: 'client'
"2022-06-17T00:41:48.795Z","[WARNING] (212) : Removing incomplete section 'peers localinstance' (no peer named 'haproxy-kubernetes-ingress-79987ccbf5-qs4zv')."
"2022-06-17T00:41:48.795Z","2022/06/17 00:41:41"
"2022-06-17T00:41:48.795Z","_ _ _ ____"
"2022-06-17T00:41:48.795Z","| | | | / \ | _ \ _ __ _____ ___ _"
"2022-06-17T00:41:48.795Z","| |_| | / _ \ | |_) | '__/ _ \ \/ / | | |"
"2022-06-17T00:41:48.795Z","| _ |/ ___ \| __/| | | (_) > <| |_| |"
"2022-06-17T00:41:48.796Z","|_| |_/_/ \_\_| |_| \___/_/\_\\__
"2022-06-17T00:41:48.796Z","_ __ _ |___/ ___ ____"
"2022-06-17T00:41:48.796Z","| |/ / _| |__ ___ _ __ _ __ ___| |_ ___ ___ |_ _/ ___|"
"2022-06-17T00:41:48.796Z","| ' / | | | '_ \ / _ \ '__| '_ \ / _ \ __/ _ \/ __| | | |"
"2022-06-17T00:41:48.796Z","| . \ |_| | |_) | __/ | | | | | __/ || __/\__ \ | | |___"
"2022-06-17T00:41:48.796Z","|_|\_\__
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 HAProxy Ingress Controller v1.7.9 6462c78"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 Build from: https://github.com/haproxytech/kubernetes-ingress"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 Build date: 2022-04-12T09:39:37"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 ConfigMap: haproxy/haproxy-kubernetes-ingress"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 Ingress class: haproxy"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 Empty Ingress class: false"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 Publish service:"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 Default backend service: haproxy/haproxy-kubernetes-ingress-default-backend"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 Default ssl certificate: haproxy/eu-west-1-honeydew-epcloudops-com-tls"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 Frontend HTTP listening on: 0.0.0.0:80"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 Frontend HTTPS listening on: 0.0.0.0:443"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 Controller sync period: 5s"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 Running on haproxy-kubernetes-ingress-79987ccbf5-qs4zv"
"2022-06-17T00:41:48.796Z","[NOTICE] (212) : New worker #1 (241) forked"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 haproxy.go:36 Running with HAProxy version 2.4.15-7782e23 2022/03/14 - https://haproxy.org/"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 haproxy.go:50 Starting HAProxy with /etc/haproxy/haproxy.cfg"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 controller.go:116 Running on Kubernetes version: v1.21.12-eks-a64ea69 linux/amd64"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 INFO crmanager.go:75 Global CR defined in API core.haproxy.org"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 INFO crmanager.go:75 Defaults CR defined in API core.haproxy.org"
"2022-06-17T00:41:48.796Z","2022/06/17 00:41:41 INFO crmanager.go:75 Backend CR defined in API core.haproxy.org"
"2022-06-17T00:41:50.797Z","2022/06/17 00:41:49 INFO monitor.go:260 Auxiliary HAProxy config '/etc/haproxy/haproxy-aux.cfg' updated"
"2022-06-17T00:41:51.797Z","[WARNING] (212) : Exiting Master process..."
"2022-06-17T00:41:51.797Z","2022/06/17 00:41:51 INFO controller.go:202 HAProxy restarted"
"2022-06-17T00:41:51.797Z","[NOTICE] (212) : haproxy version is 2.4.15-7782e23"
"2022-06-17T00:41:51.797Z","[ALERT] (212) : Current worker #1 (241) exited with code 143 (Terminated)"
"2022-06-17T00:41:51.798Z","[WARNING] (212) : All workers exited. Exiting... (0)"
"2022-06-17T00:41:51.798Z","[WARNING] (264) : config: Can't get version of the global server state file '/var/state/haproxy/global'."
"2022-06-17T00:41:52.798Z","[NOTICE] (264) : New worker #1 (267) forked"
I have the same issue with the HAProxy ingress controller.
Here's some more info on how I'm installing the HAProxy kubernetes-ingress chart. I'm wondering if there's something I could configure to allow for pod termination without downtime.
chart: "kubernetes-ingress" chart version: "1.21.1" repository: https://haproxytech.github.io/helm-charts namespace: haproxy values:
# Copyright 2019 HAProxy Technologies LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
## Default values for kubernetes-ingress Chart for HAProxy Ingress Controller
## ref: https://github.com/haproxytech/kubernetes-ingress/tree/master/documentation
podSecurityPolicy:
  annotations: {}
  enabled: false
## Enable RBAC Authorization
## ref: https://kubernetes.io/docs/reference/access-authn-authz/rbac/
rbac:
  create: true
## Configure Service Account
## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
serviceAccount:
  create: true
  name:
## Controller default values
controller:
  name: controller
  image:
    repository: haproxytech/kubernetes-ingress # can be changed to use CE or EE Controller images
    tag: "{{ .Chart.AppVersion }}"
    pullPolicy: IfNotPresent
  ## Deployment or DaemonSet pod mode
  ## ref: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
  ## ref: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/
  kind: Deployment # can be 'Deployment' or 'DaemonSet'
  replicaCount: null
  ## Running container without root privileges
  unprivileged: false
  ## Pod termination grace period
  terminationGracePeriodSeconds: 60
  ## Private Registry configuration
  imageCredentials:
    registry: null
    username: null
    password: null
  existingImagePullSecret: null
  ## Controller Container listener port configuration
  containerPort:
    http: 80
    https: 443
    stat: 1024
  ## Ingress Class used for ingress.class annotation in multi-ingress environments
  ingressClass: haproxy # typically "haproxy" or null to receive all events
  ## Additional labels to add to the deployment or daemonset metadata
  # extraLabels: {}
  ## Additional labels to add to the pod container metadata
  # podLabels: {}
  ## Additional annotations to add to the pod container metadata
  podAnnotations:
    # Setting the source: haproxy configures Datadog to parse the logs
    # The exclude_success_calls regex prevents 2xx and 3xx traffic logs from being sent to Datadog
    ad.datadoghq.com/kubernetes-ingress-controller.logs: |-
      [{
        "source": "haproxy",
        "service": "ingress",
        "log_processing_rules": [{
          "type": "exclude_at_match",
          "name": "exclude_success_calls",
          "pattern" : "\\d+\\/\\d+\\/\\d+\\/\\d+\\/\\d+ [23]\\d\\d"
        }]
      }]
  ## Ingress TLS secret, if it is enabled and secret is null then controller will use auto-generated secret, otherwise
  ## secret needs to contain name of the Secret object which has been created manually
  defaultTLSSecret:
    enabled: true
    secretNamespace: "haproxy"
    secret: "haproxy-tls"
  ## Compute Resources for controller container
  resources:
    limits:
      cpu: 200m
      memory: 384Mi
    requests:
      cpu: 100m
      memory: 192Mi
  ## Horizontal Pod Scaler
  ## Only to be used with Deployment kind
  autoscaling:
    enabled: true
    minReplicas: 3
    maxReplicas: 20
    targetCPUUtilizationPercentage: 50
    # targetMemoryUtilizationPercentage: 80
  ## Pod Disruption Budget
  ## Only to be used with Deployment kind
  PodDisruptionBudget:
    enable: true
    maxUnavailable: 50%
    # minAvailable: 1
  ## Pod Node assignment
  # nodeSelector: {}
  ## Node Taints and Tolerations for pod-node cheduling through attraction/repelling
  # tolerations: []
  ## Node Affinity for pod-node scheduling constraints
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - kubernetes-ingress
          topologyKey: kubernetes.io/hostname
  ## Topology spread constraints (only used in kind: Deployment)
  # topologySpreadConstraints: []
  ## Pod DNS Config
  # dnsConfig: {}
  ## Pod DNS Policy
  ## Change this to ClusterFirstWithHostNet in case you have useHostNetwork set to true
  dnsPolicy: ClusterFirst
  ## Additional command line arguments to pass to Controller
  # extraArgs: []
  ## Custom configuration for Controller
  config:
    rate-limit: "ON"
    ssl-redirect: "true"
  ## Controller Logging configuration
  logging:
    ## Controller logging level
    ## This only relevant to Controller logs
    level: info
    ## HAProxy traffic logs
    traffic:
      address: "stdout"
      format: "raw"
      facility: "daemon"
      level: "info"
  ## Mirrors the address of the service's endpoints to the
  ## load-balancer status of all Ingress objects it satisfies.
  publishService:
    enabled: false
    ##
    ## Override of the publish service
    ## Must be <namespace>/<service_name>
    pathOverride: ""
  ## Controller Service configuration
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/
  service:
    enabled: true # set to false when controller.kind is 'DaemonSet' and controller.daemonset.useHostPorts is true
    type: LoadBalancer # can be 'NodePort' or 'LoadBalancer'
    ## Service annotations
    ## ref: https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/
    annotations:
      service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
      service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    ## Service labels
    # labels: {}
    ## Health check node port
    healthCheckNodePort: 0
    ## Service nodePorts to use for http, https and stat
    ## ref: https://kubernetes.io/docs/concepts/services-networking/service/
    ## If empty, random ports will be used
    nodePorts: {}
    # http: 31080
    # https: 31443
    # stat: 31024
    ## Service ports to use for http, https and stat
    ## ref: https://kubernetes.io/docs/concepts/services-networking/service/
    ports:
      http: 80
      https: 443
      stat: 1024
    ## The controller service ports for http, https and stat can be disabled by
    ## setting below to false - this could be useful when only deploying haproxy
    ## as a TCP loadbalancer
    ## Note: At least one port (http, https, stat or from tcpPorts) has to be enabled
    enablePorts:
      http: true
      https: true
      stat: true
    ## Target port mappings for http, https and stat
    targetPorts:
      http: http
      https: https
      stat: stat
    ## Additional tcp ports to expose
    ## This is especially useful for TCP services:
    # tcpPorts: []
    ## Set external traffic policy
    ## Default is "Cluster", setting it to "Local" preserves source IP
    externalTrafficPolicy: "Local"
    ## Expose service via external IPs that route to one or more cluster nodes
    externalIPs: []
    ## LoadBalancer IP
    ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#loadbalancer
    loadBalancerIP: ""
    ## Source IP ranges permitted to access Network Load Balancer
    # ref: https://kubernetes.io/docs/tasks/access-application-cluster/configure-cloud-provider-firewall/
    loadBalancerSourceRanges: [1.2.3.4/32]
    ## Service ClusterIP
    # clusterIP: ""
    ## Service session affinity
    # sessionAffinity: ""
  ## Controller DaemonSet configuration
  ## ref: https://kubernetes.io/docs/concepts/workloads/controllers/daemonset/
  daemonset:
    useHostNetwork: false # also modify dnsPolicy accordingly
    useHostPort: false
    hostPorts:
      http: 80
      https: 443
      stat: 1024
  ## Controller deployment strategy definition
  ## ref: https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#strategy
  strategy: {}
  # rollingUpdate:
  #   maxSurge: 25%
  #   maxUnavailable: 25%
  # type: RollingUpdate
  ## Controller Pod PriorityClass
  ## ref: https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass
  priorityClassName: ""
  ## Controller container lifecycle handlers
  # lifecycle: {}
  ## Example preStop for graceful shutdown
  # preStop:
  #   exec:
  #     command: ["/bin/sh", "-c", "kill -USR1 $(pidof haproxy); while killall -0 haproxy; do sleep 1; done"]
  ## Set additional environment variables
  # extraEnvs: []
  ## Add additional containers
  # extraContainers: []
  ## Additional volumeMounts to the controller main container
  # extraVolumeMounts: []
  ## Additional volumes to the controller pod
  # extraVolumes: []
  ## ServiceMonitor
  ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/user-guides/getting-started.md
  serviceMonitor:
    ## Toggle the ServiceMonitor, true if you have Prometheus Operator installed and configured
    enabled: false
    ## Specify the labels to add to the ServiceMonitors to be selected for target discovery
    extraLabels: {}
    ## Specify the endpoints
    ## ref: https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/design.md#servicemonitor
    endpoints:
      - port: stat
        path: /metrics
        scheme: http
## Default 404 backend
defaultBackend:
  enabled: true
  name: default-backend
  replicaCount: 2
  image:
    repository: k8s.gcr.io/defaultbackend-amd64
    tag: 1.5
    pullPolicy: IfNotPresent
    runAsUser: 65534
  ## Compute Resources
  resources:
    # limits:
    #   cpu: 10m
    #   memory: 16Mi
    requests:
      cpu: 10m
      memory: 16Mi
  ## Horizontal Pod Scaler
  ## Only to be used with Deployment kind
  autoscaling:
    enabled: false
    minReplicas: 1
    maxReplicas: 2
    targetCPUUtilizationPercentage: 80
    # targetMemoryUtilizationPercentage: 80
  ## Listener port configuration
  containerPort: 8080
  ## Pod Node assignment
  # nodeSelector: {}
  ## Node Taints and Tolerations for pod-node cheduling through attraction/repelling
  # tolerations: []
  ## Node Affinity for pod-node scheduling constraints
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                  - kubernetes-ingress
          topologyKey: kubernetes.io/hostname
  ## Topology spread constraints
  # topologySpreadConstraints: []
  ## Additional labels to add to the pod container metadata
  # podLabels: {}
  ## Additional annotations to add to the pod container metadata
  # podAnnotations: {}
  service:
    ## Service ports
    port: 8080
  ## Configure Service Account
  serviceAccount:
    create: true
  ## Pod PriorityClass
  priorityClassName: ""
  ## Set additional environment variables
  # extraEnvs: []
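For context on the Datadog annotation above: the exclude_success_calls pattern matches the timing/status portion of an HAProxy HTTP log line (five slash-separated timers followed by a 2xx or 3xx status code). An illustrative log line, not taken from this cluster, that the rule would drop:

10.0.3.15:41392 [17/Jun/2022:00:41:45.123] https~ my-backend/pod-1 0/0/1/2/3 200 1024 - - ---- 5/5/0/0/0 0/0 "GET / HTTP/1.1"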
@pnmatich you could try this command. We are using it for our HAProxy (though not as an ingress controller):
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh","-c","sleep 10; kill -SIGUSR1 $(pidof haproxy)"]
Tested with Fortio; requests had a 100% success rate.
Make sure that (see the combined sketch below):
- the sleep time is at least as long as your http-keep-alive timeout
- terminationGracePeriodSeconds is >= sleep time + HAProxy's SIGUSR1 termination time
edit: sorry, forgot "-c" argument in command. Fixed it...
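Putting that advice together, a minimal values sketch for this chart could look like the following; the 10-second sleep and 60-second grace period are placeholder numbers to tune against your own http-keep-alive timeout and drain time:

controller:
  # Must cover the preStop sleep plus the time HAProxy needs to finish
  # draining after receiving SIGUSR1.
  terminationGracePeriodSeconds: 60
  lifecycle:
    preStop:
      exec:
        # Keep accepting traffic while the endpoint / NLB target is being
        # deregistered, then ask HAProxy to soft-stop.
        command: ["/bin/sh", "-c", "sleep 10; kill -SIGUSR1 $(pidof haproxy)"]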
I tried it myself with our ingress controller setup; I'm experiencing the same issues.
Neither
command: ["/bin/sh","-c","sleep 10; kill -SIGUSR1 $(pidof haproxy)"]
nor
command: ["/bin/sh","-c","s6-svc -1 /var/run/s6/services/haproxy"]
works to enable restarts without connection drops.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Please do not close this issue, stale.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Bumping since this is still an issue and I'm experiencing it in my environment, too.
During a rollout we're seeing the haproxy process exit from a SIGKILL instead of gracefully terminating on SIGTERM as expected.
Our best guess is that signals aren't being forwarded to child processes correctly, so /usr/local/sbin/haproxy never gets a chance to react to the SIGTERM 🤷‍♂️
❯ kubectl exec -n haproxy deploy/haproxy-kubernetes-ingress -- ps -ef
PID USER TIME COMMAND
1 haproxy 0:00 s6-svscan -t0 /var/run/s6/services
39 haproxy 0:00 s6-supervise s6-fdholderd
208 haproxy 0:00 s6-supervise haproxy
209 haproxy 0:00 s6-supervise ingress-controller
212 haproxy 0:01 /haproxy-ingress-controller --with-s6-overlay --default-ss
261 haproxy 0:00 /usr/local/sbin/haproxy -x /var/run/haproxy-runtime-api.so
267 haproxy 0:05 /usr/local/sbin/haproxy -W -db -m 10364 -f /etc/haproxy/ha
287 haproxy 0:00 ps -ef
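One way to test that hypothesis by hand (a rough sketch; the deployment name and namespace come from the command above, and the pod label is taken from the affinity rules in the values, so adjust to your release):

# Send the soft-stop signal directly; HAProxy should drain and exit cleanly,
# after which s6 should restart it.
kubectl exec -n haproxy deploy/haproxy-kubernetes-ingress -- \
  sh -c 'kill -USR1 $(pidof haproxy)'

# Then delete a pod and follow its logs: if the worker is reported as
# "exited with code 143 (Terminated)" or the container is killed at the end
# of the grace period, the signal never reached /usr/local/sbin/haproxy.
kubectl delete pod -n haproxy -l app.kubernetes.io/name=kubernetes-ingress --wait=false
kubectl logs -n haproxy -l app.kubernetes.io/name=kubernetes-ingress -f --tail=50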
Same thing here; it doesn't seem to be shutting down gracefully.
I can confirm. We have identified the culprit and the fix is in the queue, being reviewed.
Hey @dkorunic just wanted to bump this one more time since we're eagerly awaiting this release. Do you have an ETA for when we could expect this? Thanks! 🙌
@evandam The fix has been committed in https://github.com/haproxytech/kubernetes-ingress/commit/6afd804b0410154daf601fcf3ca5969623aeef89 and a release is incoming; I'll check the exact time frame we expect it to be published.
@evandam It will happen in the next hour or so (it's already in progress), and as soon as the IC binary and IC image have been released, I'll update the Helm chart accordingly.
Incoming in Helm Chart 1.23.2.
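For anyone waiting on it, once 1.23.2 is published the upgrade should be a standard chart bump; the release name, repo alias and values file below are assumptions, so match them to your own install:

helm repo update
helm upgrade haproxy-kubernetes-ingress haproxytech/kubernetes-ingress \
  --namespace haproxy --version 1.23.2 -f values.yaml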
Thanks for pushing this one over the finish line @dkorunic!