limit loadbalancer max sockets
I'm currently dealing with this issue with haproxy: https://github.com/docker-library/haproxy/issues/194 It also impacts the kindest/haproxy docker images (which is where I came across it).
Setting a global maxconn in the haproxy.cfg file would fix the impact of the issue, though not the source issue which is somewhere between haproxy and docker AFAIK.
I was wondering if it would be acceptable to set that value in the haproxy config, and if so what value would be a good default? For fixing the above issue I think it can be arbitrarily high (as the issue seems to be the value being unset rather than what level it's set to).
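For concreteness, a minimal sketch of the kind of change being discussed, written as a shell snippet that drops an explicit maxconn into a haproxy.cfg global section; the 65536 value and the rest of the file are placeholders for illustration, not kind's actual template:
# Illustrative only: the chosen value is an assumption, not kind's real load balancer config.
cat > haproxy.cfg <<'EOF'
global
  maxconn 65536
EOF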
Reading the linked issue, it seems we have 2 options:
- set maxconn
- run the LB container setting the ulimits to a conservative number
I'm inclined towards 2 (rough sketch below); WDYT @BenTheElder?
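As a concrete illustration of option 2 (kind creates the real container with more flags than this; only the --ulimit part is the point, and the value is just an example, not a recommendation):
# Cap the file descriptor limit for the load balancer container so HAProxy
# derives a sane maxconn from it instead of inheriting the host's default.
docker run -d --ulimit nofile=65536:65536 --name kind-external-load-balancer kindest/haproxy:v20230227-d46f45b6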
run the LB container setting the ulimits to a conservative number
This is what the solution for Cluster API does - https://github.com/kubernetes-sigs/cluster-api/pull/7344 - it works, but getting the number right is difficult.
I'm not sure how many other places run kindest/haproxy independently, but those would also need the ulimits set, whereas setting maxconn in the config file would solve it for all consumers.
Yeah, but we got bitten by the ulimit thing multiple times. Setting the ulimit is common in all "stable" distros like RHEL and SLES; however, more "edge" distros like Fedora, Arch, ... set it to NoLimit:
https://github.com/kubernetes-sigs/kind/pull/760#issuecomment-519286535
The main pro of this approach is that it will also remove the dependency on haproxy in case we want to switch the loadbalancer. I don't know, 1048576 or 65536 sounds good enough for kind users.
I'm fine with one or the other though
Sorry for the delayed response -- I think we should set a reasonable ulimit on the container. We only have the API server load-balancing workload and don't support user workloads other than their connections to the API server, so we can pick a pretty reasonable upper bound for supported concurrent connections. +1 for the config, and setting the ulimit is more comprehensive.
We don't formally support reusing this image without importing kind or closely matching its behavior; any future revisions can make any number of breaking changes freely.
At the moment, the fact that we even use haproxy at all is an under-designed internal detail that just happens to be working OK. We once used nginx, and HA probably deserves an overhaul at some point (pending more demand; kubernetes mostly doesn't have staffing to focus on testing HA at the moment).
CAPD is a bit of an odd duck here 😅
CAPD is a bit of an odd duck here 😅
Fair :D It does copy-paste the folder into a third_party directory (and the version used there is many versions out of date)
An odd duck we love :-)
Yeah, I think copying in is the safest approach, so CAPD can continue to update when ready.
Since both projects are kubernetes org owned and identical license, at least there's limited copy-paste license concerns 🙃
Setting a global maxconn in the haproxy.cfg file would fix the impact of the issue, though not the source issue which is somewhere between haproxy and docker AFAIK.
From what I understand, the root cause is that HAProxy allocates memory for each possible concurrent connection, up to the maximum defined by the maxconn field in its configuration.
Kind's HAProxy configuration does not set maxconn, so HAProxy derives it from the file descriptor limit, which the container inherits:
If this value is not set, it will automatically be calculated based on the current file descriptors limit reported by the "ulimit -n" command, possibly reduced to a lower value if a memory limit is enforced, based on the buffer size, memory allocated to compression, SSL cache size, and use or not of SSL and the associated maxsslconn (which can also be automatic). -- https://cbonte.github.io/haproxy-dconv/2.2/configuration.html#maxconn
Here's a demonstration using the latest image:
❯ docker run --ulimit nofile=1000 --rm -it --memory=1gb --name haproxy-test kindest/haproxy:v20230227-d46f45b6 -d | grep maxconn
Note: setting global.maxconn to 453.
❯ docker run --ulimit nofile=10000 --rm -it --memory=1gb --name haproxy-test kindest/haproxy:v20230227-d46f45b6 -d | grep maxconn
Note: setting global.maxconn to 4953.
❯ docker run --ulimit nofile=100000 --rm -it --memory=1gb --name haproxy-test kindest/haproxy:v20230227-d46f45b6 -d | grep maxconn
Note: setting global.maxconn to 49953.
❯ docker run --ulimit nofile=1000000 --rm -it --memory=1gb --name haproxy-test kindest/haproxy:v20230227-d46f45b6 -d | grep maxconn
Note: setting global.maxconn to 499953.
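The limit the container inherits depends on the host's Docker daemon defaults; a quick way to check it (an assumed check using a throwaway container with a shell, rather than the haproxy image itself):
# Print the default nofile limit Docker hands to containers on this host.
# On distros that default to a very high or unlimited value, HAProxy derives an enormous maxconn.
docker run --rm busybox sh -c 'ulimit -n'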
Please see the gist for a demonstration of how the kindest/haproxy container memory usage scales with the file descriptor limit: https://gist.github.com/dlipovetsky/23443bef17371a56acd8cf0579e3f6b4
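A rough way to reproduce that observation locally (an assumed workflow, not the gist itself):
# Start the image with a given fd limit, check its memory usage, then clean up;
# repeat with different nofile values to see the scaling.
docker run -d --ulimit nofile=1000000 --memory=1gb --name haproxy-test kindest/haproxy:v20230227-d46f45b6
docker stats --no-stream haproxy-test
docker rm -f haproxy-test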
There are two ways to control the memory usage: limit maxconn in the configuration, or limit per-process memory usage using the -m flag (the flag is explained in http://docs.haproxy.org/2.2/management.html#3).
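For illustration, the -m route could be exercised against the image directly; arguments after the image name are passed through to haproxy, as in the demonstrations above, and 256 MB is just an example value:
# haproxy's -m flag caps per-process memory in megabytes, which in turn lowers the derived maxconn.
docker run --ulimit nofile=1000000 --rm -it --name haproxy-test kindest/haproxy:v20230227-d46f45b6 -m 256 -d | grep maxconn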
To follow up on my previous comment: it seems reasonable to either limit the maximum connections or impose a memory limit on HAProxy. Either of these changes would be explicit and clear. Changing the file descriptor limit is, at best, an implicit, indirect way to change the maximum number of connections.
Both my coworker (running Fedora) and I (running Arch) are still experiencing this issue when we attempt to create a cluster, even though I'm using the latest release version, which mentions this fix as included. Looks like the image version is hard-coded into kind and hasn't been updated to reflect that change, perhaps? (Edit: I manually retagged the newest image (v20230330-2f738c2) as the older image that's referenced in the code (v20230227-d46f45b6) and confirmed that it works now; the hard-coded image version definitely needs to be updated!)
https://github.com/kubernetes-sigs/kind/blob/2f7221788e2cc51a6076490f2511d572cb6659d0/pkg/cluster/internal/loadbalancer/const.go#L20
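The retag workaround mentioned above, spelled out; it only masks the problem locally until a kind release references the newer image:
# Pull the fixed load balancer image and retag it as the tag the current kind binary expects.
docker pull kindest/haproxy:v20230330-2f738c2
docker tag kindest/haproxy:v20230330-2f738c2 kindest/haproxy:v20230227-d46f45b6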
> kind create cluster --config cluster.yaml
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.26.3) 🖼
✓ Preparing nodes 📦 📦 📦 📦 📦 📦
✗ Configuring the external load balancer ⚖
Deleted nodes: ["kind-external-load-balancer" "kind-worker3" "kind-worker2" "kind-control-plane" "kind-control-plane3" "kind-control-plane2" "kind-worker"]
ERROR: failed to create cluster: failed to copy loadbalancer config to node: failed to create directory /usr/local/etc/haproxy: command "docker exec --privileged kind-external-load-balancer mkdir -p /usr/local/etc/haproxy" failed with error: exit status 1
Command Output: Error response from daemon: Container edad0ec6d6a7b5a0f9f8845469109c76bded2e345c4c81412fa88b1043c47c9b is not running
cluster.yml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
- role: control-plane
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
- role: control-plane
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
- role: worker
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
- role: worker
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
- role: worker
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
docker logs -f kind-external-load-balancer
[WARNING] 094/172723 (1) : config : missing timeouts for frontend 'controlPlane'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[NOTICE] 094/172723 (1) : haproxy version is 2.2.9-2+deb11u4
[NOTICE] 094/172723 (1) : path to executable is /usr/sbin/haproxy
[ALERT] 094/172723 (1) : Not enough memory to allocate 1073741816 entries for fdtab!
[ALERT] 094/172723 (1) : No polling mechanism available.
It is likely that haproxy was built with TARGET=generic and that FD_SETSIZE
is too low on this platform to support maxconn and the number of listeners
and servers. You should rebuild haproxy specifying your system using TARGET=
in order to support other polling systems (poll, epoll, kqueue) or reduce the
global maxconn setting to accommodate the system's limitation. For reference,
FD_SETSIZE=1024 on this system, global.maxconn=536870885 resulting in a maximum of
1073741816 file descriptors. You should thus reduce global.maxconn by 536870396. Also,
check build settings using 'haproxy -vv'.
[WARNING] 094/172828 (1) : config : missing timeouts for frontend 'controlPlane'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[NOTICE] 094/172828 (1) : haproxy version is 2.2.9-2+deb11u4
[NOTICE] 094/172828 (1) : path to executable is /usr/sbin/haproxy
[ALERT] 094/172828 (1) : Not enough memory to allocate 1073741816 entries for fdtab!
[ALERT] 094/172828 (1) : No polling mechanism available.
It is likely that haproxy was built with TARGET=generic and that FD_SETSIZE
is too low on this platform to support maxconn and the number of listeners
and servers. You should rebuild haproxy specifying your system using TARGET=
in order to support other polling systems (poll, epoll, kqueue) or reduce the
global maxconn setting to accommodate the system's limitation. For reference,
FD_SETSIZE=1024 on this system, global.maxconn=536870885 resulting in a maximum of
1073741816 file descriptors. You should thus reduce global.maxconn by 536870396. Also,
check build settings using 'haproxy -vv'.
Both my coworker (running Fedora) and I (running Arch) are still experiencing this issue when we attempt to create a cluster, even though I'm using the latest release version, which mentions this fix as included. Looks like the image version is hard-coded into kind and hasn't been updated to reflect that change, perhaps? (Edit: I manually retagged the newest image (v20230330-2f738c2) as the older image that's referenced in the code (v20230227-d46f45b6) and confirmed that it works now; the hard-coded image version definitely needs to be updated!)
Yes, this is https://github.com/kubernetes-sigs/kind/pull/3159
If you use https://kind.sigs.k8s.io/docs/user/quick-start/#installing-from-source with either ~~@latest~~ @main (go) or a source checkout from just now, it will contain this fix early.
Otherwise it will be rolled up in the next release, TBD.
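For reference, the from-source route via go (as described in the linked quick-start) is roughly:
# Installs kind from the main branch, which already references the updated load balancer image.
go install sigs.k8s.io/kind@main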
If you don't mind: can you share your use case for multiple control-plane nodes? This is a relatively rarely used feature and usually not applicable in kind; it's something that needs more attention in the future, and I'd like to make sure we drive improvements with concrete use cases in mind.