limit loadbalancer max sockets
I'm currently dealing with this issue with haproxy: https://github.com/docker-library/haproxy/issues/194 It also impacts the kindest/haproxy docker images (which is where I came across it).
Setting a global maxconn in the haproxy.cfg file would fix the impact of the issue, though not the source issue which is somewhere between haproxy and docker AFAIK.
I was wondering if it would be acceptable to set that value in the haproxy config, and if so what value would be a good default? For fixing the above issue I think it can be arbitrarily high (as the issue seems to be the value being unset rather than what level it's set to).
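For concreteness, a minimal sketch of the kind of change being discussed, written as a shell snippet that drops an explicit maxconn into a haproxy.cfg global section; the 65536 value and the rest of the file are placeholders for illustration, not kind's actual template:
# Illustrative only: the chosen value is an assumption, not kind's real load balancer config.
cat > haproxy.cfg <<'EOF'
global
  maxconn 65536
EOF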
Reading the linked issue, it seems we have 2 options:
- set maxconn
- run the LB container setting the ulimits to a conservative number
I'm inclined towards 2 (rough sketch below); WDYT @BenTheElder?
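As a concrete illustration of option 2 (kind creates the real container with more flags than this; only the --ulimit part is the point, and the value is just an example, not a recommendation):
# Cap the file descriptor limit for the load balancer container so HAProxy
# derives a sane maxconn from it instead of inheriting the host's default.
docker run -d --ulimit nofile=65536:65536 --name kind-external-load-balancer kindest/haproxy:v20230227-d46f45b6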
run the LB container setting the ulimits to a conservative number
This is what the solution for Cluster API does - https://github.com/kubernetes-sigs/cluster-api/pull/7344 - it works, but getting the number right is difficult.
I'm not sure how many other places run kindest/haproxy independently, but those would also need the ulimits set, whereas setting maxconn in the config file would solve it for all consumers.
Yeah, but we got bitten by the ulimit thing multiple times. Setting the ulimit is common in all "stable" distros like RHEL and SLES; however, more "edge" distros like Fedora, Arch, ... set it to NoLimit:
https://github.com/kubernetes-sigs/kind/pull/760#issuecomment-519286535
The main pro of this approach is that it will also remove the dependency on haproxy in case we want to switch the loadbalancer. I don't know, 1048576 or 65536 sounds good enough for kind users.
I'm fine with one or the other though
Sorry for the delayed response -- I think we should set a reasonable ulimit on the container. We only have the API server load-balancing workload and don't support user workloads other than their connections to the API server, so we can pick a pretty reasonable upper bound for supported concurrent connections. +1 for the config, and setting the ulimit is more comprehensive.
We don't formally support reusing this image without importing kind or closely matching its behavior; any future revisions can make any number of breaking changes freely.
At the moment, the fact that we even use haproxy at all is an under-designed internal detail that just happens to be working OK. We once used nginx, and HA probably deserves an overhaul at some point (pending more demand; kubernetes mostly doesn't have staffing to focus on testing HA at the moment).
CAPD is a bit of an odd duck here 😅
CAPD is a bit of an odd duck here 😅
Fair :D It does copy-paste the folder into a third_party directory (and the version used there is many versions out of date)
An odd duck we love :-)
Yeah, I think copying in is the safest approach, so CAPD can continue to update when ready.
Since both projects are kubernetes org owned and identical license, at least there's limited copy-paste license concerns 🙃
Setting a global maxconn in the haproxy.cfg file would fix the impact of the issue, though not the source issue which is somewhere between haproxy and docker AFAIK.
From what I understand, the root cause is that HAProxy allocates memory for each possible concurrent connection, up to the maximum defined by the maxconn field in its configuration.
Kind's HAProxy configuration does not set maxconn, so HAProxy derives it from the file descriptor limit, which the container inherits:
If this value is not set, it will automatically be calculated based on the current file descriptors limit reported by the "ulimit -n" command, possibly reduced to a lower value if a memory limit is enforced, based on the buffer size, memory allocated to compression, SSL cache size, and use or not of SSL and the associated maxsslconn (which can also be automatic). -- https://cbonte.github.io/haproxy-dconv/2.2/configuration.html#maxconn
Here's a demonstration using the latest image:
❯ docker run --ulimit nofile=1000 --rm -it --memory=1gb --name haproxy-test kindest/haproxy:v20230227-d46f45b6 -d | grep maxconn
Note: setting global.maxconn to 453.
❯ docker run --ulimit nofile=10000 --rm -it --memory=1gb --name haproxy-test kindest/haproxy:v20230227-d46f45b6 -d | grep maxconn
Note: setting global.maxconn to 4953.
❯ docker run --ulimit nofile=100000 --rm -it --memory=1gb --name haproxy-test kindest/haproxy:v20230227-d46f45b6 -d | grep maxconn
Note: setting global.maxconn to 49953.
❯ docker run --ulimit nofile=1000000 --rm -it --memory=1gb --name haproxy-test kindest/haproxy:v20230227-d46f45b6 -d | grep maxconn
Note: setting global.maxconn to 499953.
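The limit the container inherits depends on the host's Docker daemon defaults; a quick way to check it (an assumed check using a throwaway container with a shell, rather than the haproxy image itself):
# Print the default nofile limit Docker hands to containers on this host.
# On distros that default to a very high or unlimited value, HAProxy derives an enormous maxconn.
docker run --rm busybox sh -c 'ulimit -n'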
Please see the gist for a demonstration of how the kindest/haproxy container memory usage scales with the file descriptor limit: https://gist.github.com/dlipovetsky/23443bef17371a56acd8cf0579e3f6b4
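A rough way to reproduce that observation locally (an assumed workflow, not the gist itself):
# Start the image with a given fd limit, check its memory usage, then clean up;
# repeat with different nofile values to see the scaling.
docker run -d --ulimit nofile=1000000 --memory=1gb --name haproxy-test kindest/haproxy:v20230227-d46f45b6
docker stats --no-stream haproxy-test
docker rm -f haproxy-test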
There are two ways to control the memory usage: limit maxconn in the configuration, or limit per-process memory usage using the -m flag (the flag is explained in http://docs.haproxy.org/2.2/management.html#3).
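For illustration, the -m route could be exercised against the image directly; arguments after the image name are passed through to haproxy, as in the demonstrations above, and 256 MB is just an example value:
# haproxy's -m flag caps per-process memory in megabytes, which in turn lowers the derived maxconn.
docker run --ulimit nofile=1000000 --rm -it --name haproxy-test kindest/haproxy:v20230227-d46f45b6 -m 256 -d | grep maxconn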
To follow up on my previous comment: it seems reasonable to either limit the maximum connections or impose a memory limit on HAProxy. Either of these changes would be explicit and clear. Changing the file descriptor limit is, at best, an implicit, indirect way to change the maximum number of connections.
Both my coworker (running Fedora) and I (running Arch) are still experiencing this issue when we attempt to create a cluster, even though I'm using the latest release version, which mentions this fix as included. Looks like the image version is hard-coded into kind and hasn't been updated to reflect that change, perhaps? (Edit: I manually retagged the newest image (v20230330-2f738c2) as the older image that's referenced in the code (v20230227-d46f45b6) and confirmed that it works now; the hard-coded image version definitely needs to be updated!)
https://github.com/kubernetes-sigs/kind/blob/2f7221788e2cc51a6076490f2511d572cb6659d0/pkg/cluster/internal/loadbalancer/const.go#L20
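The retag workaround mentioned above, spelled out; it only masks the problem locally until a kind release references the newer image:
# Pull the fixed load balancer image and retag it as the tag the current kind binary expects.
docker pull kindest/haproxy:v20230330-2f738c2
docker tag kindest/haproxy:v20230330-2f738c2 kindest/haproxy:v20230227-d46f45b6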
> kind create cluster --config cluster.yaml
Creating cluster "kind" ...
✓ Ensuring node image (kindest/node:v1.26.3) 🖼
✓ Preparing nodes 📦 📦 📦 📦 📦 📦
✗ Configuring the external load balancer ⚖
Deleted nodes: ["kind-external-load-balancer" "kind-worker3" "kind-worker2" "kind-control-plane" "kind-control-plane3" "kind-control-plane2" "kind-worker"]
ERROR: failed to create cluster: failed to copy loadbalancer config to node: failed to create directory /usr/local/etc/haproxy: command "docker exec --privileged kind-external-load-balancer mkdir -p /usr/local/etc/haproxy" failed with error: exit status 1
Command Output: Error response from daemon: Container edad0ec6d6a7b5a0f9f8845469109c76bded2e345c4c81412fa88b1043c47c9b is not running
cluster.yml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
- role: control-plane
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
- role: control-plane
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
- role: worker
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
- role: worker
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
- role: worker
  image: kindest/node:v1.26.3@sha256:61b92f38dff6ccc29969e7aa154d34e38b89443af1a2c14e6cfbd2df6419c66f
docker logs -f kind-external-load-balancer
[WARNING] 094/172723 (1) : config : missing timeouts for frontend 'controlPlane'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[NOTICE] 094/172723 (1) : haproxy version is 2.2.9-2+deb11u4
[NOTICE] 094/172723 (1) : path to executable is /usr/sbin/haproxy
[ALERT] 094/172723 (1) : Not enough memory to allocate 1073741816 entries for fdtab!
[ALERT] 094/172723 (1) : No polling mechanism available.
It is likely that haproxy was built with TARGET=generic and that FD_SETSIZE
is too low on this platform to support maxconn and the number of listeners
and servers. You should rebuild haproxy specifying your system using TARGET=
in order to support other polling systems (poll, epoll, kqueue) or reduce the
global maxconn setting to accommodate the system's limitation. For reference,
FD_SETSIZE=1024 on this system, global.maxconn=536870885 resulting in a maximum of
1073741816 file descriptors. You should thus reduce global.maxconn by 536870396. Also,
check build settings using 'haproxy -vv'.
[WARNING] 094/172828 (1) : config : missing timeouts for frontend 'controlPlane'.
| While not properly invalid, you will certainly encounter various problems
| with such a configuration. To fix this, please ensure that all following
| timeouts are set to a non-zero value: 'client', 'connect', 'server'.
[NOTICE] 094/172828 (1) : haproxy version is 2.2.9-2+deb11u4
[NOTICE] 094/172828 (1) : path to executable is /usr/sbin/haproxy
[ALERT] 094/172828 (1) : Not enough memory to allocate 1073741816 entries for fdtab!
[ALERT] 094/172828 (1) : No polling mechanism available.
It is likely that haproxy was built with TARGET=generic and that FD_SETSIZE
is too low on this platform to support maxconn and the number of listeners
and servers. You should rebuild haproxy specifying your system using TARGET=
in order to support other polling systems (poll, epoll, kqueue) or reduce the
global maxconn setting to accommodate the system's limitation. For reference,
FD_SETSIZE=1024 on this system, global.maxconn=536870885 resulting in a maximum of
1073741816 file descriptors. You should thus reduce global.maxconn by 536870396. Also,
check build settings using 'haproxy -vv'.
Both my coworker (running Fedora) and I (running Arch) are still experiencing this issue when we attempt to create a cluster, even though I'm using the latest release version, which mentions this fix as included. Looks like the image version is hard-coded into kind and hasn't been updated to reflect that change, perhaps? (Edit: I manually retagged the newest image (v20230330-2f738c2) as the older image that's referenced in the code (v20230227-d46f45b6) and confirmed that it works now; the hard-coded image version definitely needs to be updated!)
Yes, this is https://github.com/kubernetes-sigs/kind/pull/3159
If you use https://kind.sigs.k8s.io/docs/user/quick-start/#installing-from-source with either ~~@latest~~ @main (go) or a source checkout from just now, it will contain this fix early.
Otherwise it will be rolled up in the next release, TBD.
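For reference, the from-source route via go (as described in the linked quick-start) is roughly:
# Installs kind from the main branch, which already references the updated load balancer image.
go install sigs.k8s.io/kind@main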
If you don't mind: can you share your use case for multiple control-plane nodes? This is a relatively rarely used feature and usually not applicable in kind; it's something that needs more attention in the future, and I'd like to make sure we drive improvements with concrete use cases in mind.