Flaky ingress behavior using ingress-nginx and rootless podman
What happened:
I'm trying to run the basic Ingress example with Ingress Nginx from here. I'm using rootless podman.
Once I create the example services and ingress, I try to do curl localhost:12345/foo/hostname. Note that the port 12345 is randomly assigned, because I pass 0 for the hostPort in the Kind cluster config (see below).
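For reference, one way to find out which host port podman actually picked is something like the following (the node container name here is just the default for a cluster named "kind"; yours may differ):

podman port kind-control-plane 80
# prints something like 0.0.0.0:12345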
About 10% of the time it works and I get the desired response foo-app. The rest of the time, the curl command hangs indefinitely.
When I look in the nginx controller logs, I see a lot of messages like the following:
2023/12/13 13:51:36 [alert] 353#353: pthread_create() failed (11: Resource temporarily unavailable)
2023/12/13 13:51:36 [alert] 39#39: fork() failed while spawning "cache loader process" (11: Resource temporarily unavailable)
2023/12/13 13:51:36 [alert] 39#39: sendmsg() failed (9: Bad file descriptor)
2023/12/13 13:51:37 [alert] 39#39: worker process 52 exited with fatal code 2 and cannot be respawned
This looks to me like nginx is spawning a bunch of worker threads, and most of them fail to start properly. Maybe there's some problem with rootless podman?
What you expected to happen:
HTTP requests to the ingress should work reliably.
How to reproduce it (as minimally and precisely as possible):
Just following the Ingress instructions. My exact Kind config is as follows:
Kind config file
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /nix/store/mrcy594mjgm5zcckr1f4i901isxiwj0s-binary-cache
    containerPath: /binary-cache
    readOnly: false
    propagation: HostToContainer
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
        authorization-mode: "AlwaysAllow"
        streaming-connection-idle-timeout: "0"
  extraPortMappings:
  - containerPort: 80
    hostPort: 0
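For completeness, I create the cluster from this file in the usual way (the filename is arbitrary):

kind create cluster --config kind-config.yaml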
Environment:
- kind version: (use kind version): 0.20.0
- Runtime info: (use docker info or podman info):
podman info output
host:
arch: amd64
buildahVersion: 1.32.0
cgroupControllers:
- cpuset
- cpu
- io
- memory
- pids
cgroupManager: systemd
cgroupVersion: v2
conmon:
package: Unknown
path: /nix/store/3bmd0vmvvvrashaxqb1d1apyy7smix3d-conmon-2.1.8/bin/conmon
version: 'conmon version 2.1.8, commit: '
cpuUtilization:
idlePercent: 95.22
systemPercent: 0.75
userPercent: 4.02
cpus: 32
databaseBackend: boltdb
distribution:
codename: stoat
distribution: nixos
version: "23.05"
eventLogger: journald
freeLocks: 2044
hostname: desktop2
idMappings:
gidmap:
- container_id: 0
host_id: 100
size: 1
- container_id: 1
host_id: 3000000
size: 2000000
uidmap:
- container_id: 0
host_id: 1001
size: 1
- container_id: 1
host_id: 3000000
size: 2000000
kernel: 6.1.60
linkmode: dynamic
logDriver: journald
memFree: 6684782592
memTotal: 67134550016
networkBackend: netavark
networkBackendInfo:
backend: netavark
dns:
package: Unknown
path: /nix/store/igpk3cb4dmrr1mpvx5kb5prd1fk8kcss-podman-4.7.2/libexec/podman/aardvark-dns
version: aardvark-dns 1.9.0
package: Unknown
path: /nix/store/igpk3cb4dmrr1mpvx5kb5prd1fk8kcss-podman-4.7.2/libexec/podman/netavark
version: netavark 1.7.0
ociRuntime:
name: crun
package: Unknown
path: /nix/store/hllgilr2bhc6rbdrsbnrpaxyfqlzgqjg-crun-1.12/bin/crun
version: |-
crun version 1.12
commit: 1.12
rundir: /run/user/1001/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
os: linux
pasta:
executable: ""
package: ""
version: ""
remoteSocket:
exists: true
path: /run/user/1001/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: true
seccompEnabled: true
seccompProfilePath: ""
selinuxEnabled: false
serviceIsRemote: false
slirp4netns:
executable: /nix/store/igpk3cb4dmrr1mpvx5kb5prd1fk8kcss-podman-4.7.2/libexec/podman/slirp4netns
package: Unknown
version: |-
slirp4netns version 1.2.2
commit: 0ee2d87523e906518d34a6b423271e4826f71faf
libslirp: 4.7.0
SLIRP_CONFIG_VERSION_MAX: 4
libseccomp: 2.5.4
swapFree: 3188670464
swapTotal: 9448923136
uptime: 10h 29m 50.00s (Approximately 0.42 days)
plugins:
authorization: null
log:
- k8s-file
- none
- passthrough
- journald
network:
- bridge
- macvlan
- ipvlan
volume:
- local
registries:
search:
- docker.io
- quay.io
store:
configFile: /home/tom/.config/containers/storage.conf
containerStore:
number: 1
paused: 0
running: 1
stopped: 0
graphDriverName: overlay
graphOptions: {}
graphRoot: /home/tom/.local/share/containers/storage
graphRootAllocated: 1958014603264
graphRootUsed: 1742862733312
graphStatus:
Backing Filesystem: extfs
Native Overlay Diff: "true"
Supports d_type: "true"
Supports shifting: "false"
Supports volatile: "true"
Using metacopy: "false"
imageCopyTmpDir: /var/tmp
imageStore:
number: 1
runRoot: /run/user/1001/containers
transientStore: false
volumePath: /home/tom/.local/share/containers/storage/volumes
version:
APIVersion: 4.7.2
Built: 315532800
BuiltTime: Mon Dec 31 16:00:00 1979
GitCommit: ""
GoVersion: go1.21.4
Os: linux
OsArch: linux/amd64
Version: 4.7.2
- OS (e.g. from /etc/os-release): NixOS 23.05
- Kubernetes version: (use kubectl version): 1.27.1
- Any proxies or other special environment settings?: No
Have you followed https://kind.sigs.k8s.io/docs/user/rootless/ ?
This may help? https://access.redhat.com/solutions/22105
When the system runs into a limitation in the number of processes, increase the nproc value in /etc/security/limits.conf or /etc/security/limits.d/90-nproc.conf depending on RHEL version.
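For illustration, a raised nproc limit in /etc/security/limits.conf would look roughly like this (the username and value are placeholders):

# /etc/security/limits.conf (hypothetical entry)
tom  soft  nproc  65536
tom  hard  nproc  65536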
Have you followed https://kind.sigs.k8s.io/docs/user/rootless/ ?
Yes, I've done everything there.
This may help? https://access.redhat.com/solutions/22105
Thanks, that does seem likely to be the problem. The nproc limit currently seems to be ~256k for the processes in my nginx pod. I don't think that's too low but I'll experiment with raising it.
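For reference, one way to check this from inside the pod is something like the following (the pod name is illustrative; use whatever your controller pod is called):

kubectl exec -n ingress-nginx ingress-nginx-controller-xxxxx -- sh -c 'ulimit -u'
# and/or
kubectl exec -n ingress-nginx ingress-nginx-controller-xxxxx -- grep processes /proc/1/limits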
~~This is actually kind of reminiscent of this old problem from the last time I tried to use Kind on NixOS, where NixOS had unusual resource limit settings that were getting inherited by the Kind cluster...~~
Some more progress:
I looked at the dmesg logs for the Nginx pod and saw the error cgroup: fork rejected by pids controller. This is pointing me towards the TasksMax setting of systemd.
If I podman exec into the cluster container, the TasksMax number seems quite low:
> systemctl show user-1000.slice | grep -i TasksMax
TasksMax=675
> systemctl show --property DefaultTasksMax
DefaultTasksMax=307
And on the Nginx pod specifically:
cat /sys/fs/cgroup/pids.max
307
I see that the containerd.service file in the Kind image sets TasksMax=infinity. Where could these low numbers be coming from?
One more thing: apparently Podman's default pids limit is 2048.
And in a few places I've seen TasksMax=15% (could it be a systemd default?). Since 2048 * 0.15 = 307.2, that seems like the likely source of the 307.
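One way to double-check the pids limit podman applied to the node container would presumably be something like this (assuming the default node name, and assuming podman exposes it under HostConfig the same way docker does):

podman inspect --format '{{.HostConfig.PidsLimit}}' kind-control-plane
# 2048 with podman's defaults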
Hey, I got it to work by applying the following patch (to the v0.20.0 tag):
diff --git a/pkg/cluster/internal/providers/podman/provision.go b/pkg/cluster/internal/providers/podman/provision.go
index c240a292..4b276ba5 100644
--- a/pkg/cluster/internal/providers/podman/provision.go
+++ b/pkg/cluster/internal/providers/podman/provision.go
@@ -130,6 +130,7 @@ func commonArgs(cfg *config.Cluster, networkName string, nodeNames []string) ([]
 	// standard arguments all nodes containers need, computed once
 	args := []string{
 		"--detach", // run the container detached
+		"--pids-limit=65536", // higher pids limit
 		"--tty", // allocate a tty for entrypoint logs
 		"--net", networkName, // attach to its own network
 		// label the node with the cluster ID
With this, the nginx logs look much happier and the curl request to /foo/hostname works every time!
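For anyone who wants to try this, the steps would be roughly as follows (paths and the patch filename are illustrative):

git clone https://github.com/kubernetes-sigs/kind
cd kind
git checkout v0.20.0
git apply ../pids-limit.patch   # the diff above, saved to a file
make build                      # or: go build -o bin/kind .
./bin/kind create cluster --config kind-config.yaml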
I think perhaps a pids.max of 2048 is too low for an entire Kubernetes cluster. How are people using Kind with rootless Podman given this limit? I guess everyone is using small clusters without too many pid-hungry applications. Maybe I should open a PR with this change?
FWIW, I tested what Docker does (the normal rootful one), and it seems to inherit the pids.max from the host system DefaultTasksMax, which for me is a more reasonable 76800.
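For reference, a quick way to compare on the docker side is something like this (container name is the default node name; .HostConfig.PidsLimit is 0 when no explicit limit is set):

docker inspect --format '{{.HostConfig.PidsLimit}}' kind-control-plane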
Hi,
I was having the same issues with kind / nginx ingress when provisioning using podman, and I can confirm it's indeed podman setting the default PidsLimit: 2048 when creating the container.
For some strange reason this issue did not appear when creating the cluster using podman for Windows / podman machine; there, podman created the container with PidsLimit: 0.
Setting pids_limit = 0 in /etc/containers/containers.conf had the same effect for me as applying the above-mentioned patch.
My /etc/containers/containers.conf now looks like this:
[containers]
pids_limit = 0
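A quick sanity check that the new default is being picked up (the image choice is arbitrary, and this only shows the container's own cgroup limit):

podman run --rm docker.io/library/alpine cat /sys/fs/cgroup/pids.max
# should now print max instead of 2048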
@thomasjm: thanks a lot for investigating
@thomasjm interesting -- I don't think we should apply that patch, because the number of pids we'd hardcode would be arbitrary and would defeat the local user config you can set, as mentioned in the comment above. You can control this as a user by changing the podman config, which is probably the right approach for now.
We should add a warning about the podman pids-limit config to the rootless docs though, and consider asking upstream whether podman rootless might default to a higher limit.
If not, maybe we could set --pids-limit=0 for podman rootless, but I'm not sure that's the right move either; it would technically be a regression.
And thank you for debugging this! Very appreciated.
Thanks @thomasjm, your patch fixed my issue on nixos and tag v0.23.0.
If someone wants to add a note to the rootless docs, our contributor guide covers everything including docs: https://kind.sigs.k8s.io/docs/contributing/getting-started/