Flaky ingress behavior using ingress-nginx and rootless podman
What happened:
I'm trying to run the basic Ingress example with Ingress Nginx from here. I'm using rootless podman.
Once I create the example services and ingress, I try to do curl localhost:12345/foo/hostname. Note that the port 12345 is randomly assigned, because I pass 0 for the hostPort in the Kind cluster config (see below).
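For reference, one way to find out which host port podman actually picked is something like the following (the node container name here is just the default for a cluster named "kind"; yours may differ):

podman port kind-control-plane 80
# prints something like 0.0.0.0:12345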
About 10% of the time it works and I get the desired response foo-app. The rest of the time, the curl command hangs indefinitely.
When I look in the nginx controller logs, I see a lot of messages like the following:
2023/12/13 13:51:36 [alert] 353#353: pthread_create() failed (11: Resource temporarily unavailable)
2023/12/13 13:51:36 [alert] 39#39: fork() failed while spawning "cache loader process" (11: Resource temporarily unavailable)
2023/12/13 13:51:36 [alert] 39#39: sendmsg() failed (9: Bad file descriptor)
2023/12/13 13:51:37 [alert] 39#39: worker process 52 exited with fatal code 2 and cannot be respawned
This looks to me like nginx is spawning a bunch of worker threads, and most of them fail to start properly. Maybe there's some problem with rootless podman?
What you expected to happen:
HTTP requests to the ingress should work reliably.
How to reproduce it (as minimally and precisely as possible):
Just following the Ingress instructions. My exact Kind config is as follows:
Kind config file
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /nix/store/mrcy594mjgm5zcckr1f4i901isxiwj0s-binary-cache
    containerPath: /binary-cache
    readOnly: false
    propagation: HostToContainer
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
        authorization-mode: "AlwaysAllow"
        streaming-connection-idle-timeout: "0"
  extraPortMappings:
  - containerPort: 80
    hostPort: 0
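For completeness, I create the cluster from this file in the usual way (the filename is arbitrary):

kind create cluster --config kind-config.yaml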
Environment:
- kind version: (use kind version): 0.20.0
- Runtime info: (use docker info or podman info):
podman info output
host:
arch: amd64
buildahVersion: 1.32.0
cgroupControllers:
- cpuset
- cpu
- io
- memory
- pids
cgroupManager: systemd
cgroupVersion: v2
conmon:
package: Unknown
path: /nix/store/3bmd0vmvvvrashaxqb1d1apyy7smix3d-conmon-2.1.8/bin/conmon
version: 'conmon version 2.1.8, commit: '
cpuUtilization:
idlePercent: 95.22
systemPercent: 0.75
userPercent: 4.02
cpus: 32
databaseBackend: boltdb
distribution:
codename: stoat
distribution: nixos
version: "23.05"
eventLogger: journald
freeLocks: 2044
hostname: desktop2
idMappings:
gidmap:
- container_id: 0
host_id: 100
size: 1
- container_id: 1
host_id: 3000000
size: 2000000
uidmap:
- container_id: 0
host_id: 1001
size: 1
- container_id: 1
host_id: 3000000
size: 2000000
kernel: 6.1.60
linkmode: dynamic
logDriver: journald
memFree: 6684782592
memTotal: 67134550016
networkBackend: netavark
networkBackendInfo:
backend: netavark
dns:
package: Unknown
path: /nix/store/igpk3cb4dmrr1mpvx5kb5prd1fk8kcss-podman-4.7.2/libexec/podman/aardvark-dns
version: aardvark-dns 1.9.0
package: Unknown
path: /nix/store/igpk3cb4dmrr1mpvx5kb5prd1fk8kcss-podman-4.7.2/libexec/podman/netavark
version: netavark 1.7.0
ociRuntime:
name: crun
package: Unknown
path: /nix/store/hllgilr2bhc6rbdrsbnrpaxyfqlzgqjg-crun-1.12/bin/crun
version: |-
crun version 1.12
commit: 1.12
rundir: /run/user/1001/crun
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
os: linux
pasta:
executable: ""
package: ""
version: ""
remoteSocket:
exists: true
path: /run/user/1001/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: true
seccompEnabled: true
seccompProfilePath: ""
selinuxEnabled: false
serviceIsRemote: false
slirp4netns:
executable: /nix/store/igpk3cb4dmrr1mpvx5kb5prd1fk8kcss-podman-4.7.2/libexec/podman/slirp4netns
package: Unknown
version: |-
slirp4netns version 1.2.2
commit: 0ee2d87523e906518d34a6b423271e4826f71faf
libslirp: 4.7.0
SLIRP_CONFIG_VERSION_MAX: 4
libseccomp: 2.5.4
swapFree: 3188670464
swapTotal: 9448923136
uptime: 10h 29m 50.00s (Approximately 0.42 days)
plugins:
authorization: null
log:
- k8s-file
- none
- passthrough
- journald
network:
- bridge
- macvlan
- ipvlan
volume:
- local
registries:
search:
- docker.io
- quay.io
store:
configFile: /home/tom/.config/containers/storage.conf
containerStore:
number: 1
paused: 0
running: 1
stopped: 0
graphDriverName: overlay
graphOptions: {}
graphRoot: /home/tom/.local/share/containers/storage
graphRootAllocated: 1958014603264
graphRootUsed: 1742862733312
graphStatus:
Backing Filesystem: extfs
Native Overlay Diff: "true"
Supports d_type: "true"
Supports shifting: "false"
Supports volatile: "true"
Using metacopy: "false"
imageCopyTmpDir: /var/tmp
imageStore:
number: 1
runRoot: /run/user/1001/containers
transientStore: false
volumePath: /home/tom/.local/share/containers/storage/volumes
version:
APIVersion: 4.7.2
Built: 315532800
BuiltTime: Mon Dec 31 16:00:00 1979
GitCommit: ""
GoVersion: go1.21.4
Os: linux
OsArch: linux/amd64
Version: 4.7.2
- OS (e.g. from /etc/os-release): NixOS 23.05
- Kubernetes version: (use kubectl version): 1.27.1
- Any proxies or other special environment settings?: No
Have you followed https://kind.sigs.k8s.io/docs/user/rootless/ ?
This may help? https://access.redhat.com/solutions/22105
When the system runs into a limitation in the number of processes, increase the nproc value in /etc/security/limits.conf or /etc/security/limits.d/90-nproc.conf depending on RHEL version.
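For illustration, a raised nproc limit in /etc/security/limits.conf would look roughly like this (the username and value are placeholders):

# /etc/security/limits.conf (hypothetical entry)
tom  soft  nproc  65536
tom  hard  nproc  65536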
Have you followed https://kind.sigs.k8s.io/docs/user/rootless/ ?
Yes, I've done everything there.
This may help? https://access.redhat.com/solutions/22105
Thanks, that does seem likely to be the problem. The nproc limit currently seems to be ~256k for the processes in my nginx pod. I don't think that's too low but I'll experiment with raising it.
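For reference, one way to check this from inside the pod is something like the following (the pod name is illustrative; use whatever your controller pod is called):

kubectl exec -n ingress-nginx ingress-nginx-controller-xxxxx -- sh -c 'ulimit -u'
# and/or
kubectl exec -n ingress-nginx ingress-nginx-controller-xxxxx -- grep processes /proc/1/limits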
~~This is actually kind of reminiscent of this old problem from the last time I tried to use Kind on NixOS, where NixOS had unusual resource limit settings that were getting inherited by the Kind cluster...~~
Some more progress:
I looked at the dmesg logs for the Nginx pod and saw the error cgroup: fork rejected by pids controller. This is pointing me towards the TasksMax setting of systemd.
If I podman exec into the cluster container, the TasksMax number seems quite low:
> systemctl show user-1000.slice | grep -i TasksMax
TasksMax=675
> systemctl show --property DefaultTasksMax
DefaultTasksMax=307
And on the Nginx pod specifically:
cat /sys/fs/cgroup/pids.max
307
I see that the containerd.service file in the Kind image sets TasksMax=infinity. Where could these low numbers be coming from?
One more thing: apparently Podman's default pids limit is 2048.
And in a few places I've seen TasksMax=15% (could it be a systemd default?). Since 2048 * 0.15 = 307.2, that seems like the likely source of the 307.
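One way to double-check the pids limit podman applied to the node container would presumably be something like this (assuming the default node name, and assuming podman exposes it under HostConfig the same way docker does):

podman inspect --format '{{.HostConfig.PidsLimit}}' kind-control-plane
# 2048 with podman's defaults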
Hey, I got it to work by applying the following patch (to the v0.20.0 tag):
diff --git a/pkg/cluster/internal/providers/podman/provision.go b/pkg/cluster/internal/providers/podman/provision.go
index c240a292..4b276ba5 100644
--- a/pkg/cluster/internal/providers/podman/provision.go
+++ b/pkg/cluster/internal/providers/podman/provision.go
@@ -130,6 +130,7 @@ func commonArgs(cfg *config.Cluster, networkName string, nodeNames []string) ([]
 	// standard arguments all nodes containers need, computed once
 	args := []string{
 		"--detach", // run the container detached
+		"--pids-limit=65536", // higher pids limit
 		"--tty", // allocate a tty for entrypoint logs
 		"--net", networkName, // attach to its own network
 		// label the node with the cluster ID
With this, the nginx logs look much happier and the curl request to /foo/hostname works every time!
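For anyone who wants to try this, the steps would be roughly as follows (paths and the patch filename are illustrative):

git clone https://github.com/kubernetes-sigs/kind
cd kind
git checkout v0.20.0
git apply ../pids-limit.patch   # the diff above, saved to a file
make build                      # or: go build -o bin/kind .
./bin/kind create cluster --config kind-config.yaml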
I think perhaps a pids.max of 2048 is too low for an entire Kubernetes cluster. How are people using Kind with rootless Podman given this limit? I guess everyone is using small clusters without too many pid-hungry applications. Maybe I should open a PR with this change?
FWIW, I tested what Docker does (the normal rootful one), and it seems to inherit the pids.max from the host system DefaultTasksMax, which for me is a more reasonable 76800.
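For reference, a quick way to compare on the docker side is something like this (container name is the default node name; .HostConfig.PidsLimit is 0 when no explicit limit is set):

docker inspect --format '{{.HostConfig.PidsLimit}}' kind-control-plane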
Hi,
I was having the same issues with kind / nginx ingress when provisioning using podman, and I can confirm it's indeed podman setting the default PidsLimit: 2048 when creating the container.
For some strange reason this issue did not appear when creating the cluster using podman for Windows / podman machine; there, podman created the container with PidsLimit: 0.
Setting pids_limit = 0 in /etc/containers/containers.conf had the same effect for me as applying the above-mentioned patch.
My /etc/containers/containers.conf now looks like this:
[containers]
pids_limit = 0
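A quick sanity check that the new default is being picked up (the image choice is arbitrary, and this only shows the container's own cgroup limit):

podman run --rm docker.io/library/alpine cat /sys/fs/cgroup/pids.max
# should now print max instead of 2048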
@thomasjm: thanks a lot for investigating
@thomasjm interesting -- I don't think we should apply that patch, because the number of pids we'd hardcode would be arbitrary and would defeat the local user config you can set, as mentioned in the comment above. You can control this as a user by changing the podman config, which is probably the right approach for now.
We should add a warning about the podman pids-limit config to the rootless docs though, and consider asking upstream whether podman rootless might default to a higher limit.
If not, maybe we could set --pids-limit=0 for podman rootless, but I'm not sure that's the right move either; it would technically be a regression.
And thank you for debugging this! Very appreciated.
Thanks @thomasjm, your patch fixed my issue on nixos and tag v0.23.0.
If someone wants to add a note to the rootless docs, our contributor guide covers everything including docs: https://kind.sigs.k8s.io/docs/contributing/getting-started/