TmpFS gets dirty with `exit` files
Issue Description
Since #21523, Podman runs conmon with the --exit-dir and --persist-dir arguments. After conmon finishes, Podman removes the exit-dir/ctr-id file and the persist-dir/ctr-id/oom file. However, the persist-dir/ctr-id directory and the persist-dir/ctr-id/exit file are never deleted. Over time, this leads to the file system filling up.
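For illustration, assuming the default root runtime directory /run/libpod (with <ctr-id> standing in for the container ID), the cleanup currently looks roughly like this:

/run/libpod/exits/<ctr-id>          removed on container cleanup
/run/libpod/persist/<ctr-id>/oom    removed on container cleanup
/run/libpod/persist/<ctr-id>/exit   left behind
/run/libpod/persist/<ctr-id>/       left behind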
Steps to reproduce the issue
- Count the persist directories
- Run, stop, and remove any container
- Count the persist directories again - the count will have increased by 1
# ls /run/libpod/persist/ | wc -w && podman run --rm alpine; ls /run/libpod/persist/ | wc -w
9
10
Describe the results you received
Each container run leaves one directory behind in /run/libpod/persist/, so the count grows by 1.
Describe the results you expected
Podman should remove all artefacts, including the persist-dir entry, so it does not pollute the file system.
podman info output
host:
  arch: amd64
  buildahVersion: 1.36.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: 7ba5bd6c81ff2c10e07aee8c4281d12a2878fa12'
  cpuUtilization:
    idlePercent: 99.06
    systemPercent: 0.28
    userPercent: 0.67
  cpus: 2
  databaseBackend: sqlite
  distribution:
    distribution: centos
    version: "9"
  eventLogger: journald
  freeLocks: 2048
  hostname: dmitry
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.14.0-447.el9.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 2507890688
  memTotal: 3736653824
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.9.0-1.el9.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.9.0
    package: netavark-1.11.0-1.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.11.0
  ociRuntime:
    name: crun
    package: crun-1.15-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/user/0/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231204.gb86afe3-1.el9.x86_64
    version: |
      pasta 0^20231204.gb86afe3-1.el9.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
      <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.3.1-1.el9.x86_64
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 0
  swapTotal: 0
  uptime: 8h 43m 8.00s (Approximately 0.33 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 40165670912
  graphRootUsed: 1532129280
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.1.0
  Built: 1717411100
  BuiltTime: Mon Jun 3 10:38:20 2024
  GitCommit: ""
  GoVersion: go1.22.3 (Red Hat 1.22.3-2.el9)
  Os: linux
  OsArch: linux/amd64
  Version: 5.1.0
Podman in a container
No
Privileged Or Rootless
None
Upstream Latest Release
No
Additional environment details
No response
Additional information
No response
A friendly reminder that this issue had no activity for 30 days.
Is there any workaround for this?

> Is there any workaround for this?

The only workaround I know of is to periodically remove the empty files with a cron job.
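A minimal sketch of such a cleanup, assuming the default root persist directory /run/libpod/persist and a hypothetical script that only removes directories whose container Podman no longer knows about (it could be run from cron every few minutes):

#!/bin/sh
# Hypothetical cleanup sketch - not part of Podman itself.
# Removes leftover persist directories whose container no longer exists.
existing="$(podman ps -a -q --no-trunc)"
for dir in /run/libpod/persist/*/; do
    [ -d "$dir" ] || continue
    id="$(basename "$dir")"
    case "$existing" in
        *"$id"*) ;;            # container is still known to podman, keep its directory
        *) rm -rf -- "$dir" ;; # leftover from a removed container, delete it
    esac
done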
> Is there any workaround for this?
> The only workaround I know of is to periodically remove the empty files with a cron job.

If there are hundreds of containers constantly starting and stopping, then deleting the persist dir introduces a 10-20s lag in the container shutdown process, and the container also keeps hanging around in 'podman ps' afterwards. In other words, this bug makes quite a mess on several levels.
> Is there any workaround for this?
> The only workaround I know of is to periodically remove the empty files with a cron job.
> If there are hundreds of containers constantly starting and stopping, then deleting the persist dir introduces a 10-20s lag in the container shutdown process, and the container also keeps hanging around in 'podman ps' afterwards. In other words, this bug makes quite a mess on several levels.

I absolutely agree with you.
> Is there any workaround for this?
> The only workaround I know of is to periodically remove the empty files with a cron job.

Is it safe to just rm -rf all the directories in /run/libpod/persist, or should only selected ones be removed?
Several servers have run out of inodes (800k by default in EL9), which led to an outage.
It seems to be very difficult to find documentation about what this is exactly. conmon(8) only says that --persist-dir is for "storing container data"; what data is this exactly? Surely it is not the runtime layers of the containers, so it must be something else?
The documentation deserves some elaboration on what this means.
Added: a quick workaround is to increase the inode limit by 20% with

# mount -o remount,rw,nosuid,nodev,seclabel,size=<your size limit>,nr_inodes=1048576,mode=755,inode64 -t tmpfs /run/

but this only buys some time until the 1M inode limit is hit as well. At least it allows containers to start again, which may help in getting the node into a state that permits rebooting, or just back into a working state.
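For reference, a rough way to keep an eye on both the tmpfs inode usage and the number of leftover persist directories (paths assume the defaults from this report):

# df -i /run
# ls /run/libpod/persist/ | wc -w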
Fixed by #25297