TmpFS gets dirty with `exit` files
Issue Description
Since #21523, Podman runs conmon with the --exit-dir and --persist-dir arguments. After conmon finishes, Podman removes the exit-dir/ctr-id file and the persist-dir/ctr-id/oom file. However, the persist-dir/ctr-id directory and the persist-dir/ctr-id/exit file are never deleted. Over time, this leads to the file system filling up.
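For illustration, assuming the default root runtime directory /run/libpod (with <ctr-id> standing in for the container ID), the cleanup currently looks roughly like this:

/run/libpod/exits/<ctr-id>          removed on container cleanup
/run/libpod/persist/<ctr-id>/oom    removed on container cleanup
/run/libpod/persist/<ctr-id>/exit   left behind
/run/libpod/persist/<ctr-id>/       left behind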
Steps to reproduce the issue
- Count the persist directories
- Run, stop, and remove any container
- Count the persist directories again - the count will have increased by 1
# ls /run/libpod/persist/ | wc -w && podman run --rm alpine; ls /run/libpod/persist/ | wc -w
9
10
Describe the results you received
Each container run leaves one directory behind in /run/libpod/persist/, so the count grows by 1.
Describe the results you expected
Podman should remove all artefacts, including the persist-dir entry, so it does not pollute the file system.
podman info output
host:
  arch: amd64
  buildahVersion: 1.36.0
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - rdma
  - misc
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.12-1.el9.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.12, commit: 7ba5bd6c81ff2c10e07aee8c4281d12a2878fa12'
  cpuUtilization:
    idlePercent: 99.06
    systemPercent: 0.28
    userPercent: 0.67
  cpus: 2
  databaseBackend: sqlite
  distribution:
    distribution: centos
    version: "9"
  eventLogger: journald
  freeLocks: 2048
  hostname: dmitry
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.14.0-447.el9.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 2507890688
  memTotal: 3736653824
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.9.0-1.el9.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.9.0
    package: netavark-1.11.0-1.el9.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.11.0
  ociRuntime:
    name: crun
    package: crun-1.15-1.el9.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.15
      commit: e6eacaf4034e84185fd8780ac9262bbf57082278
      rundir: /run/user/0/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231204.gb86afe3-1.el9.x86_64
    version: |
      pasta 0^20231204.gb86afe3-1.el9.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
      <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/podman/podman.sock
  rootlessNetworkCmd: pasta
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.3.1-1.el9.x86_64
    version: |-
      slirp4netns version 1.3.1
      commit: e5e368c4f5db6ae75c2fce786e31eef9da6bf236
      libslirp: 4.4.0
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.2
  swapFree: 0
  swapTotal: 0
  uptime: 8h 43m 8.00s (Approximately 0.33 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.access.redhat.com
  - registry.redhat.io
  - docker.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 0
    paused: 0
    running: 0
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphRootAllocated: 40165670912
  graphRootUsed: 1532129280
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "true"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/containers/storage
  transientStore: false
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 5.1.0
  Built: 1717411100
  BuiltTime: Mon Jun 3 10:38:20 2024
  GitCommit: ""
  GoVersion: go1.22.3 (Red Hat 1.22.3-2.el9)
  Os: linux
  OsArch: linux/amd64
  Version: 5.1.0
Podman in a container
No
Privileged Or Rootless
None
Upstream Latest Release
No
Additional environment details
No response
Additional information
No response
A friendly reminder that this issue had no activity for 30 days.
Is there any workaround for this?

> Is there any workaround for this?

The only workaround I know of is to periodically remove the empty files with a cron job.
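A minimal sketch of such a cleanup, assuming the default root persist directory /run/libpod/persist and a hypothetical script that only removes directories whose container Podman no longer knows about (it could be run from cron every few minutes):

#!/bin/sh
# Hypothetical cleanup sketch - not part of Podman itself.
# Removes leftover persist directories whose container no longer exists.
existing="$(podman ps -a -q --no-trunc)"
for dir in /run/libpod/persist/*/; do
    [ -d "$dir" ] || continue
    id="$(basename "$dir")"
    case "$existing" in
        *"$id"*) ;;            # container is still known to podman, keep its directory
        *) rm -rf -- "$dir" ;; # leftover from a removed container, delete it
    esac
done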
> Is there any workaround for this?
> The only workaround I know of is to periodically remove the empty files with a cron job.

If there are hundreds of containers constantly starting and stopping, then deleting the persist dir introduces a 10-20s lag in the container shutdown process, and the container also keeps hanging around in 'podman ps' afterwards. In other words, this bug makes quite a mess on several levels.
> Is there any workaround for this?
> The only workaround I know of is to periodically remove the empty files with a cron job.
> If there are hundreds of containers constantly starting and stopping, then deleting the persist dir introduces a 10-20s lag in the container shutdown process, and the container also keeps hanging around in 'podman ps' afterwards. In other words, this bug makes quite a mess on several levels.

I absolutely agree with you.
> Is there any workaround for this?
> The only workaround I know of is to periodically remove the empty files with a cron job.

Is it safe to just rm -rf all the directories in /run/libpod/persist, or should only selected ones be removed?
Several servers have run out of inodes (800k by default in EL9), which led to an outage.
It seems to be very difficult to find documentation about what this is exactly. conmon(8) only says that --persist-dir is for "storing container data"; what data is this exactly? Surely it is not the runtime layers of the containers, so it must be something else?
The documentation deserves some elaboration on what this means.
Added: a quick workaround is to increase the inode limit by 20% with

# mount -o remount,rw,nosuid,nodev,seclabel,size=<your size limit>,nr_inodes=1048576,mode=755,inode64 -t tmpfs /run/

but this only buys some time until the 1M inode limit is hit as well. At least it allows containers to start again, which may help in getting the node into a state that permits rebooting, or just back into a working state.
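For reference, a rough way to keep an eye on both the tmpfs inode usage and the number of leftover persist directories (paths assume the defaults from this report):

# df -i /run
# ls /run/libpod/persist/ | wc -w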
Fixed by #25297