storage
storage copied to clipboard
`Error: readlink /var/lib/containers/storage/overlay: invalid argument` after commit + hard reset (unflushed data)
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
I'm aware running podman commit then pulling the power is not supported, but I'm hoping the most obvious errors can be squashed if traced down precisely enough.
If these issues are not enough I can afford some time to make it into PRs, but someone familiar with the code will probably fix it much faster than myself so I'm thinking my time is better spent finding more issues. Please tell me if I'm wrong.
Steps to reproduce the issue:
-
run some container e.g.
podman run -ti --name test --replace localhost/testand change something inside -
commit and pull the plug:
podman commit test test,echo c > /proc/sysrq-trigger -
try to list/run/inspect new image after reboot
Describe the results you received:
armadillo:~# podman image inspect test
Error: readlink /var/lib/containers/storage/overlay: invalid argument
Running with strace, it's because /var/lib/containers/storage/overlay/22796afe2874d426a1c46770f63c639030d993a6df09fb70909e675922f7e8c4/lower has been created with no data (empty file), so /var/lib/containers/storage/overlay/<content of lower file> is not found which normally leads to stat+readlink, except that readlink on a directory does not work when flie is empty.
(By the way, writting something that doesn't exist e.g. "foo" in the file leads podman to as it writes the invalid path to link file then loops reading link file and trying to check overlay/l/<link content> forever -- this probably warrants failing if overlay/l/ does not exist)
Describe the results you expected:
error should state image is invalid and suggest deleting it (I don't think it should delete anything automatically, but giving the user a clear command on how to move forward from there is probably good)
Output of podman version:
Version: 3.4.2
API Version: 3.4.2
Go Version: go1.17.3
Git Commit: d800e4f57e9388a4cd4e5d9321b4594f552bc5d7
Built: Sun Nov 14 04:26:13 2021
OS/Arch: linux/arm64
(same behaviour on 3.2.3)
Output of podman info --debug:
host:
arch: arm64
buildahVersion: 1.23.1
cgroupControllers:
- cpuset
- cpu
- cpuacct
- blkio
- memory
- devices
- freezer
- net_cls
- perf_event
- net_prio
- pids
- rdma
cgroupManager: cgroupfs
cgroupVersion: v1
conmon:
package: Unknown
path: /usr/bin/conmon
version: 'conmon version 2.0.30, commit: 4ab06de98f48513b22602d23d59d63b29e75824b'
cpus: 4
distribution:
distribution: alpine
version: 3.15.0_rc3
eventLogger: file
hostname: armadillo
idMappings:
gidmap: null
uidmap: null
kernel: 5.10.52-00112-gec168b956610-dirty
linkmode: dynamic
logDriver: k8s-file
memFree: 1250312192
memTotal: 1741979648
ociRuntime:
name: crun
package: Unknown
path: /usr/bin/crun
version: |-
crun version 1.3
commit: 4f6c8e0583c679bfee6a899c05ac6b916022561b
spec: 1.0.0
+SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
os: linux
remoteSocket:
path: /run/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: false
seccompEnabled: true
seccompProfilePath: /etc/containers/seccomp.json
selinuxEnabled: false
serviceIsRemote: false
slirp4netns:
executable: /usr/bin/slirp4netns
package: Unknown
version: |-
slirp4netns version 1.1.12
commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
libslirp: 4.6.1
SLIRP_CONFIG_VERSION_MAX: 3
libseccomp: 2.5.2
swapFree: 0
swapTotal: 0
uptime: 19m 37.46s
plugins:
log:
- k8s-file
- none
network:
- bridge
- macvlan
volume:
- local
registries: {}
store:
configFile: /etc/containers/storage.conf
containerStore:
number: 1
paused: 0
running: 0
stopped: 1
graphDriverName: overlay
graphOptions: {}
graphRoot: /var/lib/containers/storage
graphStatus:
Backing Filesystem: btrfs
Native Overlay Diff: "true"
Supports d_type: "true"
Using metacopy: "false"
imageStore:
number: 3
runRoot: /run/containers/storage
volumePath: /var/lib/containers/storage/volumes
version:
APIVersion: 3.4.2
Built: 1636831573
BuiltTime: Sun Nov 14 04:26:13 2021
GitCommit: d800e4f57e9388a4cd4e5d9321b4594f552bc5d7
GoVersion: go1.17.3
OsArch: linux/arm64
Version: 3.4.2
Package info (e.g. output of rpm -q podman or apt list podman):
# apk info podman
podman-3.4.2-r0 description:
Simple management tool for pods, containers and images
podman-3.4.2-r0 webpage:
https://podman.io/
podman-3.4.2-r0 installed size:
43 MiB
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)
No, yes
Additional environment details (AWS, VirtualBox, physical, etc.):
barebone embedded system
By the way, I think it'd make a lot of sense to issue a fssync() at the end of podman commit (or more precisely the part of podman commit that stores to disk and is shared with pull etc -- after messages Storing signatures and printing image id), that'd probably avoid most of the bug reports that came back to me because of similar issues -- what do you think?
@nalind PTAL - this one seems pretty bad, given that c/storage seems unusable?
I don't think this is Podman itself given that we don't make any changes to c/storage paths directly.
After #1351 and #1407 (and some earlier changes), a layer being committed should be, durably (with fdatasync) recorded as incomplete, and thus, a sudden reboot while creating the layer should cause the layer to be automatically removed on next Podman start.
Of course, if the commit succeeds, and durably records to the metadata that it succeeded, but the layer data like the lower file (which do not use fdatasync) is not written to disk by that time, this does not help: the layer will be recorded as fully-formed but the contents are corrupt.
Thank you! I don't think we can do much about the content itself short of issuing a system-wide sync, and that has its own problems so I wouldn't suggest that. I'm fine as long as podman commands keep working.
I'll try to build a podman version with these and confirm, but we can probably close this and assume it's correct -- likely won't have time to test in the coming weeks.