`podman images` slow with `--uidmap`
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
We run a single ~6 GB podman image with multiple containers, each rootful and with a unique --uidmap and --gidmap. With many (~20-30) such containers running, the podman images command becomes very slow. After a quick look with strace and some other debugging, I believe most of the time is spent calculating the image size. Unfortunately there is no way to skip that, not even with --quiet.
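For reference, a rough way to confirm where the time goes is a syscall summary (a sketch; the exact breakdown will differ from system to system):

```
# syscall summary for podman and all of its threads/children; most of the
# wall-clock time shows up in stat-family and getdents64 calls walking the
# storage directories
sudo strace -c -f -o /tmp/podman-images.summary podman images -a
cat /tmp/podman-images.summary
```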
Steps to reproduce the issue:
- Reset podman system:
$ sudo podman system reset
WARNING! This will remove:
- all containers
- all pods
- all images
- all build cache
Are you sure you want to continue? [y/N] y
A storage.conf file exists at /etc/containers/storage.conf
You should remove this file if you did not modified the configuration.
- Pull a large image and run podman images -a:
$ sudo podman pull example.com/repo/image
...
$ time sudo podman images -a
REPOSITORY TAG IMAGE ID CREATED SIZE
example.com/repo/image latest 55e255254b1f 35 hours ago 6.33 GB
real 0m0.160s
user 0m0.040s
sys 0m0.048s
- Start 30 containers, each with a unique --uidmap:
$ for i in {1..30}; do sudo podman run -d --uidmap 0:$i:100000 example.com/repo/image; done
...
$ time sudo podman images -a
REPOSITORY TAG IMAGE ID CREATED SIZE
example.com/repo/image latest 55e255254b1f 35 hours ago 23.1 GB
real 1m0.881s
user 0m37.914s
sys 0m43.773s
- Remove all containers (also very slow...):
$ time sudo podman rm -f $(sudo podman ps -aq) > /dev/null
real 5m9.833s
user 0m1.020s
sys 0m1.159s
$ time sudo podman images -a
REPOSITORY TAG IMAGE ID CREATED SIZE
example.com/repo/image latest 55e255254b1f 35 hours ago 23.1 GB
real 0m57.164s
user 0m36.532s
sys 0m41.995s
- Remove the image (even though there are no containers or images at this point, podman still uses a lot of space, see the table below; but podman images -a is fast):
$ time sudo podman rmi $(sudo podman images -aq)
Untagged: example.com/repo/image:latest
Deleted: 55e255254b1f146a3857e9e57c7c9f1d8fc5c8be8e26e32f475081885d8fa23f
real 1m20.925s
user 0m50.986s
sys 0m59.144s
$ time sudo podman images -a
REPOSITORY TAG IMAGE ID CREATED SIZE
real 0m0.162s
user 0m0.033s
sys 0m0.042s
- Redownload the image (note that all layers except the last one already exist, even though I just deleted the image in the previous step):
$ sudo podman pull example.com/repo/image:latest
Trying to pull example.com/repo/image:latest...
Getting image source signatures
Copying blob 273b8b71b7b6 skipped: already exists
Copying blob 5f4a79f41734 skipped: already exists
Copying blob 1d7e57823380 skipped: already exists
Copying blob 094dd4168f45 skipped: already exists
Copying blob 1da4ce7e5083 skipped: already exists
Copying blob 66934d8f93e1 skipped: already exists
Copying blob 933e13ee990c skipped: already exists
Copying blob 7ef04668fb37 skipped: already exists
Copying blob 6f9c51b8f8b2 skipped: already exists
Copying blob d2f3b5997ad1 skipped: already exists
Copying blob e0f5f3dbfa53 skipped: already exists
Copying blob 1f5c8166c3ba skipped: already exists
Copying blob 6862b881cb80 skipped: already exists
Copying blob 33c7276c4f03 skipped: already exists
Copying blob c53545616dfe skipped: already exists
Copying blob 7d8d70253d88 skipped: already exists
Copying blob 670c55e249d5 skipped: already exists
Copying blob b6436833b837 done
Copying config 55e255254b done
Writing manifest to image destination
Storing signatures
55e255254b1f146a3857e9e57c7c9f1d8fc5c8be8e26e32f475081885d8fa23f
- Delete the image again:
$ time sudo podman rmi $(sudo podman images -aq)
Untagged: example.com/repo/image:latest
Deleted: 55e255254b1f146a3857e9e57c7c9f1d8fc5c8be8e26e32f475081885d8fa23f
real 0m1.825s
user 0m0.730s
sys 0m1.219s
I've also collected a few stats at the end of each of the steps above (see the helper sketched after the table):
- #FS nodes: sudo find /var/lib/containers/storage/ | wc -l
- du size: sudo du -skh /var/lib/containers/storage/
- Δ df size: difference in used space reported by df on the filesystem that holds /var/lib/containers/storage/
| Step | #FS nodes | du size | Δ df size |
|---|---|---|---|
| 1 | - | - | - |
| 2 | 55000 | 6.0G | +6.2G |
| 3 | 3221487 | 179G | +0.9G |
| 4 | 1714063 | 17G | -29M |
| 5 | 51137 | 5.5G | -1.6G |
| 6 | 55001 | 6.0G | +0.7G |
| 7 | 21 | 660K | -6.2G |
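A small helper along these lines collects the three numbers for the current state of the store (a sketch; adjust STORE if your graph root differs, this report uses /data/containers/graph):

```
# collect the three stats used in the table above
STORE=/var/lib/containers/storage
echo "FS nodes: $(sudo find "$STORE" | wc -l)"
echo "du size:  $(sudo du -skh "$STORE" | cut -f1)"
# the table's Δ df size is the change of this value between steps
echo "df used:  $(df -h --output=used "$STORE" | tail -n 1)"
```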
Describe the results you expected: There needs to be a way to list images without size information, or the size calculation should at least be optimized. My understanding is that the image is written only once and the other "copies" are just mounts over the original storage with ShiftFS, so it should be enough to stat the real image files once and skip the ShiftFS mounts?
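As a possible (unverified) workaround idea, a Go template that does not ask for the size at least drops the column; whether it also skips the size computation on 3.2.x is something I have not confirmed:

```
# list images without requesting the SIZE column
sudo podman images -a --format '{{.Repository}} {{.Tag}} {{.ID}}'
```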
Another issue I discovered while collecting the data above: why are nearly all layers of the original image still there after I remove the image in step 5?
Output of podman info --debug:
host:
arch: amd64
buildahVersion: 1.21.3
cgroupControllers:
- cpuset
- cpu
- io
- memory
- hugetlb
- pids
cgroupManager: systemd
cgroupVersion: v2
conmon:
package: conmon-2.0.29-1.module+el8.4.0+11822+6cc1e7d7.x86_64
path: /usr/bin/conmon
version: 'conmon version 2.0.29, commit: ae467a0c8001179d4d0adf4ada381108a893d7ec'
cpus: 24
distribution:
distribution: '"rhel"'
version: "8.4"
eventLogger: file
hostname: xxx
idMappings:
gidmap: null
uidmap: null
kernel: 4.18.0-305.19.1.el8_4.x86_64
linkmode: dynamic
memFree: 3411726336
memTotal: 50383847424
ociRuntime:
name: crun
path: /usr/bin/crun
version: |-
crun version 1.0
commit: 139dc6971e2f1d931af520188763e984d6cdfbf8
spec: 1.0.0
+SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
os: linux
remoteSocket:
path: /run/podman/podman.sock
security:
apparmorEnabled: false
capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
rootless: false
seccompEnabled: true
seccompProfilePath: /usr/share/containers/seccomp.json
selinuxEnabled: false
serviceIsRemote: false
slirp4netns:
executable: ""
package: ""
version: ""
swapFree: 0
swapTotal: 0
uptime: 71h 4m 13.08s (Approximately 2.96 days)
registries:
search:
- registry.access.redhat.com
- registry.redhat.io
- docker.io
store:
configFile: /etc/containers/storage.conf
containerStore:
number: 1
paused: 0
running: 1
stopped: 0
graphDriverName: overlay
graphOptions:
overlay.mountopt: nodev,metacopy=on
graphRoot: /data/containers/graph
graphStatus:
Backing Filesystem: xfs
Native Overlay Diff: "false"
Supports d_type: "true"
Using metacopy: "true"
imageStore:
number: 1
runRoot: /data/containers/run
volumePath: /data/containers/graph/volumes
version:
APIVersion: 3.2.3
Built: 1627370979
BuiltTime: Tue Jul 27 07:29:39 2021
GitCommit: ""
GoVersion: go1.15.7
OsArch: linux/amd64
Version: 3.2.3
Package info (e.g. output of rpm -q podman or apt list podman):
podman-3.2.3-0.10.module+el8.4.0+11989+6676f7ad.x86_64
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)
Yes (with the latest available on RHEL 8.4 (podman 3.2.3))
@vrothberg PTAL, I know you've been doing some work in this area.
@freva Any chance you can try with a more-recent Podman (3.4.x ideally)? We've definitely made some improvements already in this area.
Thanks for the ping, @mheon!
I recently made some major improvements to speed up image listing, but those improvements only apply when there is more than one image. In this case we have exactly one image, and the performance degrades with the number of containers using that image.
$ time sudo podman images -a
REPOSITORY TAG IMAGE ID CREATED SIZE
example.com/repo/image latest 55e255254b1f 35 hours ago 23.1 GB
That is very suspicious, as the image was initially listed at 6.33 GB. It seems like something is going on in storage: the more containers use the image, the more expensive it gets to calculate the total storage consumption. But I think containers shouldn't play any role here at all.
Not sure if I will find time today but I will next week.
@giuseppe PTAL
The reproducer works reliably on my machine as well. Looks c/storage related to me.
this problem will go away once we move to idmapped mounts
this happens because every time you use a different mapping, c/storage needs to clone the image and chown it, effectively creating a new image.
So even if only one image is visible, in reality there are multiple images in the storage, and we calculate the size for all of them.
AFAICS, the cost grows linearly with how many images are in the storage (even if they are not visible with podman images).
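This is easy to observe from the storage itself: the chowned copies show up as extra layer records even though podman images shows a single image (a sketch assuming the default graph root; this report uses /data/containers/graph, and jq must be installed):

```
# layer records known to containers/storage; with N distinct --uidmap/--gidmap
# values this grows by roughly N copies of the image's layers
sudo jq length /var/lib/containers/storage/overlay-layers/layers.json

# the chowned copies are real directories under the overlay driver
sudo ls /var/lib/containers/storage/overlay | wc -l
```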
I'd say it is better to just wait for the problem to be solved once idmapped mounts work well with overlay than to add more heuristics to c/storage.
That makes sense, thank you, @giuseppe!
Do you have a rough ETA on when idmapped mounts will arrive?
We have been told that they hope to have the kernel fixed by the end of the year. Idmapped mounts do not work with overlay at this point in the kernel.
Shall we make showing the size optional? It's expensive in any case but Docker displays it by default.
Yes I think we should make it optional.
And then have a containers.conf flag, if people want to match Docker behaviour.
Can the size be calculated once during image pull/import/creation/clone and stored instead of being calculated during each query? I am not aware of a reason an image would change without its Id also changing.
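To illustrate the proposal only (not how c/storage would implement it): compute the size once per image ID and cache it, then have later queries read the cached value; the cache location here is hypothetical.

```
# podman pull prints progress to stderr and the image ID to stdout
id=$(sudo podman pull example.com/repo/image:latest)
sudo mkdir -p /var/tmp/image-size-cache
sudo podman image inspect --format '{{.Size}}' "$id" | sudo tee "/var/tmp/image-size-cache/$id"

# later lookups read the cached value instead of walking the layers again
cat "/var/tmp/image-size-cache/$id"
```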
That would make sense, @giuseppe WDYT?
that seems to make sense to me, I am wondering why we currently don't do that. It might be because with images copied with metacopy the size is different since files are empty
Doesn't an image that is copied with metacopy become a container and therefore is not actually listable with podman image ls?
Now that I think of it, the command to get a container size requires the --size flag (podman ps --format=json --size).
- This makes sense to me, because it is an expensive operation. There is also no good way to calculate the size of a container in advance, as it is mutable.
- An image, on the other hand, should not be mutable and so should be pre-computable.
- If an image were mutable, the Id would not have much meaning, at least to me.
- If an image shares layers with other images, that is more of a "size on disk" number or an optimization than an image size.
  - This may be covered by a podman system df command, or possibly by SharedSize from podman images --format=json -a. My systems just show 0 for SharedSize for all my images. Not sure what else would cover this area. metacopy could fall under being an optimization (size on disk) with this logic.
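For reference, the fields mentioned above can be checked like this (requires jq):

```
# overall storage accounting per category (images / containers / volumes)
sudo podman system df

# per-image Size and SharedSize as returned by the JSON listing
sudo podman images -a --format=json | jq '.[] | {Id, Size, SharedSize}'
```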
We have been told that they hope to have the kernel fixed by the end of the year. Idmapped mounts do not work with overlay at this point in the kernel.
A few years on, do idmapped mounts work with the overlay driver on Linux 6.x? If not, would it be better to use fuse-overlayfs to get the performance back?
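For anyone who wants to try it, switching the overlay driver to the fuse-overlayfs mount program is a storage.conf change along these lines (a sketch; verify the binary path, and note this says nothing about whether it avoids the per-mapping chown copies):

```
# in /etc/containers/storage.conf (or ~/.config/containers/storage.conf for
# rootless), under the [storage.options.overlay] section:
#
#   mount_program = "/usr/bin/fuse-overlayfs"
#
# then check that the store actually picked it up
sudo podman info --format '{{.Store.GraphDriverName}} {{.Store.GraphOptions}}'
```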
idmap works for rootful not for rootless.
idmap works for rootful not for rootless.
@rhatdan Is this still the case for overlay in modern kernel versions? What's the reason behind that?
I actually tried mounting overlay inside podman unshare and it worked perfectly. But somehow rootless podman pulls are unbearably slow (usually 2x the time compared to rootful on the same machine), especially for bigger images.
Here is the mount info inside the userns via podman unshare:
overlay /tmp/a01/merged overlay rw,relatime,lowerdir=/tmp/a01/lower,upperdir=/tmp/a01/upper,workdir=/tmp/a01/work,redirect_dir=nofollow,index=off,metacopy=off 0 0
My kernel is v6.5.5 and podman is 4.7.1
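For completeness, this is roughly how such a mount can be reproduced inside the rootless user namespace (a sketch; it requires a kernel that allows unprivileged overlay mounts in user namespaces, which 6.5 does):

```
# enter the rootless user+mount namespace, then mount an overlay there
podman unshare bash -c '
  mkdir -p /tmp/a01/lower /tmp/a01/upper /tmp/a01/work /tmp/a01/merged
  mount -t overlay overlay \
    -o lowerdir=/tmp/a01/lower,upperdir=/tmp/a01/upper,workdir=/tmp/a01/work \
    /tmp/a01/merged
  grep /tmp/a01 /proc/self/mounts
'
```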
Yes, we are working on making podman pull able to run within the user namespace. But for now, the kernel does not allow idmapped mounts in a rootless user namespace.