buildah

synthetic runtime mounts are again being serialized into layers

Open cgwalters opened this issue 1 year ago • 2 comments

I was trying to write some guidance on reproducible/optimized container builds, and ran headfirst into the fact that the issue where podman/buildah inject generated internal tmpfs mount content into the tar stream (but docker doesn't) is back (or maybe it was never really fixed; I didn't double check at the time):

This Containerfile is constructed so that it should result in a reproducible tar stream each time we do a build (i.e. two podman build --no-cache runs should result in the same diffid):

$ cat Containerfile
FROM busybox
RUN echo hello world > /test.txt && touch -r /usr /test.txt

(We could of course just run touch, but I like to use this to demonstrate how touch -r can canonicalize timestamps in a less trivial use case, such as after running curl or whatever.)
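For anyone unfamiliar with the touch -r trick, here is a minimal sketch outside of any container build. The temporary directory, the ref file (standing in for /usr in the base image), and downloaded.txt (standing in for whatever curl would have fetched) are all hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch only: pin a freshly written file's timestamps to a stable reference.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

ref="$workdir/ref"              # hypothetical stand-in for /usr in the base image
out="$workdir/downloaded.txt"   # hypothetical stand-in for a curl download

# Give the reference a fixed, old mtime (local time, format [[CC]YY]MMDDhhmm[.SS]).
touch -t 202305182234.00 "$ref"

# Any step that writes the file stamps it with the current time.
echo "hello world" > "$out"

# Copy the reference's timestamps onto the new file, so the result no
# longer depends on when the step happened to run.
touch -r "$ref" "$out"
```

After this, neither file is newer than the other, which is the property the RUN line in the Containerfile relies on.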

$ rpm -q podman
podman-5.1.0-1.fc40.aarch64
$ podman build  -t localhost/test:v0 -f Containerfile --no-cache .
...
$ podman build  -t localhost/test:v1 -f Containerfile --no-cache .
...
$ podman image diff localhost/test:v0 localhost/test:v1
C /etc
C /run/systemd
C /run/systemd/resolve
C /run/systemd/resolve/stub-resolv.conf
$

(The /etc there is really /etc/hostname; not sure why the diff is apparently recursive in the /run case but not the /etc case)

It's not just the presence of this cruft that's problematic; it's that the build process serializes the current time into the tar stream for these entries, which defeats reproducible builds.
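To illustrate why the timestamp is the real problem, here is a rough simulation with plain tar rather than podman itself. It assumes a Linux host with tar and sha256sum; the rootfs directory and the digest helper are hypothetical stand-ins, and the sha256 of the uncompressed tar plays the role of the layer diffid.

```shell
#!/bin/sh
# Sketch only: a directory mtime alone changes the digest of a tar stream.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Hypothetical layer content: one real file plus an "injected" /etc directory.
mkdir -p "$work/rootfs/etc"
echo "hello world" > "$work/rootfs/test.txt"
touch -t 202305182234.00 "$work/rootfs/test.txt" "$work/rootfs/etc" "$work/rootfs"

# Digest of the uncompressed tar stream; ustar keeps atime/ctime out of the headers.
digest() {
    tar --format=ustar -cf - -C "$work/rootfs" . | sha256sum | cut -d' ' -f1
}

first=$(digest)
second=$(digest)    # nothing changed, so this should match $first

# Simulate the build stamping "now" onto the injected directory.
touch -t 202406160713.00 "$work/rootfs/etc"
third=$(digest)     # same file content, different digest

echo "first:  $first"
echo "second: $second"
echo "third:  $third"
```

For real images, the values to compare can be read with podman image inspect --format '{{.RootFS.Layers}}' on each tag.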

Now, running podman build --timestamp=<something> will paper over this, but that's a big, crude hammer; while I've been recommending it in some places, I am pretty sure it can easily introduce the same issues with e.g. Python that we've seen in ostree (ref https://github.com/ostreedev/ostree/issues/1469 ).


(Time passes)

Oh hey, I went to double check vs the latest docker (26.1.4), and it has a different variant of this bug where it apparently serializes just the top-level mount directories it injected at build time:

$ tar tvf test/blobs/sha256/1ed2a*
drwxr-xr-x 0/0               0 2024-06-16 07:13 etc/
drwxr-xr-x 0/0               0 2024-06-16 07:13 proc/
drwxr-xr-x 0/0               0 2024-06-16 07:13 sys/
-rw-r--r-- 0/0              12 2023-05-18 22:34 test.txt

I'm pretty sure this is a regression on their side, but I'm not sure I care enough to dig up the version of docker I used in 2021 to double check.

cgwalters, Jun 16 '24 07:06

> (The /etc there is really /etc/hostname; not sure why the diff is apparently recursive in the /run case but not the /etc case)

Ahh, this logic ultimately comes from fbd1392a46558eb4adb368ba37fdce2b45013c1f which tried to paper over this underlying bug.

cgwalters, Jun 16 '24 08:06

@giuseppe @nalind PTAL

mheon, Jun 18 '24 12:06

A friendly reminder that this issue had no activity for 30 days.

github-actions[bot], Jul 19 '24 00:07

Ah, sorry, there are duplicate issues; closing in favor of https://github.com/containers/buildah/issues/4242

cgwalters, Sep 12 '24 18:09