OCI image reproducibility fail
Abstract
OCI chose the tar format as the basis for its image storage layer without, AFAIK, specifying any constraints on the tar format itself.
In #805 @vbatts says:
The big determinism is once a layer is built. That unpacking and repacking of the same content is deterministic. Compression can and does mess this up, but at least for the *.tar itself, this should hold true.
I found that this does not always hold true, so the content-addressable scheme might be affected.
Steps to reproduce (with Linux + GNU tar):
- pull the hello-world image with skopeo
- gunzip the tar.gz layer holding the hello binary
- tar x the hello binary inside
- tar c the extracted hello binary
- compare the two tar archives
- observe a 1-bit difference in the file stat mode section of the hello tar header (and, consequently, in the header checksum)
I narrowed the case down to differences between the GNU tar implementation and the Golang one.
Given that most containerization software is nowadays written in Go, someone might find this useful. As a side note, I don't intend to dig deeper into this and hope that people more experienced with OCI and Golang will pick it up.
After this small discussion on the #reproducible-builds IRC channel, I decided to share the finding here:
[10:43] <DYefimov> Hello. Diving deep into containerization software, I found out that GO implementation of tar behaves differently from GNU tar. More specifically it writes first three file stat mode triplets into the archive for every regular file, while GNU tar clamps (zeroes) them. GNU tar is conformant to the POSIX spec. OCI (Open Container Initiative) relies on tar format for its storage layer, without specifying any details. Resulting in: container images built with GO differ by 1 bit (+crc in the header) for every file within the archives, meaning that hashes used by the content addressable scheme differ too. Is this a reproducibility issue at all?
[11:14] <*> DYefimov: kind-of, usually using different build tools means all bets are off, even for different versions of build tools. that said, it seems worth it to fix this
[11:21] <DYefimov> Thanks gotcha. Ok, the root cause is in the GO tar implementation, but probably the best place to file the issue would be an OCI? as they are a more involved party and might be interested in fixing this by themselves... but frankly - I don't see an easy way around - most of containerization software nowadays built with GO... the impact seems to be so huge
[11:30] <*> not sure about where to file the issue though, maybe both????
Testcase and explanation
Please take a look at this testcase (tar_issue_test.sh):
#!/usr/bin/env sh
set -e

SKOPEO_IMG=quay.io/skopeo/stable:latest

# Print the versions of everything involved
uname -srvmpio
docker --version
docker run --rm \
    --security-opt seccomp=unconfined \
    $SKOPEO_IMG --version
tar --version | head -n 1
echo '================================'

IMAGE_NAME=hello-world
TMP_DIR=$(mktemp -dt tar_issue_test.XXXXXXXX)
mkdir "$TMP_DIR/$IMAGE_NAME"
echo "Created \"$TMP_DIR\""
trap "echo \"Removing \\\"$TMP_DIR\\\"\"; rm -rf \"$TMP_DIR\"" EXIT

# Copy the image from the registry into a local OCI layout
docker run --rm \
    --security-opt seccomp=unconfined \
    --user $(id -u):$(id -g) \
    -v "$TMP_DIR/$IMAGE_NAME":"/$IMAGE_NAME" \
    $SKOPEO_IMG \
    copy docker://$IMAGE_NAME oci:$IMAGE_NAME:latest

# Take the layer blob (digest hard-coded for this image version) and decompress it
mkdir "$TMP_DIR/$IMAGE_NAME/testdir"
cd "$TMP_DIR/$IMAGE_NAME/testdir"
mv \
    "$TMP_DIR/$IMAGE_NAME/blobs/sha256/2db29710123e3e53a794f2694094b9b4338aa9ee5c40b930cb8063a1be392c54" \
    "./src.tar.gz"
gunzip -q ./src.tar.gz
echo '================================'

# Unpack the layer and repack it with GNU tar
tar xvf ./src.tar # contains just the "hello" binary
# --clamp-mtime only lowers mtimes newer than the given time,
# so "hello" keeps its original mtime
SOURCE_DATE_EPOCH=$(date +%s)
tar \
    --format=ustar \
    -b 1 \
    --sort=name \
    --numeric-owner --owner=0 --group=0 \
    --mtime="@${SOURCE_DATE_EPOCH}" --clamp-mtime \
    -cf repacked.tar hello
chmod g-w repacked.tar
echo '================================'

# Compare the original and the repacked archive
set -x
ls -lt --time-style=full-iso
tar -tvf src.tar
tar -tvf repacked.tar
cmp -l src.tar repacked.tar || true
hexdump -C src.tar | head
hexdump -C repacked.tar | head
set +x
echo '================================'
and its output in my environment (the kernel is a bit outdated, for unrelated reasons):
Linux 4.15.0-176-generic #185-Ubuntu SMP Tue Mar 29 17:40:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Docker version 20.10.7, build f0df350
skopeo version 1.8.0
tar (GNU tar) 1.29
================================
Created "/tmp/tar_issue_test.h3RcZgaW"
Getting image source signatures
Copying blob sha256:2db29710123e3e53a794f2694094b9b4338aa9ee5c40b930cb8063a1be392c54
Copying config sha256:811f3caa888b1ee5310e2135cfd3fe36b42e233fe0d76d9798ebd324621238b9
Writing manifest to image destination
Storing signatures
================================
hello
================================
+ ls -lt --time-style=full-iso
total 48
-rw-r--r-- 1 dyefimov dyefimov 14848 2022-07-12 16:49:56.793553576 +0300 repacked.tar
-rw-r--r-- 1 dyefimov dyefimov 14848 2022-07-12 16:49:55.357547252 +0300 src.tar
-rwxrwxr-x 1 dyefimov dyefimov 13256 2021-09-24 02:47:50.000000000 +0300 hello
+ tar -tvf src.tar
-rwxrwxr-x 0/0 13256 2021-09-24 02:47 hello
+ tar -tvf repacked.tar
-rwxrwxr-x 0/0 13256 2021-09-24 02:47 hello
+ cmp -l src.tar repacked.tar
102 61 60
154 64 63
+ true
+ hexdump -C src.tar
+ head
00000000 68 65 6c 6c 6f 00 00 00 00 00 00 00 00 00 00 00 |hello...........|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000060 00 00 00 00 30 31 30 30 37 37 35 00 30 30 30 30 |....0100775.0000|
00000070 30 30 30 00 30 30 30 30 30 30 30 00 30 30 30 30 |000.0000000.0000|
00000080 30 30 33 31 37 31 30 00 31 34 31 32 33 32 31 31 |0031710.14123211|
00000090 30 34 36 00 30 31 30 32 37 34 00 20 30 00 00 00 |046.010274. 0...|
000000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000100 00 75 73 74 61 72 00 30 30 00 00 00 00 00 00 00 |.ustar.00.......|
+ hexdump -C repacked.tar
+ head
00000000 68 65 6c 6c 6f 00 00 00 00 00 00 00 00 00 00 00 |hello...........|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000060 00 00 00 00 30 30 30 30 37 37 35 00 30 30 30 30 |....0000775.0000|
00000070 30 30 30 00 30 30 30 30 30 30 30 00 30 30 30 30 |000.0000000.0000|
00000080 30 30 33 31 37 31 30 00 31 34 31 32 33 32 31 31 |0031710.14123211|
00000090 30 34 36 00 30 31 30 32 37 33 00 20 30 00 00 00 |046.010273. 0...|
000000a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000100 00 75 73 74 61 72 00 30 30 00 00 00 00 00 00 00 |.ustar.00.......|
+ set +x
================================
Removing "/tmp/tar_issue_test.h3RcZgaW"
There are two differences between the original and the repacked tar files (bytes 102 and 154). The second one is the header checksum (a plain byte sum rather than a true CRC), and it differs as a direct consequence of the first one.
As you can see, the original tar file has 1 extra bit set at offset 0x65.
The mode field occupies the 8 bytes at offsets 100-107 of the header (bytes 101-108 in cmp's 1-based numbering) and holds the file stat mode of the entry.
So in src.tar the mode string (octal) of the hello binary is 0100775, while in the archive repacked by GNU tar it is 0000775.
That extra bit corresponds to S_IFREG, which the stat() syscall returns for regular files.
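To check that the second difference really follows from the first, the header checksum can be recomputed by hand. The sketch below rebuilds the 512-byte header from the fields visible in the hexdump and sums its bytes, counting the chksum field as spaces. The helper names (zeros, make_header, tar_header_checksum) are made up for illustration. The fields past offset 0x110 are not shown in the dump, so empty uname/gname and zero-filled devmajor/devminor are assumptions, chosen because they reproduce the checksums above.

```sh
#!/usr/bin/env sh
# Hypothetical helper: emit $1 NUL bytes.
zeros() { dd if=/dev/zero bs=1 count="$1" 2>/dev/null; }

# Hypothetical helper: emit a 512-byte ustar header for "hello".
# $1 is the 7-digit octal mode string; the other values are copied from the hexdump.
make_header() {
    printf 'hello'; zeros 95      # name      (offset   0, 100 bytes)
    printf '%s\0' "$1"            # mode      (offset 100,   8 bytes)
    printf '0000000\0'            # uid       (offset 108,   8 bytes)
    printf '0000000\0'            # gid       (offset 116,   8 bytes)
    printf '00000031710\0'        # size      (offset 124,  12 bytes)
    printf '14123211046\0'        # mtime     (offset 136,  12 bytes)
    printf '        '             # chksum    (offset 148,   8 bytes, placeholder)
    printf '0'; zeros 100         # typeflag + linkname
    printf 'ustar\0'; printf '00' # magic + version
    zeros 64                      # uname + gname: ASSUMED empty
    printf '0000000\0'            # devmajor: ASSUMED zero-filled
    printf '0000000\0'            # devminor: ASSUMED zero-filled
    zeros 167                     # prefix + padding up to 512 bytes
}

# Sum the first 512 bytes of a file (or stdin), counting the chksum field
# (offsets 148-155) as ASCII spaces, and print the sum as 6 octal digits.
tar_header_checksum() {
    od -An -v -tu1 -N512 "$@" | awk '
        { for (i = 1; i <= NF; i++) { n++; sum += (n > 148 && n <= 156) ? 32 : $i } }
        END { printf "%06o\n", sum }'
}

make_header 0100775 | tar_header_checksum   # prints 010274 (src.tar)
make_header 0000775 | tar_header_checksum   # prints 010273 (repacked.tar)
```

Changing the single mode digit from '1' to '0' lowers the byte sum by exactly one, which is the 010274 vs 010273 difference seen at byte 154. On a real archive, tar_header_checksum src.tar should give the same value without any of the assumptions above.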
"Possible" root cause and question
GNU tar truncates the first three triplets of the mode string, while Golang tar does not.
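As a rough illustration of what this truncation amounts to, the mode can be split with two bit masks. This is only a sketch: the function names are hypothetical, 0170000 is the S_IFMT file-type mask from stat.h, and 07777 is assumed here to be the set of bits that tar.h names (TSUID through TOEXEC).

```sh
#!/usr/bin/env sh
# File-type bits of an octal mode (S_IFMT mask); 0100000 is S_IFREG.
mode_type_bits() { printf '%07o\n' $(( $1 & 0170000 )); }

# Permission, setuid, setgid and sticky bits of an octal mode.
mode_perm_bits() { printf '%07o\n' $(( $1 & 07777 )); }

mode_type_bits 0100775   # prints 0100000
mode_perm_bits 0100775   # prints 0000775
```

The second value is the mode string found in repacked.tar, and the first is the single extra bit found in src.tar.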
GNU states:
Starting from version 1.14 GNU tar features full support for POSIX.1-2001 archives. A POSIX conformant archive will be created if tar was given ‘--format=posix’ (‘--format=pax’) option. No special option is required to read and extract from a POSIX archive.
POSIX tar.h knows nothing about S_IFREG.
Doesn't this mean that Golang tar is not POSIX-compliant? Am I missing something?
There are some inconsistencies in the above, such as POSIX vs --format=ustar, etc. Of course, I double- and triple-checked all of them, with the same result.
[UPDATE] After a bit more tracing...
Golang seems to be fine, at least for the S_IFREG part: it truncates the bit right after the fstatat call, and again in the tar code itself.
Skopeo seems to be all right:
it pulls application/vnd.docker.image.rootfs.diff.tar.gzip from docker.io and silently stores it as application/vnd.oci.image.layer.v1.tar+gzip, in accordance with the spec, which lists the following for application/vnd.oci.image.layer.v1.tar+gzip:
Interchangeable and fully compatible mime-types: application/vnd.docker.image.rootfs.diff.tar.gzip
So in the end, the docker.io registry somehow stores a non-canonical tarball as the rootfs blob of library/hello-world. Where that extra S_IFREG bit came from is unknown. Nevertheless, it contradicts the statement by @vbatts that unpacking and repacking of the _same_ content is deterministic, which also affects the content-addressable scheme and reproducibility.
In general, I see reproducibility as a best effort, but not a guarantee. For the guarantee, you'd need to match the tooling that produced the image, and that tooling would need to provide a reproducibility guarantee itself. There are a lot of variables, including things like gzip compression levels, various attributes in the tar headers, seekable tar formats (estargz), and various digest algorithms. The JSON schemas can be extended with custom fields, and some implementations aren't consistent with ordering of those fields or the white space used in the JSON.
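Two of these variables are easy to see in isolation. The following is a minimal sketch with made-up inputs (the JSON strings, the payload and the digest helper are all hypothetical): semantically identical JSON with different whitespace hashes differently, and the same payload compressed at different gzip levels may hash differently even though the decompressed bytes are unchanged.

```sh
#!/usr/bin/env sh
# Hypothetical helper: hex SHA-256 digest of stdin.
digest() { sha256sum | cut -d' ' -f1; }

# Two JSON documents with the same meaning but different whitespace.
compact='{"architecture":"amd64","os":"linux"}'
spaced='{ "architecture": "amd64", "os": "linux" }'
printf 'compact JSON: %s\n' "$(printf '%s' "$compact" | digest)"
printf 'spaced JSON:  %s\n' "$(printf '%s' "$spaced" | digest)"

# The same payload at two gzip levels: the compressed digests are free to
# differ, while the decompressed bytes (and their digest) stay the same.
payload='same layer content'
for level in 1 9; do
    printf 'gzip -%s: compressed %s, decompressed %s\n' "$level" \
        "$(printf '%s' "$payload" | gzip -n -"$level" | digest)" \
        "$(printf '%s' "$payload" | gzip -n -"$level" | gzip -dc | digest)"
done
```

The -n flag keeps gzip from storing a name and timestamp, which would otherwise be one more source of variation.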
Ideally we'll identify as many of these as possible, and specify a canonical standard for everyone to follow to maximize the possibility of reproducibility. However, consumers of image content will also be flexible in what they allow, to maximize the portability of content and compatibility between tools.
Given this, are there any specific changes needed to the image-spec right now, or should this be closed and we can revisit individual spec issues on a case-by-case basis?
That unpacking and repacking of the same content is deterministic.
I think what he missed in this description is that it can be deterministic. If you use the exact same code with the same options and avoid introducing extra variables like accidentally leaking the "archive time" into the metadata of anything, they can be. If you use something like https://github.com/vbatts/tar-split, they also can be. However, the tar format itself does have enough variability that it's easy to accidentally encode things differently, or even just in a different order, and that's OK per the spec.
If you need the original blob of a bit of content, you either have to save that original blob somewhere (see the containerd content store design, for example), or you have to have a way to reproduce that original blob from what you do have (tar-split / Docker's old "graphdriver" approach).
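For the "save the original blob" route, a stored blob can at least be checked against its own name. This is a minimal sketch, assuming the blobs/sha256/<hex digest> layout seen in the test script earlier in this thread; verify_blob is a hypothetical helper, not part of any tool mentioned here.

```sh
#!/usr/bin/env sh
# Succeed only if the file's SHA-256 digest equals its file name,
# as expected for a blob stored under blobs/sha256/<hex digest>.
verify_blob() {
    expected=$(basename "$1")
    actual=$(sha256sum < "$1" | cut -d' ' -f1)
    [ "$expected" = "$actual" ]
}

# Usage (the path is a placeholder):
#   verify_blob "path/to/layout/blobs/sha256/<hex digest>" && echo "blob intact"
```

A blob that was unpacked and repacked with different tooling would fail this check as soon as a single header byte changed, which is the situation described at the top of this issue.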
Either way, I don't think there's more to add to the spec here, so I'm going to close.