coreos-assembler

Builds are not reproducible

Open mtalexan opened this issue 2 years ago • 5 comments

Bug Report

This might be an ostree bug?

Builds run in the coreos-assembler container with -e SOURCE_DATE_EPOCH=${fixed_epoch}, a fixed config commit, and build --version=${fixed_buildid} on a pristine VM complete successfully, but every repetition of cosa clean && cosa build --version=${fixed_buildid} produces a different ostree-content-checksum value (per meta.json).

Environment

What operating system is being used to run coreos-assembler?
Fedora 37 (in a fresh VM)

What operating system is being assembled?
Customized minimal FCOS variant (fixed config commit that builds successfully)

Is coreos-assembler running in Podman or Docker?
podman 4.3.1

If Podman, is coreos-assembler running privileged or unprivileged?
privileged

Expected Behavior

Running cosa clean then cosa build --version=${fixed_version} on a clean config repo with SOURCE_DATE_EPOCH set to a fixed value should produce the same ostree-content-checksum as the prior identical build when the rpm-ostree-inputhash is the same.

Actual Behavior

Running cosa clean then cosa build --version=${fixed_version} on a clean config repo with SOURCE_DATE_EPOCH set to a fixed value produces a different ostree-content-checksum on every build, even though the rpm-ostree-inputhash is identical across all of the builds.

Reproduction Steps

Using cosa alias:

cosa() {
   env | grep COREOS_ASSEMBLER
   set -x
   podman run --rm -it --security-opt label=disable --privileged \
              --uidmap=1000:0:1 --uidmap=0:1:1000 --uidmap 1001:1001:64536 \
              -v ${PWD}:/srv/ \
              --device /dev/kvm \
              --device /dev/fuse \
              --tmpfs /tmp -v /var/tmp:/var/tmp \
              -v /etc/ssl/certs:/etc/ssl/cert:ro \
              -v /etc/pki/:/etc/pki/:ro \
              -v /usr/share/pki/ca-trust-legacy/:/usr/share/pki/ca-trust-legacy/:ro \
              -v ${HOME}/.ssh:/home/builder/.ssh \
              -e SOURCE_DATE_EPOCH=1672531200 \
              private.registry/copy-of-quay-io-coreos-assembler-image/cosa:latest "$@"
   rc=$?; set +x; return $rc
}

  1. cosa init --branch=my-fixed-branch git@private-gitlab/my-private-coreos-config.git
  2. cosa fetch --strict
  3. cosa build --version=59ff54f
  4. cp builds/latest/x86_64/meta.json ~/
  5. cosa clean
  6. cosa fetch --strict # doesn't end up doing anything
  7. cosa build --version=59ff54f
  8. jq '.["ostree-content-checksum"]' ~/meta.json > ~/build1.hash
  9. jq '.["ostree-content-checksum"]' builds/latest/x86_64/meta.json > ~/build2.hash
  10. diff ~/build1.hash ~/build2.hash

Other Information

The CoreOS Config contains the following to make the RPM DB generation deterministic:

rpmdb: bdb
rpmdb-normalize: true

When comparing the entire meta.json files, the following are identical between the two builds:
  • ref
  • ostree-n-metadata-total
  • ostree-n-metadata-written
  • ostree-n-content-total
  • ostree-n-content-written
  • ostree-n-cache-hits
  • ostree-content-bytes-written
  • ostree-version (also matches buildid)
  • ostree-timestamp
  • rpm-ostree-inputhash
  • buildid (also matches ostree-version)
  • coreos-assembler.image-genver
  • name
  • summary
  • coreos-assembler.image-config-checksum
  • "coreos-assembler.code-source": "container"
  • coreos-assembler.container-config-git
{
  "commit": "59ff54f0574445ef2912d7ecf1ccda71f0eb3efb",
  "origin": "git@private-gitlab/my-private-coreos-config.git",
  "branch": "my-fixed-branch",
  "dirty": "false"
}
  • "coreos-assembler.delayed-meta-merge": false
  • coreos-assembler.container-image-git
{
  "commit": "d5f1623aad6d133b2c7c00e784c04ab6828450c1",
  "origin": "https://github.com/coreos/coreos-assembler.git",
  "branch": "main",
  "dirty": "true"
}
  • "coreos-assembler.config-gitrev": "59ff54f0574445ef2912d7ecf1ccda71f0eb3efb"
  • "coreos-assembler.config-dirty": "false"
  • "coreos-assembler.basearch": "x86_64"
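
The manual comparison above can also be scripted. The following is a minimal sketch, not part of coreos-assembler: `load_meta` and `differing_keys` are hypothetical helper names, and the only assumption about meta.json is that it is a JSON object with top-level keys such as the ones listed above.

```python
import json


def load_meta(path: str) -> dict:
    """Read one meta.json file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def differing_keys(meta_a: dict, meta_b: dict) -> list:
    """Return the sorted top-level keys whose values differ between two builds.

    A key that is present in only one of the two dicts counts as differing.
    """
    missing = object()  # sentinel, so a key holding null is not treated as absent
    keys = set(meta_a) | set(meta_b)
    return sorted(
        k for k in keys
        if meta_a.get(k, missing) != meta_b.get(k, missing)
    )
```

Calling `differing_keys(load_meta(first_path), load_meta(second_path))` on the meta.json saved in step 4 and the one from the second build would list every key that is not identical, which should include ostree-content-checksum if the builds diverged.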

Over many iterations, the problem appears to get worse as the image grows. Our configs that comment out a handful of the larger RPMs, reducing the total RPM size from 6.8 GB to 500 MB, build reproducibly most of the time, although occasionally they suddenly start producing different results. The larger images never produce the same result twice.

mtalexan, Jan 27 '23 00:01

Bit-level reproducibility is currently not a goal of coreos-assembler. Given the work that went into rpm-ostree to make composes reproducible, it probably wouldn't be too hard to at least make cosa build ostree be fully reproducible. We already reuse the source config git timestamp for overlays today. I suspect there are other compose inputs that need tighter timestamp control.

But cosa is used to also build many other artifacts (disk images and container images). Trying to make those fully reproducible would be a very large endeavour.

jlebon, Jan 27 '23 16:01
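
As an illustration of what "tighter timestamp control" can mean under the SOURCE_DATE_EPOCH convention, the sketch below clamps file modification times that are newer than a fixed epoch. It is an assumption-laden example rather than how coreos-assembler or rpm-ostree handle timestamps: `clamp_mtimes` is a hypothetical helper, and it relies on `os.utime(..., follow_symlinks=False)`, which is available on Linux.

```python
import os


def clamp_mtimes(root: str, epoch: int) -> int:
    """Clamp mtimes under root that are newer than epoch; return how many changed.

    Only entries below root are touched, not root itself. The access time is
    set to the same value, because os.utime takes both.
    """
    changed = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat/utime without following symlinks, so link targets that
            # live outside the tree are never modified.
            if os.lstat(path).st_mtime > epoch:
                os.utime(path, (epoch, epoch), follow_symlinks=False)
                changed += 1
    return changed
```

For the value used in this report, that would be `clamp_mtimes(some_input_dir, 1672531200)`, where `some_input_dir` stands for whichever compose input directory needs normalizing.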

The issue I'm reporting here is that the ostree commits aren't reproducible either.
I wasn't aware of the image non-reproducibility, though that's good to know and somewhat understandable, but the ostree commits coreos-assembler is constructing also aren't reproducible from what I can tell. My understanding was that's one of the main intents, reproducible ostree commits.
Am I misunderstanding something, or missing a relevant setting?

mtalexan, Jan 27 '23 16:01

> My understanding was that's one of the main intents, reproducible ostree commits.

No, it hasn't really been a focus, but it'd definitely be nice to support. That would also allow reproducible builds of the container image. I know you're using bdb, but for general information, note that for sqlite (which is what FCOS and el9 use), actually achieving this is also blocked on https://github.com/rpm-software-management/rpm/issues/2219.

jlebon, Jan 27 '23 17:01

I won't speak for the other maintainers but I'd accept patches to enable this at least for the ostree compose (assuming they're not invasive, which I don't think they would be).

jlebon, Jan 27 '23 17:01

For posterity, I traced this to two issues.

The first is that some Python libraries in the root file system that's part of the ostree commit get byte-compiled, but only if they're used by an rpm installation hook for one of the included RPMs, and the byte-compiled output is not reproducible from build to build. The only fix I've found so far is to wipe all Python byte-compilation caches before the final ostree compose using a custom postprocess hook. E.g. add this to your treefile:

postprocess: 
  - |
    echo "Removing all module __pycache__ folders under /usr"
    find /usr -type d -name '__pycache__' -exec rm -r '{}' +
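
The comment does not say why the byte-compiled files differed. One well-documented source of variation, which may or may not be the one hit here, is that the default .pyc format (PEP 552) stores the source file's modification time in its header, while hash-based .pyc files store a hash of the source instead. The sketch below compiles the same source under two different mtimes and returns the 16-byte header each time; `pyc_header` and `headers_for_two_mtimes` are hypothetical helper names.

```python
import os
import py_compile
import tempfile


def pyc_header(source_path: str, mode: py_compile.PycInvalidationMode) -> bytes:
    """Byte-compile source_path and return the 16-byte .pyc header.

    Layout per PEP 552: 4 bytes magic, 4 bytes flags, then 8 bytes that hold
    either mtime + source size (timestamp mode) or a source hash (hash modes).
    """
    with tempfile.TemporaryDirectory() as out:
        cfile = py_compile.compile(
            source_path,
            cfile=os.path.join(out, "mod.pyc"),
            doraise=True,
            invalidation_mode=mode,
        )
        with open(cfile, "rb") as f:
            return f.read(16)


def headers_for_two_mtimes(mode: py_compile.PycInvalidationMode):
    """Compile identical source twice, changing only the source file's mtime."""
    with tempfile.TemporaryDirectory() as src:
        path = os.path.join(src, "mod.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write("x = 1\n")
        os.utime(path, (1672531200, 1672531200))
        first = pyc_header(path, mode)
        os.utime(path, (1672531260, 1672531260))  # one minute later
        second = pyc_header(path, mode)
        return first, second
```

With `py_compile.PycInvalidationMode.TIMESTAMP` the two headers differ, and with `CHECKED_HASH` they match. This only covers the header; it does not show that the rest of a .pyc file is reproducible, so removing the `__pycache__` directories as above remains the more conservative workaround.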

The second issue is that the boot image is included in the generated ostree commits, and the boot image is a binary blob. That blob is not reproducible (I haven't yet figured out exactly why), so any ostree commit that contains it differs as well.

mtalexan, Mar 22 '23 13:03