Support reproducible builds

Open Foxboron opened this issue 1 year ago • 10 comments

Looking into supporting mkosi for https://system-transparency.org/ but we need support for reproducible builds.

Currently mkosi supports --finalize-script for image cleanups, but no such scripts are distributed as part of the mkosi project. Should we consider shipping a set of cleanup scripts that help make images reproducible? I think it could be done by having --reproducible as an alias for --finalize-script=reproducible-builds.$distro, or something along those lines.

mmdebstrap effectively has a little bit of code that just removes files which have timestamps and similar data embedded: https://gitlab.mister-muffin.de/josch/mmdebstrap/src/branch/main/mmdebstrap#L2948

Relevant issues around reproducible builds: https://github.com/systemd/mkosi/issues/700 https://github.com/systemd/mkosi/issues/687

Foxboron, Aug 10 '22 10:08

I like the idea, but I guess an additional step in build_image, maybe just after run_finalize_script:

run_finalize_script
+if args.reproducible:
+    make_reproducible(args, root, do_run_build_script, for_cache)

for some implementation of make_reproducible would be more in line with the rest (and I think right now we only support a single finalize script).
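As a rough illustration of what such a make_reproducible step could do, here is a minimal sketch. It is not mkosi code: the simplified signature (only the image root, none of the args/do_run_build_script/for_cache parameters) and the NONDETERMINISTIC_FILES name are assumptions, and the path list is simply the set of files removed by the Debian finalize script quoted later in this thread, so it is not exhaustive.

```python
from pathlib import Path
from typing import List

# Paths relative to the image root. Assumption: taken from the Debian
# finalize-script output quoted in this thread; other distros need other lists.
NONDETERMINISTIC_FILES = [
    "var/log/dpkg.log",
    "var/log/bootstrap.log",
    "var/cache/apt/pkgcache.bin",
    "var/log/apt/history.log",
    "var/log/apt/term.log",
    "var/log/alternatives.log",
    "var/cache/ldconfig/aux-cache",
    "var/log/apt/eipp.log.xz",
    "var/lib/dbus/machine-id",
]


def make_reproducible(root: Path) -> List[Path]:
    """Delete known non-deterministic files below root.

    Hypothetical helper: returns the paths that were actually removed so the
    caller can log them. Missing files are silently skipped, like `rm -f`.
    """
    removed = []
    for relative in NONDETERMINISTIC_FILES:
        path = root / relative
        if path.is_file() or path.is_symlink():
            path.unlink()
            removed.append(path)
    return removed
```

Keeping the list as data rather than as a shell script would make it straightforward to select a different list per distribution.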

behrmann, Aug 10 '22 10:08

After a little bit of work I have gotten reproducible Debian images with mkosi. There are a couple of hacks and some things that should be polished, but hopefully it inspires a bit!

I'll send a few pull-requests :)

(.venv) λ mkosi-test » sudo mkosi -d debian --finalize-script=reproducible --no-manifest -o debian.cpio.xz -t cpio --compress-output=xz build
[.....]
‣   Running finalize script…
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/dpkg.log
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/bootstrap.log
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/cache/apt/pkgcache.bin
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/apt/history.log
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/apt/term.log
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/alternatives.log
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/cache/ldconfig/aux-cache
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/apt/eipp.log.xz
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/lib/dbus/machine-id
‣  Unmounting image…
‣  Creating archive…
‣ Linking image file…
‣  Changing ownership of output file debian.cpio.xz to user fox (acquired from sudo)…
‣  Changed ownership of debian.cpio.xz
‣ Linked debian.cpio.xz
‣ Resulting image size is 47.5M, consumes 47.5M.
(.venv) λ mkosi-test » sha256sum debian*
fb6fce4c54780cf6c0f540bb7d5a732f4536122d4b2bd45b35ce622b5b64b975  debian2.cpio.xz
fb6fce4c54780cf6c0f540bb7d5a732f4536122d4b2bd45b35ce622b5b64b975  debian.cpio.xz

diff --git a/mkosi/__init__.py b/mkosi/__init__.py
index 3f79095..6fc795a 100644
--- a/mkosi/__init__.py
+++ b/mkosi/__init__.py
@@ -3559,6 +3559,10 @@ def make_cpio(

     root_dir = root / "usr" if args.usr_only else root

+
+    reset_timestamps = ["find", root_dir, "-mindepth", "1", "-execdir", "touch", "-hcd", "@0", "{}", "+"]
+    run(reset_timestamps)
+
     with complete_step("Creating archive…"):
         f: BinaryIO = cast(BinaryIO, tempfile.NamedTemporaryFile(dir=os.path.dirname(args.output), prefix=".mkosi-"))

@@ -3573,7 +3577,7 @@ def make_cpio(
             assert cpio.stdin is not None

             with spawn(compressor, stdin=cpio.stdout, stdout=f, delay_interrupt=False):
-                for file in files:
+                for file in sorted(files):
                     cpio.stdin.write(os.fspath(file).encode("utf8") + b"\0")
                 cpio.stdin.close()
         if cpio.wait() != 0:
@@ -5119,6 +5123,7 @@ def create_parser() -> ArgumentParserMkosi:
         type=cast(Callable[[str], ManifestFormat], ManifestFormat.parse_list),
         help="Manifest Format",
     )
+    group.add_argument('--manifest', default=True, action=argparse.BooleanOptionalAction)
     group.add_argument(
         "-o", "--output",
         help="Output image path",
@@ -7445,7 +7450,9 @@ def build_stuff(args: MkosiArgs) -> Manifest:
     workspace = setup_workspace(args)

     image = BuildOutput.empty()
-    manifest = Manifest(args)
+    manifest = None
+    if args.manifest:
+        manifest = Manifest(args)

     # Make sure tmpfiles' aging doesn't interfere with our workspace
     # while we are working on it.
@@ -8137,7 +8144,8 @@ def run_verb(raw: argparse.Namespace) -> None:
         if args.auto_bump:
             bump_image_version(args)

-        save_manifest(args, manifest)
+        if args.manifest:
+            save_manifest(args, manifest)

         print_output_size(args)

diff --git a/mkosi/backend.py b/mkosi/backend.py
index 07f285e..d550ab3 100644
--- a/mkosi/backend.py
+++ b/mkosi/backend.py
@@ -436,6 +436,7 @@ class MkosiArgs:
     architecture: str
     output_format: OutputFormat
     manifest_format: List[ManifestFormat]
+    manifest: bool
     output: Path
     output_dir: Optional[Path]
     bootable: bool
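
The two changes to make_cpio in the diff above (resetting timestamps and feeding cpio a sorted file list) can also be expressed in plain Python. The following is a minimal sketch rather than the patch itself: normalize_tree and its epoch parameter are hypothetical names, and it assumes a platform where os.utime accepts follow_symlinks=False (Linux does).

```python
import os
from pathlib import Path
from typing import List


def normalize_tree(root: Path, epoch: int = 0) -> List[Path]:
    """Set atime/mtime of everything below root to epoch and return the
    entries in sorted order, suitable for feeding to an archiver.

    Hypothetical helper mirroring `find -mindepth 1 ... touch -hcd @0`:
    root itself is not touched, and symlinks are not followed.
    """
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            entries.append(Path(dirpath) / name)

    # Touch only after the walk has finished, so nothing is created or
    # modified in between that could bump a directory's mtime again.
    for path in entries:
        os.utime(path, (epoch, epoch), follow_symlinks=False)

    # A stable order removes the dependency on directory iteration order.
    return sorted(entries)
```

Both steps matter: the timestamps end up in the cpio headers, and the order of entries in the archive otherwise depends on how the filesystem happens to enumerate directories.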

Foxboron, Aug 10 '22 12:08

What should the UX for reproducing mkosi images be?

The current workflow I've implemented for the Arch images is this:

mkosi -d arch --reproducible -o arch.cpio.xz -t cpio --compress-output=xz build
mkosi -d arch --reproducible --manifest-file ./arch.cpio.xz.manifest -o arch.repro.cpio.xz -t cpio --compress-output=xz build

But would it make more sense to have a reproduce subcommand and include more information in the manifests?

mkosi -d arch --reproducible -o arch.cpio.xz -t cpio --compress-output=xz build
mkosi reproduce ./arch.cpio.xz.manifest

Is there anything else we need to take care of or think of?

Foxboron, Aug 15 '22 12:08

The second approach definitely makes more sense to me.

Unfortunately, we're probably going to run into the limits of our argument parsing again, since all arguments apply to all commands, which probably wouldn't make sense for a "reproduce" command.
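
For illustration, per-command arguments can be modelled with argparse subparsers. This is only a sketch under assumptions: create_sketch_parser is a hypothetical name, the option set is a small subset, and unlike mkosi's current command line the verb comes first here.

```python
import argparse


def create_sketch_parser() -> argparse.ArgumentParser:
    """Hypothetical parser in which each verb owns its own arguments."""
    parser = argparse.ArgumentParser(prog="mkosi-sketch")
    verbs = parser.add_subparsers(dest="verb", required=True)

    # "build" keeps the usual image options.
    build = verbs.add_parser("build")
    build.add_argument("-d", "--distribution")
    build.add_argument("-o", "--output")
    build.add_argument("--reproducible", action="store_true")

    # "reproduce" only takes the manifest to rebuild from.
    reproduce = verbs.add_parser("reproduce")
    reproduce.add_argument("manifest")

    return parser
```

With this layout, `reproduce ./arch.cpio.xz.manifest` parses, while `reproduce -d arch ./arch.cpio.xz.manifest` is rejected because -d belongs to build only.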

The main thing that would need to be added to the manifest file is a serialized version of the config used to build the image.

DaanDeMeyer, Aug 15 '22 12:08

Hmm, should we try to serialize the config into the manifest, or do we assume that the manifest plus the configuration is what is needed to reproduce the image?

Foxboron, Aug 16 '22 07:08

I think serialising the config would be a good approach, although I'm a bit apprehensive about just dumping it to JSON, because MkosiArgs will most certainly still change down the line, and then we might run into issues with missing or unexpected keys when trying to recreate an image. A version field might be sensible here.
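
A minimal sketch of the version-field idea, under stated assumptions: ExampleConfig is a stand-in for MkosiArgs with only three made-up fields, and CONFIG_FORMAT_VERSION, dump_config and load_config are hypothetical names, not existing mkosi API.

```python
import json
from dataclasses import asdict, dataclass, fields

# Hypothetical: bumped whenever the set of serialized fields changes.
CONFIG_FORMAT_VERSION = 1


@dataclass
class ExampleConfig:
    """Stand-in for MkosiArgs; the real class has many more fields."""
    distribution: str
    output_format: str
    compress_output: str


def dump_config(config: ExampleConfig) -> str:
    """Serialize the config together with the format version."""
    document = {"version": CONFIG_FORMAT_VERSION, "config": asdict(config)}
    return json.dumps(document, sort_keys=True)


def load_config(text: str) -> ExampleConfig:
    """Load a config, refusing versions or key sets we do not understand."""
    document = json.loads(text)
    if document.get("version") != CONFIG_FORMAT_VERSION:
        raise ValueError(f"unsupported config version: {document.get('version')!r}")

    payload = document["config"]
    known = {f.name for f in fields(ExampleConfig)}
    missing = known - set(payload)
    unexpected = set(payload) - known
    if missing or unexpected:
        raise ValueError(f"missing keys {sorted(missing)}, unexpected keys {sorted(unexpected)}")
    return ExampleConfig(**payload)
```

Failing loudly on a version mismatch is one possible policy; a real implementation might instead migrate older versions forward.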

@keszybz Do you have input here? Since you started the manifest work, you probably have thoughts on where such extensions should go.

behrmann, Aug 16 '22 08:08

I think we should do the rework discussed in #769, so that there are a few layers of clearly separated config:

  • config files + command-line args
  • effective config with automatic extensions to the package lists, e.g. when we add some packages based on the selected distro, partition sizes that were selected, etc.
  • effective package list resulting from the above config (i.e. what the manifest gathers currently)

And I'd (optionally) save all three in the manifest file. If we get the abstractions right, this shouldn't be any extra work, just serialization to json of a few dicts or dataclass objects. And this would give all the information to understand what was done and how to repeat it. How this information is to be used would be chosen by the "client" that is doing the repeat build, depending on the intended scenario.
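
To make the layering concrete, here is a small sketch of a manifest that can carry all three layers. The key names ("config", "effective_config", "packages") and the build_manifest function are assumptions for illustration and do not reflect the current manifest.json format.

```python
import json
from typing import Any, Dict, List


def build_manifest(
    requested: Dict[str, Any],
    effective: Dict[str, Any],
    packages: List[str],
    *,
    include_config: bool = True,
) -> str:
    """Serialize up to three layers into one JSON document.

    requested: what config files and command-line args asked for.
    effective: requested config plus automatic additions (extra packages, sizes).
    packages:  the resulting package list, i.e. what the manifest gathers today.
    """
    manifest: Dict[str, Any] = {"packages": sorted(packages)}
    if include_config:
        manifest["config"] = requested
        manifest["effective_config"] = effective
    # Sorted keys keep the manifest itself byte-for-byte stable.
    return json.dumps(manifest, indent=2, sort_keys=True)
```

A client repeating a build could then pick the layer that suits its scenario, for example replaying "config" to test the current tooling or pinning "packages" for a bit-for-bit rebuild.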

keszybz, Aug 16 '22 12:08

Should we do the rework first, or would people be fine with me jamming a few new variables into manifest.json to get some of the basic reprobuilds goals accomplished?

We can just mark any form of reproducible builds as experimental to avoid any form of commitment on the manifest format.

Foxboron, Aug 16 '22 12:08

I'd be in favour of that.

behrmann, Aug 16 '22 13:08

@rphibel is doing some fundamental work to make serializing the config into the manifest easier, starting by splitting MkosiArgs into MkosiConfig and MkosiState

DaanDeMeyer, Aug 19 '22 09:08