mkosi
Support reproducible builds
I'm looking into supporting mkosi for https://system-transparency.org/, but we need support for reproducible builds.
Currently mkosi supports --finalize-script for image cleanups, but no such scripts are distributed as part of the mkosi project. I'm wondering whether we should ship a set of cleanup scripts that help make images reproducible as part of the project. I think it could be done by having --reproducible as an alias for --finalize-script=reproducible-builds.$distro, or something along those lines.
mmdebstrap effectively has a little bit of code that just removes files which have timestamps and similar data embedded:
https://gitlab.mister-muffin.de/josch/mmdebstrap/src/branch/main/mmdebstrap#L2948
Relevant issues around Reproducible Builds: https://github.com/systemd/mkosi/issues/700 https://github.com/systemd/mkosi/issues/687
Like the idea, but I guess an additional step in build_image, maybe just after run_finalize_script:
run_finalize_script
+if args.reproducible:
+ make_reproducible(args, root, do_run_build_script, for_cache)
for some implementation of make_reproducible, would be more in line with the rest (and I think right now we only support a single finalize script).
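As a rough illustration of what such a make_reproducible step could do, here is a minimal standalone sketch. It is not mkosi code: the simplified signature (just a root path and an epoch), the VOLATILE_FILES name, and the file list are assumptions. The list mirrors the files removed in the Debian finalize run shown later in this thread, and the timestamp clamp mirrors the `touch -hcd @0` call in the patch below.

```python
import os
from pathlib import Path

# Illustrative list only (an assumption): it mirrors the files removed
# by the Debian finalize script run later in this thread.
VOLATILE_FILES = [
    "var/log/dpkg.log",
    "var/log/bootstrap.log",
    "var/log/alternatives.log",
    "var/log/apt/history.log",
    "var/log/apt/term.log",
    "var/log/apt/eipp.log.xz",
    "var/cache/apt/pkgcache.bin",
    "var/cache/ldconfig/aux-cache",
    "var/lib/dbus/machine-id",
]


def make_reproducible(root: Path, epoch: int = 0) -> None:
    """Hypothetical helper: drop volatile files, then clamp mtimes below root."""
    # Step 1: remove files that embed timestamps or machine-specific data.
    for rel in VOLATILE_FILES:
        (root / rel).unlink(missing_ok=True)

    # Step 2: set atime/mtime of every entry below root (not root itself)
    # to a fixed epoch. Symlinks are touched themselves, not their targets.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            os.utime(os.path.join(dirpath, name), (epoch, epoch), follow_symlinks=False)
```

Doing the removal before the timestamp pass matters, because deleting a file updates the mtime of its parent directory.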
After a little bit of work I have gotten reproducible Debian images with mkosi. It takes a couple of hacks and some stuff that should be polished, but hopefully it inspires a bit!
I'll send a few pull requests :)
(.venv) λ mkosi-test » sudo mkosi -d debian --finalize-script=reproducible --no-manifest -o debian.cpio.xz -t cpio --compress-output=xz build
[.....]
‣ Running finalize script…
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/dpkg.log
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/bootstrap.log
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/cache/apt/pkgcache.bin
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/apt/history.log
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/apt/term.log
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/alternatives.log
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/cache/ldconfig/aux-cache
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/log/apt/eipp.log.xz
+ rm -f /var/tmp/mkosi-lgzfxrx8/root/var/lib/dbus/machine-id
‣ Unmounting image…
‣ Creating archive…
‣ Linking image file…
‣ Changing ownership of output file debian.cpio.xz to user fox (acquired from sudo)…
‣ Changed ownership of debian.cpio.xz
‣ Linked debian.cpio.xz
‣ Resulting image size is 47.5M, consumes 47.5M.
(.venv) λ mkosi-test » sha256sum debian*
fb6fce4c54780cf6c0f540bb7d5a732f4536122d4b2bd45b35ce622b5b64b975 debian2.cpio.xz
fb6fce4c54780cf6c0f540bb7d5a732f4536122d4b2bd45b35ce622b5b64b975 debian.cpio.xz
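The sha256sum comparison above is easy to script if you want it in CI. A minimal sketch follows; the helper names sha256_of and is_reproduced are made up for illustration and are not part of mkosi.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproduced(first: Path, second: Path) -> bool:
    """True if two build outputs are bit-for-bit identical."""
    return sha256_of(first) == sha256_of(second)
```

For the run above this would be called as is_reproduced(Path("debian.cpio.xz"), Path("debian2.cpio.xz")).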
diff --git a/mkosi/__init__.py b/mkosi/__init__.py
index 3f79095..6fc795a 100644
--- a/mkosi/__init__.py
+++ b/mkosi/__init__.py
@@ -3559,6 +3559,10 @@ def make_cpio(
root_dir = root / "usr" if args.usr_only else root
+
+ reset_timestamps = ["find", root_dir, "-mindepth", "1", "-execdir", "touch", "-hcd", "@0", "{}", "+"]
+ run(reset_timestamps)
+
with complete_step("Creating archive…"):
f: BinaryIO = cast(BinaryIO, tempfile.NamedTemporaryFile(dir=os.path.dirname(args.output), prefix=".mkosi-"))
@@ -3573,7 +3577,7 @@ def make_cpio(
assert cpio.stdin is not None
with spawn(compressor, stdin=cpio.stdout, stdout=f, delay_interrupt=False):
- for file in files:
+ for file in sorted(files):
cpio.stdin.write(os.fspath(file).encode("utf8") + b"\0")
cpio.stdin.close()
if cpio.wait() != 0:
@@ -5119,6 +5123,7 @@ def create_parser() -> ArgumentParserMkosi:
type=cast(Callable[[str], ManifestFormat], ManifestFormat.parse_list),
help="Manifest Format",
)
+ group.add_argument('--manifest', default=True, action=argparse.BooleanOptionalAction),
group.add_argument(
"-o", "--output",
help="Output image path",
@@ -7445,7 +7450,9 @@ def build_stuff(args: MkosiArgs) -> Manifest:
workspace = setup_workspace(args)
image = BuildOutput.empty()
- manifest = Manifest(args)
+ manifest = None
+ if args.manifest:
+ manifest = Manifest(args)
# Make sure tmpfiles' aging doesn't interfere with our workspace
# while we are working on it.
@@ -8137,7 +8144,8 @@ def run_verb(raw: argparse.Namespace) -> None:
if args.auto_bump:
bump_image_version(args)
- save_manifest(args, manifest)
+ if args.manifest:
+ save_manifest(args, manifest)
print_output_size(args)
diff --git a/mkosi/backend.py b/mkosi/backend.py
index 07f285e..d550ab3 100644
--- a/mkosi/backend.py
+++ b/mkosi/backend.py
@@ -436,6 +436,7 @@ class MkosiArgs:
architecture: str
output_format: OutputFormat
manifest_format: List[ManifestFormat]
+ manifest: bool
output: Path
output_dir: Optional[Path]
bootable: bool
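The two archive-related changes in the patch (a sorted file list and fixed timestamps) can be shown in isolation. The sketch below is an analogue rather than the patch itself: it uses Python's tarfile module instead of cpio, since the standard library has no cpio writer, and it normalizes metadata inside the archive rather than touching files on disk. The function name deterministic_tar is an assumption.

```python
import io
import tarfile
from pathlib import Path


def deterministic_tar(root: Path) -> bytes:
    """Archive root so that equal trees give byte-identical output."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        # Sorted traversal: directory listing order is otherwise
        # filesystem-dependent and leaks into the archive.
        for path in sorted(root.rglob("*")):
            info = tar.gettarinfo(str(path), arcname=str(path.relative_to(root)))
            # Normalize metadata that varies between builds and hosts.
            info.mtime = 0
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            if info.isreg():
                with path.open("rb") as f:
                    tar.addfile(info, f)
            else:
                tar.addfile(info)
    return buf.getvalue()
```

File modes are kept as they are on disk, so the two trees still need identical permissions for the outputs to match.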
What should the UX for reproducing mkosi images be?
The current workflow I've implemented for the Arch images is this:
mkosi -d arch --reproducible -o arch.cpio.xz -t cpio --compress-output=xz build
mkosi -d arch --reproducible --manifest-file ./arch.cpio.xz.manifest -o arch.repro.cpio.xz -t cpio --compress-output=xz build
But would it make more sense to have a reproduce subcommand and include more information in the manifests?
mkosi -d arch --reproducible -o arch.cpio.xz -t cpio --compress-output=xz build
mkosi reproduce ./arch.cpio.xz.manifest
Is there anything else we need to take care of or think of?
The second approach definitely makes more sense to me.
Unfortunately, we're probably going to run into the limits of our argument parsing again, since all arguments apply to all commands, which probably wouldn't make sense for a "reproduce" command.
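For what it's worth, per-verb arguments are something argparse subparsers handle. The sketch below is hypothetical and not how mkosi parses arguments today: the program name, the verb-first ordering, and the reduced option set are all assumptions, chosen only to show a "reproduce" verb that accepts nothing but a manifest path.

```python
import argparse


def create_parser() -> argparse.ArgumentParser:
    """Hypothetical parser in which each verb owns its own arguments."""
    parser = argparse.ArgumentParser(prog="mkosi-sketch")
    verbs = parser.add_subparsers(dest="verb", required=True)

    # "build" keeps the image options (only a few are shown here).
    build = verbs.add_parser("build")
    build.add_argument("-d", "--distribution")
    build.add_argument("-o", "--output")
    build.add_argument("--reproducible", action="store_true")

    # "reproduce" takes only the manifest of a previous build.
    reproduce = verbs.add_parser("reproduce")
    reproduce.add_argument("manifest")
    return parser
```

With this layout, `reproduce ./arch.cpio.xz.manifest` parses, while passing -d to reproduce is rejected. Note that the verb comes before its options here, unlike the `mkosi ... build` invocations above.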
The main thing that would need to be added to the manifest file is a serialized version of the config used to build the image.
Hmm, should we try to serialize the config into the manifest, or do we assume that the manifest + configuration is what is capable of reproducing the image?
I think serialising the config would be a good approach, although I'm a bit apprehensive about just dumping it to JSON, because MkosiArgs will most certainly still change down the line, and then we might run into issues with missing or unexpected keys when trying to recreate stuff. Some version field might be sensible here.
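To make the version-field idea concrete, here is a small sketch. ConfigSnapshot is a hypothetical stand-in for whatever subset of MkosiArgs would be serialized, and CONFIG_SCHEMA_VERSION and both function names are assumptions as well. The loader refuses versions it does not know and fails loudly on missing or unexpected keys instead of guessing.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import List

CONFIG_SCHEMA_VERSION = 1  # hypothetical; bump whenever the fields change


@dataclass
class ConfigSnapshot:
    """Hypothetical stand-in for the serialized part of MkosiArgs."""
    distribution: str
    output_format: str
    packages: List[str]


def dump_config(config: ConfigSnapshot) -> str:
    payload = {"version": CONFIG_SCHEMA_VERSION, "config": asdict(config)}
    return json.dumps(payload, sort_keys=True, indent=2)


def load_config(text: str) -> ConfigSnapshot:
    data = json.loads(text)
    if data.get("version") != CONFIG_SCHEMA_VERSION:
        raise ValueError(f"unsupported config version: {data.get('version')!r}")
    known = {f.name for f in fields(ConfigSnapshot)}
    payload = data["config"]
    unknown = set(payload) - known
    missing = known - set(payload)
    if unknown or missing:
        raise ValueError(f"unknown keys {sorted(unknown)}, missing keys {sorted(missing)}")
    return ConfigSnapshot(**payload)
```

A stricter or more lenient policy (for example, migrating old versions) is equally possible; the point is only that the version is checked before any key is interpreted.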
@keszybz Do you have input here? Since you started the manifest work you probably have thoughts on where such extensions should go.
I think we should do the rework discussed in #769, so that there are a few layers of clearly-separated config:
- config files + command-line args
- effective config with automatic extensions to the package lists, e.g. when we add some packages based on the selected distro, partition sizes that were selected, etc.
- effective package list resulting from the above config (i.e. what the manifest gathers currently)
And I'd (optionally) save all three in the manifest file. If we get the abstractions right, this shouldn't be any extra work, just serialization to json of a few dicts or dataclass objects. And this would give all the information to understand what was done and how to repeat it. How this information is to be used would be chosen by the "client" that is doing the repeat build, depending on the intended scenario.
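A sketch of what saving the three layers might look like, assuming each layer is already available as a plain dict or list. The class name LayeredManifest and the key names are invented for illustration and do not reflect the current manifest format; the layers parameter models the "optionally" part.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Sequence

ALL_LAYERS = ("user_config", "effective_config", "packages")


@dataclass
class LayeredManifest:
    """Hypothetical manifest holding the three layers described above."""
    # Layer 1: config files + command-line args, as given by the user.
    user_config: Dict[str, object]
    # Layer 2: effective config after automatic extensions.
    effective_config: Dict[str, object]
    # Layer 3: the package list that resulted from the build.
    packages: List[Dict[str, str]] = field(default_factory=list)

    def to_json(self, layers: Sequence[str] = ALL_LAYERS) -> str:
        """Serialize only the requested layers, with stable key order."""
        data = asdict(self)
        return json.dumps({name: data[name] for name in layers}, sort_keys=True, indent=2)
```

A client repeating a build could then read whichever layer fits its scenario, for example only effective_config to rebuild with the same settings, or packages to pin exact versions.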
Should we do the rework first, or would people be fine with me jamming a few new variables into manifest.json to get some of the basic reproducible-builds goals accomplished?
We can just mark any form of reproducible builds as experimental to avoid any form of commitment on the manifest format.
I'd be in favour of that.
@rphibel is doing some fundamental work to make serializing the config into the manifest easier, starting by splitting MkosiArgs into MkosiConfig and MkosiState.