New Image "Layer"

Open blaggacao opened this issue 2 years ago • 20 comments

I'm seeking advice on an idea I've been considering for a long time: amending the OCI image spec with a means to add an unbounded number of content==location-addressed blobs to the file system.

Let's call these "Nix store paths". They reside in a special place on the (Linux) filesystem, under /nix/store/<hash>-<name>.

Because the content hash is part of the location address, location and content addresses become interchangeable, and conflicts in the location-address space never occur.

Hence, no "merging" is required: no overlay filesystem and, above all, no layer limit, which sets the stage for arbitrary deduplication.

Wait a sec; how so? Doesn't linking expect well-known location-addresses?

This works because the ELF binaries are patched so that all linking is redirected to the corresponding /nix/store/... paths. That is an implementation detail, mentioned only to make this proposal plausible (for ELFs).

The two proposed media types would read as follows:

application/vnd.nix.store.path.v1.nar
application/vnd.nix.store.path.v1.nar+gzip

An OCI runtime should be able to:

  • download the nar archive
  • extract it to the file system
  • repeat ad infinitum
  • combine these operations with classical oci layers
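The download, extract and repeat steps listed above could look roughly like the following Go sketch. It assumes each blob arrives with an OCI-style "sha256:<hex>" digest and its store path basename, and the hypothetical installStorePaths only records what it would extract rather than unpacking a real NAR. Because each store path owns a distinct directory, a path that is already present can simply be skipped, which is where the deduplication comes from.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// Blob is a downloaded store path archive plus the metadata a manifest
// descriptor would carry (a hypothetical shape for illustration).
type Blob struct {
	Digest   string // "sha256:<hex>" of Data
	BaseName string // "<hash>-<name>" under /nix/store
	Data     []byte // NAR bytes as transferred
}

func digestOf(data []byte) string {
	sum := sha256.Sum256(data)
	return "sha256:" + hex.EncodeToString(sum[:])
}

// verifyDigest rejects blobs whose bytes do not match their sha256 digest.
func verifyDigest(b Blob) error {
	if !strings.HasPrefix(b.Digest, "sha256:") {
		return fmt.Errorf("unsupported digest %q", b.Digest)
	}
	if digestOf(b.Data) != b.Digest {
		return fmt.Errorf("digest mismatch for %s", b.BaseName)
	}
	return nil
}

// installStorePaths verifies each blob and "extracts" it unless the target
// already exists; present stands in for the contents of /nix/store.
func installStorePaths(blobs []Blob, present map[string]bool) ([]string, error) {
	var installed []string
	for _, b := range blobs {
		if err := verifyDigest(b); err != nil {
			return installed, err
		}
		target := "/nix/store/" + b.BaseName
		if present[target] {
			continue // already on disk: reuse it, nothing to merge
		}
		// A real runtime would unpack the NAR into target at this point.
		present[target] = true
		installed = append(installed, target)
	}
	return installed, nil
}

func main() {
	nar := []byte("placeholder NAR bytes")
	b := Blob{Digest: digestOf(nar), BaseName: "aaa-app", Data: nar} // hypothetical, shortened name
	installed, err := installStorePaths([]Blob{b, b}, map[string]bool{})
	fmt.Println(installed, err)
}
```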

A NAR archive is a completely reproducible TAR variant; a Go implementation can be found here: https://github.com/nix-community/go-nix/tree/master/pkg%2Fnar

Can somebody give me some pointers on how to approach this?

/cc @AkihiroSuda for the work on buildkit-nix (https://github.com/AkihiroSuda/buildkit-nix). I also tried to ping you on Matrix; not sure if that's the right venue.

/cc @nlewo for a prototype that exploits the same fundamental synergies at a different layer of the stack, with his skopeo patches over at https://github.com/nlewo/nix2container

blaggacao, Jun 04 '22 22:06

A NAR archive is a completely reproducible TAR variant

But +gzip might not be reproducible, right? We probably want to have a reproducible compression format before bringing NAR into OCI.

/cc @AkihiroSuda for the work on buildkit-nix (https://github.com/AkihiroSuda/buildkit-nix). I also tried to ping you on Matrix; not sure if that's the right venue.

Thanks. My Matrix ID is @akihirosuda:matrix.org, but I'm more active on opencontainers.slack.com.

AkihiroSuda, Jun 05 '22 06:06

Is there a manifest definition for using these blobs? I'm curious if it's one manifest per blob, or if you are taking them as a group.

sudo-bmitch, Jun 05 '22 12:06

But +gzip might not be reproducible, right?

NixOS sets GZIP="-n" to make sure that gzip files are reproducible. There is also an ongoing effort to make the minimal ISO fully reproducible.

See https://github.com/NixOS/nixpkgs/blob/master/pkgs/tools/compression/gzip/default.nix#L41= and https://github.com/NixOS/nixpkgs/issues/86348

We probably want to have a reproducible compression format before bringing NAR into OCI.

Nix itself already supports gzip, brotli, bzip2 and xz, so that shouldn't be a big issue.

Is there a manifest definition for using these blobs? I'm curious if it's one manifest per blob, or if you are taking them as a group.

Each store path entry has dependencies, which themselves have other dependencies. You can get this information either by parsing the .drv files or from the CLI.
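Whichever source the dependency information comes from (for example `nix-store --query --requisites <path>` on the CLI), turning it into the set of blobs an image needs is a transitive-closure walk. A minimal Go sketch, assuming the direct references are already available as a map; the store path names are hypothetical and shortened.

```go
package main

import (
	"fmt"
	"sort"
)

// closure returns every store path reachable from root via references,
// including root itself, in sorted order. Self-references and cycles are
// handled by the seen set.
func closure(root string, refs map[string][]string) []string {
	seen := map[string]bool{}
	stack := []string{root}
	for len(stack) > 0 {
		p := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[p] {
			continue
		}
		seen[p] = true
		stack = append(stack, refs[p]...)
	}
	out := make([]string, 0, len(seen))
	for p := range seen {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical, shortened store paths and their direct references.
	refs := map[string][]string{
		"aaa-app":    {"bbb-libfoo"},
		"bbb-libfoo": {"ccc-libbar", "bbb-libfoo"},
		"ccc-libbar": {"ddd-libbaz"},
	}
	fmt.Println(closure("aaa-app", refs))
}
```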

SuperSandro2000, Jun 05 '22 20:06

Is there a manifest definition for using these blobs? I'm curious if it's one manifest per blob, or if you are taking them as a group.

Each store path entry has dependencies, which themselves have other dependencies. You can get this information either by parsing the .drv files or from the CLI.

Where I was going with the question: pushing a blob on its own to a registry will result in the blob being deleted on the next garbage collection. There needs to be an OCI image manifest that points to the blob(s), and a tag pointing to that manifest. There's not even a media type on the blob push; that gets set in a descriptor in a manifest.
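For illustration, a manifest tying a group of store-path blobs to one tag could look like the JSON this Go sketch prints. The outer structure follows the OCI image manifest (schemaVersion 2, a config descriptor and a list of layer descriptors), but the application/vnd.nix.store.path.v1.nar+gzip layer type is only the one proposed in this issue, not part of the spec. Carrying the store path in the org.opencontainers.image.title annotation is an arbitrary choice for this sketch, and the digests are computed from placeholder bytes.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Descriptor mirrors the fields of an OCI content descriptor used here.
type Descriptor struct {
	MediaType   string            `json:"mediaType"`
	Digest      string            `json:"digest"`
	Size        int64             `json:"size"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

// Manifest mirrors the subset of an OCI image manifest used here.
type Manifest struct {
	SchemaVersion int          `json:"schemaVersion"`
	MediaType     string       `json:"mediaType"`
	Config        Descriptor   `json:"config"`
	Layers        []Descriptor `json:"layers"`
}

// Proposed in this issue; not a registered OCI media type.
const narGzipType = "application/vnd.nix.store.path.v1.nar+gzip"

func describe(mediaType string, data []byte) Descriptor {
	sum := sha256.Sum256(data)
	return Descriptor{MediaType: mediaType, Digest: "sha256:" + hex.EncodeToString(sum[:]), Size: int64(len(data))}
}

// buildManifest creates one layer descriptor per store path blob, so every
// blob stays referenced (and safe from registry garbage collection).
func buildManifest(config []byte, blobs map[string][]byte, order []string) Manifest {
	m := Manifest{
		SchemaVersion: 2,
		MediaType:     "application/vnd.oci.image.manifest.v1+json",
		Config:        describe("application/vnd.oci.image.config.v1+json", config),
	}
	for _, p := range order {
		d := describe(narGzipType, blobs[p])
		d.Annotations = map[string]string{"org.opencontainers.image.title": p}
		m.Layers = append(m.Layers, d)
	}
	return m
}

func main() {
	order := []string{"/nix/store/aaa-app", "/nix/store/bbb-libfoo"} // hypothetical paths
	blobs := map[string][]byte{order[0]: []byte("nar 1"), order[1]: []byte("nar 2")}
	out, err := json.MarshalIndent(buildManifest([]byte("{}"), blobs, order), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```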

sudo-bmitch, Jun 05 '22 22:06

Where I was going with the question: pushing a blob on its own to a registry will result in the blob being deleted on the next garbage collection. There needs to be an OCI image manifest that points to the blob(s), and a tag pointing to that manifest. There's not even a media type on the blob push; that gets set in a descriptor in a manifest.

Ah, gotcha! In fact, we're prototyping a Nix cache implementation that is also planned to be OCI-distribution compliant (read-only) over at https://github.com/input-output-hk/spongix

Unfortunately, I'm not very familiar with OCI terminology yet. So I was wondering: do I need to start by defining opencontainers/artifacts for Nix cache artifacts before we can include them in the OCI image spec?

In principle, it's pretty straightforward: each store path should be fetched in parallel, so I guess they would be different "layers". But this is exactly where I need guidance.
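Since store paths never overlap on disk, nothing forces them to be fetched or unpacked in a particular order. Here is a bounded-concurrency sketch in Go, where fetch is a hypothetical stand-in for downloading one blob (it just returns placeholder bytes) and maxParallel is an arbitrary limit that must be at least 1.

```go
package main

import (
	"fmt"
	"sync"
)

// fetch is a hypothetical stand-in for downloading one store path blob.
func fetch(path string) ([]byte, error) {
	return []byte("nar for " + path), nil
}

// fetchAll downloads every path with at most maxParallel (>= 1) requests in
// flight and returns the blobs keyed by path, plus the first error seen.
func fetchAll(paths []string, maxParallel int) (map[string][]byte, error) {
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		results  = make(map[string][]byte, len(paths))
		firstErr error
	)
	sem := make(chan struct{}, maxParallel)
	for _, p := range paths {
		wg.Add(1)
		go func(p string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a download slot
			defer func() { <-sem }() // release it when done
			data, err := fetch(p)
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = err
				}
				return
			}
			results[p] = data
		}(p)
	}
	wg.Wait()
	return results, firstErr
}

func main() {
	blobs, err := fetchAll([]string{"aaa-app", "bbb-libfoo", "ccc-libbar"}, 2)
	fmt.Println(len(blobs), err)
}
```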

blaggacao, Jun 05 '22 22:06

In fact, we're prototyping a Nix cache implementation that is also planned to be OCI-distribution compliant (read-only) over at https://github.com/input-output-hk/spongix

Okay, that makes sense. I was thinking you wanted to push these to any registry supporting distribution-spec. Making your own registry changes that completely. 👍

sudo-bmitch, Jun 06 '22 00:06

NixOS sets GZIP="-n" to make sure that gzip files are reproducible.

Yes, but reproducing the other header fields (https://datatracker.ietf.org/doc/html/rfc1952) and the deflate blocks requires the specific version of GNU gzip (and maybe the specific host CPU and configure flags too).

I'd like to see a "formal" specification that is sufficient to implement a reproducible compressor from scratch, without peeking into the source code of GNU gzip.

If that is too difficult, we should probably clearly document the expected version of GNU gzip.

Nix itself already supports gzip, brotli, bzip2 and xz, so that shouldn't be a big issue.

Do any of them have a "formal" specification?

AkihiroSuda, Jun 06 '22 04:06

Why is reproducible compression important? Compression is only used for transit and storage, and the data can be turned back into the real (and reproducible!) data at any point.

Compression is an implementation detail; a solution that compresses its data should behave the same as one that doesn't. The former just achieves its goal more efficiently.
We don't need to care about the exact compressed data, only what it decompresses to.

Atemu, Jun 06 '22 10:06

Why is reproducible compression important? Compression is only used for transit and storage, and the data can be turned back into the real (and reproducible!) data at any point.

Unfortunately, the CAS design of registries is based on the data in transit (typically compressed). We've looked at treating compression as a storage and transit detail and tracking the digest of the uncompressed data instead, but that's currently an unsolved problem.

sudo-bmitch, Jun 06 '22 10:06

How do other layer types handle compression?

Do they already use a reproducible compressor we could adopt for NARs as well?

Atemu, Jun 06 '22 12:06

Other layer types don't handle reproducibility. You can have two different CAS entries for effectively the same content, but with different timestamps, compression artifacts, etc.

sudo-bmitch, Jun 06 '22 14:06

I think there might be a misunderstanding here.

While we Nix people do strive for 100% reproducibility, output paths (/nix/store/<hash>-name) aren't guaranteed to be reproducible. A build ("realisation") is still impure in many ways.
They're therefore not content-addressed either; they're input-addressed. The "build recipe" determines the hash and it's known before the build even starts.

Content-addressed derivations ("CA derivations") are coming as an experimental feature, but they won't be the default modus operandi for a long time to come, or even for all purposes.

In Nix, you can therefore also have many different CAS entries for a given output path but you generally don't care which one you get; any one of them is fine.
The important bit here is that they can't ever conflict.
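The distinction can be sketched in Go with a deliberately simplified hashing scheme. This is not Nix's actual algorithm, which hashes a derivation fingerprint and encodes a truncated digest in its own base32; here the hash is plain hex, and the recipe and build outputs are hypothetical. The input-addressed location depends only on the recipe and is known before building, whereas the content digest depends on whatever bytes a build produced. Two impure builds of the same recipe therefore share one location but have different content digests, and neither can ever land on another recipe's location.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// inputAddress derives a location from the recipe alone (simplified).
func inputAddress(recipe, name string) string {
	sum := sha256.Sum256([]byte(recipe))
	return "/nix/store/" + hex.EncodeToString(sum[:])[:32] + "-" + name
}

// contentDigest hashes the bytes a build actually produced.
func contentDigest(output []byte) string {
	sum := sha256.Sum256(output)
	return "sha256:" + hex.EncodeToString(sum[:])
}

func main() {
	recipe := "build hello-2.12 with gcc, flags -O2" // hypothetical recipe
	loc := inputAddress(recipe, "hello-2.12")       // known before building
	build1 := []byte("binary built at 10:00")       // impure builds can differ,
	build2 := []byte("binary built at 11:00")       // e.g. via embedded timestamps
	fmt.Println(loc)
	fmt.Println(contentDigest(build1) == contentDigest(build2))
}
```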

Atemu, Jun 06 '22 18:06

@sudo-bmitch @AkihiroSuda I want to get my hands dirty, next. Where should I start? :smile: -- maybe to clear up the clouds, we should have a quick call, even?

blaggacao, Jun 06 '22 19:06

Can I get a Slack invitation to opencontainers.slack.com for [email protected]?

blaggacao, Jun 06 '22 19:06

@blaggacao

I want to get my hands dirty, next. Where should I start? :smile: -- maybe to clear up the clouds, we should have a quick call, even?

We've got a weekly meeting, and this week's agenda is open: https://opencontainers.org/community/overview/

Slack invite sent.

sudo-bmitch, Jun 06 '22 19:06

@sudo-bmitch: https://github.com/opencontainers/image-spec/issues/922#issuecomment-1147320966

Unfortunately, the CAS design of registries is based on the data in transit (typically compressed). We've looked at treating compression as a storage and transit detail and tracking the digest of the uncompressed data instead, but that's currently an unsolved problem.

Maybe we can have a new digest algo like sha256+gunzip

  • https://github.com/opencontainers/image-spec/issues/925

@Atemu: https://github.com/opencontainers/image-spec/issues/922#issuecomment-1147784921

While we Nix people do strive for 100% reproducibility, output paths (/nix/store/<hash>-name) aren't guaranteed to be reproducible.

How do the output paths relate to the OP? Are NARs reproducible or not?

AkihiroSuda, Jun 08 '22 08:06

Maybe we can have a new digest algo like sha256+gunzip

  • https://github.com/opencontainers/image-spec/issues/925

For reference, here's where distribution-spec is looking to solve it with content-encoding headers. That feels a lot cleaner to me, but it is also risky for registries to support if existing clients would then pull large uncompressed blobs: https://github.com/opencontainers/distribution-spec/issues/235

sudo-bmitch, Jun 08 '22 09:06

@AkihiroSuda the OP's wording was a bit unclear: it suggested that the Nix output path hash addresses content, which it does not. It addresses a desired result, but the exact content of that result is not defined. (Obviously you could calculate a hash, but that would only be the hash of one instance of an output path, of which there could be multiple.)

NARs themselves are reproducible: given the same content, they will have the same hash. The content itself (as "identified" by the Nix output path hash) is not necessarily reproducible, however. I just wanted to make that clear.

Atemu, Jun 08 '22 10:06

For context, following up on various discussions: here is an instance where current optimizations reach their limit by pigeonholing a large dependency tree into the layer limit: https://github.com/nlewo/nix2container/issues/27

blaggacao, Jun 10 '22 20:06

The input-addressed (not content-addressed) nature essentially ties the manifest creation to a particular registry that actually holds the data.

For the time being, and for experimentation, I consider this absolutely fine: it doesn't make any difference, since the desired (and observable) functionality of a binary is still captured by the hash (@Atemu, please correct me if I'm missing something here).

In the longer term, though, Nix is moving towards content addressability.

blaggacao, Jun 10 '22 20:06

I think we are getting unnecessarily into the weeds here.

OCI should not need to care how the "store paths" are chosen; that can be a black box. It can simply require that such paths not conflict, and leave it to the images themselves to avoid conflicts.

I also wouldn't mind changing /nix/store to something else, if that greases any wheels. The NAR format is also unimportant; don't worry about it.
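A "paths must not conflict" requirement could be checked mechanically without OCI knowing how the paths were chosen. Below is a minimal Go sketch of one possible predicate, assuming each layer lists the top-level paths it adds: the same path may appear in several layers (that is just deduplication, since copies are interchangeable), but no path may lie inside a path added by another layer. The exact rule and the name findConflicts are assumptions for illustration, not anything the spec defines.

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strings"
)

// findConflicts reports pairs of paths from different layers where one path
// lies strictly inside the other. Identical paths are allowed.
func findConflicts(layers [][]string) []string {
	type entry struct {
		p     string
		layer int
	}
	var all []entry
	for i, l := range layers {
		for _, p := range l {
			all = append(all, entry{path.Clean(p), i})
		}
	}
	// After sorting, every path that has all[i].p as a string prefix
	// follows it contiguously, so the inner loop can stop early.
	sort.Slice(all, func(a, b int) bool { return all[a].p < all[b].p })
	var conflicts []string
	for i := range all {
		for j := i + 1; j < len(all) && strings.HasPrefix(all[j].p, all[i].p); j++ {
			if all[j].layer != all[i].layer && strings.HasPrefix(all[j].p, all[i].p+"/") {
				conflicts = append(conflicts, fmt.Sprintf("%s (layer %d) contains %s (layer %d)",
					all[i].p, all[i].layer, all[j].p, all[j].layer))
			}
		}
	}
	return conflicts
}

func main() {
	layers := [][]string{ // hypothetical, shortened store paths
		{"/nix/store/aaa-app", "/nix/store/bbb-libfoo"},
		{"/nix/store/bbb-libfoo"},         // duplicate: fine
		{"/nix/store/aaa-app/bin/extra"}, // nested in another layer's path: conflict
	}
	for _, c := range findConflicts(layers) {
		fmt.Println(c)
	}
}
```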

Ericson2314, Nov 09 '22 13:11

A non-conflict predicate would work equally well for Guix and others that generalize these ideas (e.g. OSTree, which I have heard mentioned).

blaggacao, Nov 09 '22 14:11

Has there been any progress on this?

adrian-gierakowski, Aug 17 '23 07:08

This has been implemented here in nix-snapshotter. Introducing a new media type would have required native integration in container runtimes like containerd, so instead we opted to use annotations to reference the necessary Nix packages. This means no changes to the image spec are necessary to support native Nix images. See here for low-level details.

Essentially this replaces the API calls for fetching blobs with Nix protocols for fetching blobs from a Nix binary cache.

Regarding the /nix/store prefix, it needs to be included: if a Nix package is built with a different prefix such as /other/nix/store, the hashes actually change, because binaries, shared objects and other transitive dependencies often contain references to the full Nix store path. nix-snapshotter supports other store prefixes if you provide your own NixBuilder, and you can potentially support multiple Nix store prefixes with the same nix-snapshotter: just detect based on the prefix and call out to the corresponding Nix daemon.
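The prefix-based dispatch could be as simple as a longest-prefix match from store directory to daemon socket. A Go sketch: /nix/var/nix/daemon-socket/socket is the default socket of a standard Nix install, while the /other/nix/store entry and its socket path are hypothetical, and nothing here reflects nix-snapshotter's actual NixBuilder interface.

```go
package main

import (
	"fmt"
	"strings"
)

// daemonFor returns the daemon socket responsible for a store path, picking
// the longest matching store prefix in case one store dir is nested in another.
func daemonFor(storePath string, sockets map[string]string) (string, bool) {
	best := ""
	for prefix := range sockets {
		if strings.HasPrefix(storePath, prefix+"/") && len(prefix) > len(best) {
			best = prefix
		}
	}
	if best == "" {
		return "", false
	}
	return sockets[best], true
}

func main() {
	sockets := map[string]string{
		"/nix/store":       "/nix/var/nix/daemon-socket/socket",
		"/other/nix/store": "/other/nix/var/nix/daemon-socket/socket", // hypothetical
	}
	sock, ok := daemonFor("/other/nix/store/aaa-app", sockets)
	fmt.Println(sock, ok)
}
```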

elpdt852, Sep 07 '23 09:09

Given this good news, I'm inclined to close this as completed. Looking into the far distant future, we may ask different questions, such as whether there is a way to incentivize adoption so that more and more deployments natively support Nix store paths, but that's more of a governance question.

Let me know if I should reopen.

-> nix-snapshotter.

blaggacao, Sep 07 '23 09:09