feat: support multiple hashing algorithms (blake3)
Currently, umoci defaults to sha256.
The OCI community is moving toward multi-hash support, in particular blake3:
https://github.com/opencontainers/image-spec/pull/1240
Here is a PR identifying the various touch points.
https://github.com/project-stacker/umoci/pull/12
Yeah I saw the image-spec discussion. I suspect we will need to rework a few of the core CAS APIs to make it properly configurable...
PutBlob() and PutBlobJSON() need an extra param (whatever way you want to achieve it)
There's also all of the mutate and main umoci APIs, which already have a lot of arguments so we really should rework how adding config arguments works... I'll add this to 0.6.
@cyphar do you anticipate a github.com/opencontainers/umoci/v2 after these changes?
@rchincha There are already breaking changes slated for 0.5, but my current view is that we won't do a /v2 API split for the following reasons:
- umoci is still pre-v1, so -- from a SemVer perspective -- breaking changes are still allowed. Now, I have previously made agreements with some users that the CLI API is stable and will not be subject to breaking changes, but the Go API is still unstable from my view (and is documented as such).
- Splitting the API to /v2 will break downstream users' builds just as much as not updating to /v2, so I don't see a strong reason for it other than it signalling that there are breaking API changes (but those are already mentioned in the changelog). The only reason I see for having a /v2 is if it is needed from a SemVer perspective and/or we plan to maintain the old pre-/v2 API.
- I think having /v2 without umoci itself being tagged v2.x will be more confusing to users.
That being said, I'm open to being convinced to use /v2. The only thing is that means that v0.5 will get /v2 and v0.6 will get /v3. Is churning the version number this way actually preferable for downstream users?
WDYT @tych0 @hallyn?
The version should change only on ABI breakage, of course.
(This is also related to https://github.com/opencontainers/umoci/issues/323, btw.)
Just a note to myself -- this will need auto-selection logic, similar to our current compression auto-selection logic (though I think the most sane default would be to use the hash used in index.json for the manifest -- since that hash is the main hash protecting the whole image). And we need to consider whether we want to have restrictions on what hash algorithms are allowed, or should we allow any algorithm registered with github.com/opencontainers/go-digest to be used? (This obviously means we will need some sanitization to avoid someone registering an algorithm called ../../../../../../../../etc.)
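A rough sketch of what the auto-selection and name sanitization might look like, using only the standard library. Everything named here (`checkAlgorithm`, `selectAlgorithm`, the `allowed` list) is hypothetical; the regular expression follows the algorithm grammar in the OCI image-spec descriptor document, and picking the first manifest entry is an assumption made for brevity.

```go
package main

import (
	"encoding/json"
	"fmt"
	"regexp"
	"strings"
)

// algorithmRe follows the image-spec digest grammar: lowercase alphanumeric
// components joined by a single separator from "+._-". A name such as
// "../../etc" cannot match because "/" is not allowed and separators may
// not be leading or doubled.
var algorithmRe = regexp.MustCompile(`^[a-z0-9]+(?:[+._-][a-z0-9]+)*$`)

// allowed is a hypothetical allowlist; the thread leaves open whether one
// should exist at all.
var allowed = map[string]bool{"sha256": true, "sha512": true, "blake3": true}

// checkAlgorithm rejects malformed or non-allowlisted algorithm names.
func checkAlgorithm(name string) error {
	if !algorithmRe.MatchString(name) {
		return fmt.Errorf("malformed algorithm name %q", name)
	}
	if !allowed[name] {
		return fmt.Errorf("algorithm %q is not allowed", name)
	}
	return nil
}

// index models only the part of index.json needed here.
type index struct {
	Manifests []struct {
		Digest string `json:"digest"`
	} `json:"manifests"`
}

// selectAlgorithm returns the algorithm of the first manifest digest in
// index.json, falling back to sha256 when the index has no manifests.
func selectAlgorithm(indexJSON []byte) (string, error) {
	var idx index
	if err := json.Unmarshal(indexJSON, &idx); err != nil {
		return "", err
	}
	if len(idx.Manifests) == 0 {
		return "sha256", nil
	}
	name, _, ok := strings.Cut(idx.Manifests[0].Digest, ":")
	if !ok {
		return "", fmt.Errorf("malformed digest %q", idx.Manifests[0].Digest)
	}
	if err := checkAlgorithm(name); err != nil {
		return "", err
	}
	return name, nil
}

func main() {
	alg, err := selectAlgorithm([]byte(`{"manifests":[{"digest":"sha512:abc"}]}`))
	fmt.Println(alg, err)
	fmt.Println(checkAlgorithm("../../../../etc"))
}
```

Validating the name before it is ever used as a path component under blobs/ is the point of the check; an explicit user override would go through the same `checkAlgorithm` call.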
Sounds like a reasonable default. At the same time, we need the ability to pick and choose.
BTW, IMO once we have a multi-hash world, it will inevitably raise compatibility concerns which we have not anticipated.
Well, I think for generation we should print a warning for anything other than sha256, but reading-wise I think we're fine. The main compatibility concern will probably come from registries.