New feature: Support OCI artifacts
Feature Description
I would love to see OpenDAL support reading and extracting OCI artifacts (see [1]).
[1] https://oras.land/docs/quickstart
Problem and Solution
We have integrated OpenDAL into Jumpstarter as our storage layer, both for accessing storage directly and for passing presigned storage access to edge elements.
https://github.com/jumpstarter-dev/jumpstarter/tree/main/packages/jumpstarter-driver-opendal/jumpstarter_driver_opendal
Handling OCI artifacts is becoming an industry standard for transporting device images and firmware, and we would like to take advantage of that.
CC: @NickCao
Additional Context
We don't have capacity to work on this at the moment.
Are you willing to contribute to the development of this feature?
- [ ] Yes, I am willing to contribute to the development of this feature.
Thank you @mangelajo for this issue. I'm willing to help make some progress here. First, I need to learn how OCI maps onto OpenDAL's calling convention.
OpenDAL provides a model that allows us to:
- Read data from a path
- Write data to a path
- List files under a path
To build a useful service, we should support at least read and list. Can we implement these over OCI artifacts?
One problem I see with OCI artifacts is that each artifact is likely a directory of files packed as a tarball, so there are really two layers of "directories": the registry namespace/image/tag triplet, and the directory tree within a single artifact. We could start with operations within a single artifact, but list cannot be implemented efficiently on tarballs, because a tar archive has no central index and has to be scanned from the start.
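To make the tarball listing cost concrete, here is a minimal sketch using Python's standard `tarfile` module (the function name `list_tar_prefix` is hypothetical, not an OpenDAL API). Because tar has no index, listing even one directory means walking every member header in the archive; with a compressed, streamed layer that also means decompressing everything.

```python
import io
import tarfile


def list_tar_prefix(fileobj: io.IOBase, prefix: str) -> list[str]:
    """Return the names of tar members under `prefix` (hypothetical helper).

    Tar has no central directory, so the only way to answer a list request
    is to walk the archive member by member. In stream mode ("r|*") every
    byte is read and, for gzip layers, decompressed, so the cost grows with
    the whole archive rather than with the size of the result.
    """
    names = []
    # "r|*" opens a forward-only stream and auto-detects compression.
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        for member in tar:
            if member.name.startswith(prefix):
                names.append(member.name)
    return names
```

A real backend would probably build an index once per layer (or use a seekable format such as eStargz) instead of rescanning on every list call.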
This would also help tooling; dive currently doesn't work for me because of a bug.
Do you strictly want the OCI image layout tarball? Registries are similar and related, but not strictly the same. If it's only the image tarball, do you want to include eStargz too?
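For comparison with a registry (reached over HTTP), an OCI image layout can be resolved with plain file reads: `index.json` lists manifest descriptors, tags live in the `org.opencontainers.image.ref.name` annotation, and blobs are stored at `blobs/<algorithm>/<hex>`. A minimal sketch, assuming an already-unpacked layout directory; it skips digest verification and nested indexes, and the helper names are hypothetical.

```python
import json
from pathlib import Path

# Annotation the OCI image layout spec uses to name (tag) a manifest in index.json.
REF_NAME = "org.opencontainers.image.ref.name"


def blob_path(layout: Path, digest: str) -> Path:
    """Map a digest like "sha256:<hex>" to blobs/sha256/<hex>."""
    algorithm, encoded = digest.split(":", 1)
    return layout / "blobs" / algorithm / encoded


def resolve_manifest(layout: Path, ref: str) -> dict:
    """Return the manifest whose tag annotation or digest equals `ref`."""
    index = json.loads((layout / "index.json").read_text())
    for descriptor in index.get("manifests", []):
        annotations = descriptor.get("annotations", {})
        if annotations.get(REF_NAME) == ref or descriptor["digest"] == ref:
            return json.loads(blob_path(layout, descriptor["digest"]).read_text())
    raise KeyError(f"reference {ref!r} not found in {layout}")
```

The tarball form of the layout has the same structure, so it could be read through `tarfile` instead of the filesystem.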
Any movement on this?
@NickCao thoughts on this path model?
oci://{registry}/{repo}/@{ref}/{inner?}
registry = ghcr.io | registry-1.docker.io | ...
repo = namespace/name
ref = <tag> | sha256:<digest> (OCI "reference")
inner = optional path *inside* the unpacked rootfs
Registry + repo act like the bucket field in S3.
The first path segment holds the OCI reference (tag or digest).
A plain list("/") therefore returns tags and digests, which matches what most users expect when they "list a repository".
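A minimal sketch of parsing the proposed `oci://{registry}/{repo}/@{ref}/{inner?}` form, to check that it round-trips. `OciPath` and `parse_oci_url` are hypothetical names, not existing OpenDAL APIs. The `@` marker on the reference segment is what tells the parser where the repository ends, so multi-level repository names stay unambiguous; a URL without a reference is treated here as "list the repository's references".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OciPath:
    registry: str        # e.g. "ghcr.io"
    repo: str            # may contain several "/"-separated components
    ref: Optional[str]   # tag or "sha256:<digest>"; None means "list references"
    inner: str           # path inside the unpacked rootfs, "" for its root


def parse_oci_url(url: str) -> OciPath:
    """Split oci://{registry}/{repo}/@{ref}/{inner?} into its parts."""
    scheme = "oci://"
    if not url.startswith(scheme):
        raise ValueError(f"not an oci:// URL: {url!r}")
    registry, _, rest = url[len(scheme):].partition("/")
    if not registry or not rest:
        raise ValueError(f"missing registry or repository: {url!r}")
    # Everything before "/@" is the repository, however many levels deep.
    repo, sep, after = rest.partition("/@")
    repo = repo.strip("/")
    if not repo:
        raise ValueError(f"empty repository: {url!r}")
    if not sep:
        return OciPath(registry, repo, None, "")
    ref, _, inner = after.partition("/")
    if not ref:
        raise ValueError(f"empty reference after '@': {url!r}")
    return OciPath(registry, repo, ref, inner)
```

For example, `oci://ghcr.io/org/app/@v1.2/etc/os-release` parses to registry `ghcr.io`, repo `org/app`, ref `v1.2` and inner path `etc/os-release`.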
Looks neat! Thanks for the idea.
How do you represent a local registry (images loaded into Docker)?
What about metadata access? There are two kinds:
- Attestations, mostly large JSON documents
- Annotations, plus manifests such as the multi-platform index
Since files are stored as deltas across layers, what do we do with a layer's SHA (digest)?
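To ground the metadata and layer questions, this sketch reads the relevant fields of an OCI image manifest or image index: annotations, the ordered list of layer digests (base layer first, each later layer applied on top), and, for a multi-platform index, the digests of the per-platform manifests. It only recognises the OCI media types and assumes `mediaType` is set; `describe_manifest` is a hypothetical helper. One option this suggests, not a settled design, is exposing each layer as a blob addressed by its digest.

```python
import json

OCI_INDEX = "application/vnd.oci.image.index.v1+json"


def describe_manifest(raw: str) -> dict:
    """Summarise an OCI manifest or index (hypothetical helper).

    Assumes the document sets "mediaType"; Docker-specific media types
    are not handled.
    """
    doc = json.loads(raw)
    annotations = doc.get("annotations", {})
    if doc.get("mediaType") == OCI_INDEX:
        # A multi-platform index points at other manifests, not at layers.
        return {
            "kind": "index",
            "manifests": [m["digest"] for m in doc.get("manifests", [])],
            "annotations": annotations,
        }
    return {
        "kind": "manifest",
        # Layers are ordered from the base image to the top-most layer.
        "layers": [(layer["digest"], layer["size"]) for layer in doc.get("layers", [])],
        "annotations": annotations,
    }
```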
> thoughts on this path model?
One suggestion: instead of using the oci:// scheme, use container://, followed by a container transport reference as defined in https://github.com/containers/image/blob/main/docs/containers-transports.5.md
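A minimal sketch of splitting such a URL into transport name and transport-specific details. The `container://` prefix is the suggestion above, `split_transport` is a hypothetical helper, and the transport set is my reading of the names listed in containers-transports(5), so treat it as an assumption. Transport references have the form `transport:details`, except `docker`, which is written `docker://docker-reference`. The `docker-daemon:` transport refers to images in the local Docker daemon's storage, which may also answer the earlier question about images loaded into Docker.

```python
# Assumed set of transport names, taken from containers-transports(5).
KNOWN_TRANSPORTS = {
    "containers-storage", "dir", "docker", "docker-archive",
    "docker-daemon", "oci", "oci-archive", "sif",
}


def split_transport(url: str) -> tuple[str, str]:
    """Split container://<transport>:<details> (hypothetical helper)."""
    prefix = "container://"
    if not url.startswith(prefix):
        raise ValueError(f"not a container:// URL: {url!r}")
    transport, sep, details = url[len(prefix):].partition(":")
    if not sep or transport not in KNOWN_TRANSPORTS:
        raise ValueError(f"unknown or missing transport: {url!r}")
    if transport == "docker":
        # The docker transport is spelled docker://docker-reference.
        if not details.startswith("//"):
            raise ValueError(f"docker transport needs '//': {url!r}")
        details = details[2:]
    return transport, details
```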
Also, would this create ambiguity if the ref is omitted (implying latest), or with multi-level repository names (e.g. example.com/foo/bar/baz/image)?
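For a bare docker-style reference, both cases can be resolved with Docker's normalization rules: the first component is a registry only if it contains "." or ":" or is "localhost" (otherwise docker.io is assumed, with a `library/` prefix for single-component names), a tag is a ":" after the last "/", and a missing tag and digest defaults to latest. A minimal sketch of those rules; `DockerRef` and `parse_docker_reference` are hypothetical names, and it does no character validation. These rules only cover the reference itself; they say nothing about where an appended inner path would begin.

```python
from typing import NamedTuple, Optional

DEFAULT_REGISTRY = "docker.io"
DEFAULT_TAG = "latest"


class DockerRef(NamedTuple):
    registry: str
    repo: str
    tag: Optional[str]
    digest: Optional[str]


def parse_docker_reference(ref: str) -> DockerRef:
    """Normalise a docker-style image reference (no validation)."""
    name, _, digest = ref.partition("@")
    # A tag is a ":" after the last "/", so "localhost:5000/app" keeps its port.
    slash, colon = name.rfind("/"), name.rfind(":")
    tag = None
    if colon > slash:
        name, tag = name[:colon], name[colon + 1:]
    first, sep, remainder = name.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, repo = first, remainder
    else:
        registry, repo = DEFAULT_REGISTRY, name
        if "/" not in repo:
            repo = "library/" + repo
    if tag is None and not digest:
        tag = DEFAULT_TAG
    return DockerRef(registry, repo, tag, digest or None)
```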