
new feature: Support OCI artifacts

Open mangelajo opened this issue 9 months ago • 6 comments

Feature Description

I would love to see support in OpenDAL for reading and extracting OCI artifacts (see [1]).

[1] https://oras.land/docs/quickstart

Problem and Solution

We have integrated OpenDAL into jumpstarter as our storage layer, both for accessing storage directly and for passing presigned storage access to edge elements.

https://github.com/jumpstarter-dev/jumpstarter/tree/main/packages/jumpstarter-driver-opendal/jumpstarter_driver_opendal

Handling OCI artifacts is becoming an industry standard for transporting device images and firmware, and we would like to take advantage of that.

CC: @NickCao

Additional Context

We don't have capacity to work on this at the moment.

Are you willing to contribute to the development of this feature?

  • [ ] Yes, I am willing to contribute to the development of this feature.

mangelajo avatar Mar 05 '25 15:03 mangelajo

Thank you @mangelajo for this issue. I'm willing to help make some progress here. First of all, I need to learn how we can map OCI to OpenDAL's calling convention.

OpenDAL creates a model that allows us to:

  • Read data from path
  • Write data to path
  • List files from path

To build a useful service, we should at least support read and list. Can we implement this over OCI artifacts?

Xuanwo avatar Mar 05 '25 17:03 Xuanwo

One problem I can see with OCI artifacts is that each artifact itself is likely a directory of files (encapsulated as a tarball), so we really have two layers of "directories" here: the OCI registry namespace/image/tag triplet, and the directory tree within a single artifact. For starters we might focus on the operations within a single artifact, yet you cannot implement an efficient list operation on tarballs.
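To illustrate the tarball point, here is a minimal sketch using only Python's standard `tarfile` module. It assumes a single uncompressed tar layer held in memory; the helper names `build_tar`, `list_dir`, and `read_file` are hypothetical and not part of OpenDAL. Because tar has no central index, listing one directory means walking every header in the archive.

```python
import io
import tarfile


def build_tar(files: dict[str, bytes]) -> bytes:
    """Build an uncompressed tar archive in memory from {path: content}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_dir(layer: bytes, prefix: str) -> list[str]:
    """List the direct children of `prefix` inside a tar layer."""
    prefix = prefix.strip("/")
    children = set()
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r") as tar:
        for member in tar:  # linear scan over all headers: tar has no index
            name = member.name.strip("/")
            if prefix:
                if not name.startswith(prefix + "/"):
                    continue
                rest = name[len(prefix) + 1:]
            else:
                rest = name
            if not rest:
                continue
            head, sep, _ = rest.partition("/")
            # Entries with deeper components are reported as directories.
            children.add(head + "/" if sep or member.isdir() else head)
    return sorted(children)


def read_file(layer: bytes, path: str) -> bytes:
    """Read one regular file out of a tar layer."""
    with tarfile.open(fileobj=io.BytesIO(layer), mode="r") as tar:
        handle = tar.extractfile(path.strip("/"))  # KeyError if missing
        if handle is None:
            raise IsADirectoryError(path)
        return handle.read()


layer = build_tar({
    "boot/kernel.img": b"kernel",
    "boot/dtb/board.dtb": b"dtb",
    "README": b"readme",
})
print(list_dir(layer, "boot"))
print(read_file(layer, "boot/kernel.img"))
```

A real artifact would typically be compressed and made of several layers, so a service would probably need to build and cache an index on first access rather than rescan on every list call.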

NickCao avatar Mar 05 '25 17:03 NickCao

This also helps tooling. My dive doesn't work because of a bug.

Do you strictly want the OCI image layout tarball? Registries, although similar and related, are not strictly the same. If it's only the image tarball, do you want to include estargz too?

erickguan avatar Mar 06 '25 12:03 erickguan

Any movement on this?

@NickCao thoughts on this path model?

oci://{registry}/{repo}/@{ref}/{inner?}

registry  = ghcr.io | registry-1.docker.io | ...
repo      = namespace/name
ref       = <tag> | sha256:<digest>      (OCI "reference")
inner     = optional path *inside* the unpacked rootfs

Registry + repo acts like the bucket field in S3.

We reuse the first path segment for the OCI reference (tag / digest).

A plain list("/") therefore returns tags and digests, which matches what most users expect when they "list a repository."
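A rough sketch of how such a path could be parsed, in Python. This is only an illustration of the proposal above, not an existing OpenDAL API: `OciPath` and `parse_oci_path` are hypothetical names, and it assumes the reference is the first segment that starts with `@` and that no repository segment ever starts with `@`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OciPath:
    registry: str
    repo: str
    ref: Optional[str]    # tag or "sha256:<digest>"; None when omitted
    inner: Optional[str]  # path inside the unpacked rootfs; None when omitted


def parse_oci_path(url: str) -> OciPath:
    """Parse oci://{registry}/{repo}/@{ref}/{inner?} (hypothetical format)."""
    scheme = "oci://"
    if not url.startswith(scheme):
        raise ValueError(f"expected an {scheme} URL, got {url!r}")
    segments = [s for s in url[len(scheme):].split("/") if s]
    if len(segments) < 2:
        raise ValueError("need at least a registry and a repository")
    registry, rest = segments[0], segments[1:]

    # Assumption: the first segment starting with "@" carries the reference.
    ref_idx = next((i for i, s in enumerate(rest) if s.startswith("@")), None)
    if ref_idx is None:
        # No reference given: everything after the registry is the repo.
        return OciPath(registry, "/".join(rest), None, None)
    if ref_idx == 0:
        raise ValueError("missing repository before the reference")
    ref = rest[ref_idx][1:]
    if not ref:
        raise ValueError("empty reference after '@'")
    inner = "/".join(rest[ref_idx + 1:]) or None
    return OciPath(registry, "/".join(rest[:ref_idx]), ref, inner)


print(parse_oci_path("oci://ghcr.io/acme/firmware/@v1.2/boot/kernel.img"))
```

With the `@` marker in place, multi-level repository names split unambiguously from the inner path, because the marker says where the repository ends.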

PSU3D0 avatar May 31 '25 17:05 PSU3D0

Looks neat! Thanks for the idea.

How do you represent local registry (loaded in docker)?

What about metadata access? There are two kinds:

  1. Attestations, mostly a large JSON
  2. Annotations, e.g. manifests such as multi-platform index.

Since files are delta stored in layers, what do we do with a layer sha?

erickguan avatar Jun 02 '25 20:06 erickguan

thoughts on this path model?

One suggestion: instead of using the oci:// scheme, use container://, followed by a container transport reference as defined in https://github.com/containers/image/blob/main/docs/containers-transports.5.md

And: would this create ambiguity if ref is omitted (implying latest), or with multi-level repo names (e.g. example.com/foo/bar/baz/image)?
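To make the ambiguity concrete, here is a small Python sketch (the function name `candidate_splits` is hypothetical). It assumes a path with no explicit reference segment and enumerates every way the segments after the registry could be divided into a repository name and an inner path.

```python
def candidate_splits(path: str) -> list[tuple[str, str]]:
    """Enumerate (repository, inner path) readings of a ref-less path."""
    registry, *rest = [s for s in path.split("/") if s]
    return [
        ("/".join([registry] + rest[:i]), "/".join(rest[i:]))
        for i in range(1, len(rest) + 1)
    ]


for repo, inner in candidate_splits("example.com/foo/bar/baz/image"):
    print(f"repo={repo!r} inner={inner!r}")
```

Each printed pair is a syntactically valid reading, so without a marker or a registry lookup the parser cannot tell which one was intended.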

NickCao avatar Jun 02 '25 20:06 NickCao