Add recursive directory support

Open lcarva opened this issue 3 months ago • 6 comments

What is the version of your ORAS CLI

No response

What would you like to be added?

Current Behavior

When using oras push, files are uploaded individually as blobs while directories are bundled into tarballs that are then uploaded as blobs. These blobs are wrapped in an Image Manifest.

The current approach makes it impossible to list the contents for a directory tarball or download a single file from the directory tarball without downloading the tarball in full.

Proposed Solution

Implement hierarchical directory representation using OCI Image Indexes in addition to OCI Image Manifests. To preserve backwards compatibility, this new behavior must be explicitly enabled via parameters/config.

Directory Structure Mapping

  • File-only directories: Represented as a single Image Manifest
  • Directory-only directories: Represented as an Image Index linking to other Image Indexes or Image Manifests
  • Mixed directories: Similar to "directory-only" but with an additional Image Manifest representing the directory files.
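
The mapping above could be sketched as a recursive walk. This only illustrates the proposal and is not existing ORAS behavior; `plan_directory` and the dictionary layout are hypothetical names chosen for the example.

```python
from pathlib import Path


def plan_directory(path: Path) -> dict:
    """Describe how a directory would map to OCI objects under the proposal."""
    entries = sorted(path.iterdir(), key=lambda p: p.name)
    files = [p.name for p in entries if p.is_file()]
    subdirs = [p for p in entries if p.is_dir()]

    # File-only directory: a single Image Manifest listing the files.
    if not subdirs:
        return {"name": path.name, "kind": "manifest", "files": files}

    # Directory-only or mixed directory: an Image Index over the children.
    children = [plan_directory(d) for d in subdirs]
    if files:
        # Mixed directory: an extra "." manifest holds the directory's own files.
        children.insert(0, {"name": ".", "kind": "manifest", "files": files})
    return {"name": path.name, "kind": "index", "children": children}
```

Calling `plan_directory(Path("sample"))` on the example filesystem below would yield an index whose children are an index (dirs-only), a manifest (files-only), and an index (mixed) that starts with a "." manifest.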

Example filesystem:

sample/
├── dirs-only
│   ├── assets
│   │   └── logo.png
│   └── config
│       └── settings.json
├── files-only
│   ├── config.txt
│   ├── data.json
│   └── readme.md
└── mixed
    ├── package.json
    ├── README.md
    ├── src
    │   └── main.js
    └── tests
        └── test.js

Example OCI references:

        OCI Image Indexes                      OCI Image Manifests                 OCI Blobs     
                                                                                                 
                                                                                                 
+---------+         +-----------+                  +--------+                     +----------+   
| sample  +---+---->| dirs-only +--------+-------->| assets +-------------------->| logo.png |   
+---------+   |     +-----------+        |         +--------+                     +----------+   
              |                          |                                                       
              |                          |                                                       
              |                          |         +--------+                   +---------------+
              |                          +-------->| config +------------------>| settings.json |
              |                                    +--------+                   +---------------+
              |                                                                                  
              |                                                                                  
              |                                  +------------+                  +------------+  
              +--------------------------------->| files-only +--------+-------->| config.txt |  
              |                                  +------------+        |         +------------+  
              |                                                        |                         
              |                                                        |                         
              |                                                        |          +-----------+  
              |                                                        +--------->| data.json |  
              |                                                        |          +-----------+  
              |                                                        |                         
              |                                                        |                         
              |                                                        |          +-----------+  
              |                                                        +--------->| readme.md |  
              |                                                                   +-----------+  
              |                                                                                  
              |                                                                                  
              |      +-------+                        +---+                      +--------------+
              +----->| mixed +-----------+----------->| . +------------+-------->| package.json |
                     +-------+           |            +---+            |         +--------------+
                                         |                             |                         
                                         |                             |                         
                                         |                             |                         
                                         |                             |           +-----------+ 
                                         |                             +---------->| README.md | 
                                         |                                         +-----------+ 
                                         |                                                       
                                         |                                                       
                                         |           +-----+                        +---------+  
                                         +---------->| src +----------------------->| main.js |  
                                         |           +-----+                        +---------+  
                                         |                                                       
                                         |                                                       
                                         |          +--------+                     +---------+   
                                         +--------->| tests  +-------------------->| test.js |   
                                                    +--------+                     +---------+   

Why is this needed for ORAS?

Motivation

This enhancement addresses specific requirements for representing YUM repositories as OCI Artifacts. YUM repositories containing hundreds of thousands of files present scalability challenges. For example, I observed that certain registries, such as Quay, can easily run into timeouts when processing Image Manifests with about 2,000 blobs, due to integrity checks implemented by the registry.

Organizing files into subdirectories, a form of "sharding", keeps the blob count per Image Manifest reasonable.

See discussion for additional context.

This could be achieved today with wrapper scripts around oras, but I think that agreeing with the oras community on how to represent this data in an OCI registry would be valuable for anyone using OCI artifacts.

Are you willing to submit PRs to contribute to this feature?

  • [x] Yes, I am willing to implement it.

lcarva avatar Sep 26 '25 17:09 lcarva

What you have proposed can be achieved by

oras push --oci-layout demo:v1 `find sample -type f`

as the file name can be a path with folder segments.

The resulting manifest is

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/vnd.unknown.artifact.v1",
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
    "size": 2,
    "data": "e30="
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:e9e1aabe0aae56cb1eba125e053b355aaf43680cf243ee7b518e1c897e5868c1",
      "size": 13,
      "annotations": {
        "org.opencontainers.image.title": "sample/dirs-only/config/settings.json"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:ab211233b6576dbb0f8b5826447eeac61e2a833a99ac5d788fbc1a174c3c6ce5",
      "size": 8,
      "annotations": {
        "org.opencontainers.image.title": "sample/dirs-only/assets/logo.png"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5",
      "size": 9,
      "annotations": {
        "org.opencontainers.image.title": "sample/mixed/README.md"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:7ae45ad102eab3b6d7e7896acd08c427a9b25b346470d7bc6507b6481575d519",
      "size": 12,
      "annotations": {
        "org.opencontainers.image.title": "sample/mixed/package.json"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:58417e0f781b6656949d37258c8b9052ed266e2eb7a5163cad7b0863e6b2916a",
      "size": 7,
      "annotations": {
        "org.opencontainers.image.title": "sample/mixed/src/main.js"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:13876b4beb64b9f156474dc78f9c923952a7ca210d4507b6b3135bbe244f8a60",
      "size": 7,
      "annotations": {
        "org.opencontainers.image.title": "sample/mixed/tests/test.js"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:a9b5c214b62651a1af8e7f600485ee6b280c815745eabb52c06bbccb2397b5f8",
      "size": 10,
      "annotations": {
        "org.opencontainers.image.title": "sample/files-only/config.txt"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:99377c63fbe5425d01e6e128d8b6a4a9b3f2e18bb233155583875b8a65aef58f",
      "size": 9,
      "annotations": {
        "org.opencontainers.image.title": "sample/files-only/data.json"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:5a831ea67cf5cf8703b0de46901ab25bd191f56b320053be9332d9a3b0d01d15",
      "size": 9,
      "annotations": {
        "org.opencontainers.image.title": "sample/files-only/readme.md"
      }
    }
  ],
  "annotations": {
    "org.opencontainers.image.created": "2025-10-15T06:01:11Z"
  }
}
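
With paths stored in the `org.opencontainers.image.title` annotation, a client can list files and locate a single blob from the manifest alone. A minimal sketch, using a trimmed copy of the manifest above; `list_files` and `find_layer` are hypothetical helper names, not part of ORAS.

```python
import json
from typing import Optional

TITLE = "org.opencontainers.image.title"

# Trimmed copy of the manifest above (two of the nine layers).
MANIFEST = json.loads("""
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:e9e1aabe0aae56cb1eba125e053b355aaf43680cf243ee7b518e1c897e5868c1",
      "size": 13,
      "annotations": {"org.opencontainers.image.title": "sample/dirs-only/config/settings.json"}
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar",
      "digest": "sha256:58417e0f781b6656949d37258c8b9052ed266e2eb7a5163cad7b0863e6b2916a",
      "size": 7,
      "annotations": {"org.opencontainers.image.title": "sample/mixed/src/main.js"}
    }
  ]
}
""")


def list_files(manifest: dict) -> list:
    """Return the file paths recorded in the layer title annotations."""
    return [
        layer["annotations"][TITLE]
        for layer in manifest.get("layers", [])
        if TITLE in layer.get("annotations", {})
    ]


def find_layer(manifest: dict, path: str) -> Optional[dict]:
    """Return the layer descriptor whose title matches path, if any."""
    for layer in manifest.get("layers", []):
        if layer.get("annotations", {}).get(TITLE) == path:
            return layer
    return None
```

The digest returned by `find_layer` identifies one blob, which a client could then fetch on its own without pulling the other layers.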

Here's the full console log:

$ tree sample
sample
├── dirs-only
│   ├── assets
│   │   └── logo.png
│   └── config
│       └── settings.json
├── files-only
│   ├── config.txt
│   ├── data.json
│   └── readme.md
└── mixed
    ├── README.md
    ├── package.json
    ├── src
    │   └── main.js
    └── tests
        └── test.js

8 directories, 9 files
$ oras push --oci-layout demo:v1 `find sample -type f`
✓ Uploaded  sample/mixed/package.json                                                  12/12  B 100.00%    1ms
  └─ sha256:7ae45ad102eab3b6d7e7896acd08c427a9b25b346470d7bc6507b6481575d519
✓ Uploaded  sample/dirs-only/config/settings.json                                      13/13  B 100.00%  398µs
  └─ sha256:e9e1aabe0aae56cb1eba125e053b355aaf43680cf243ee7b518e1c897e5868c1
✓ Uploaded  application/vnd.oci.empty.v1+json                                            2/2  B 100.00%    1ms
  └─ sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a
✓ Uploaded  sample/dirs-only/assets/logo.png                                             8/8  B 100.00%    1ms
  └─ sha256:ab211233b6576dbb0f8b5826447eeac61e2a833a99ac5d788fbc1a174c3c6ce5
✓ Uploaded  sample/mixed/README.md                                                       9/9  B 100.00%  233µs
  └─ sha256:b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5
✓ Uploaded  sample/mixed/src/main.js                                                     7/7  B 100.00%  113µs
  └─ sha256:58417e0f781b6656949d37258c8b9052ed266e2eb7a5163cad7b0863e6b2916a
✓ Uploaded  sample/mixed/tests/test.js                                                   7/7  B 100.00%  161µs
  └─ sha256:13876b4beb64b9f156474dc78f9c923952a7ca210d4507b6b3135bbe244f8a60
✓ Uploaded  sample/files-only/config.txt                                               10/10  B 100.00%  213µs
  └─ sha256:a9b5c214b62651a1af8e7f600485ee6b280c815745eabb52c06bbccb2397b5f8
✓ Uploaded  sample/files-only/data.json                                                  9/9  B 100.00%  321µs
  └─ sha256:99377c63fbe5425d01e6e128d8b6a4a9b3f2e18bb233155583875b8a65aef58f
✓ Uploaded  sample/files-only/readme.md                                                  9/9  B 100.00%  119µs
  └─ sha256:5a831ea67cf5cf8703b0de46901ab25bd191f56b320053be9332d9a3b0d01d15
✓ Uploaded  application/vnd.oci.image.manifest.v1+json                             2.36/2.36 KB 100.00%  574µs
  └─ sha256:3a930f47096cabd6db3d9c9cac8bddb39fa68c5145cf5e28660b7c22f784baf9
Pushed [oci-layout] demo:v1
ArtifactType: application/vnd.unknown.artifact.v1
Digest: sha256:3a930f47096cabd6db3d9c9cac8bddb39fa68c5145cf5e28660b7c22f784baf9
$ rm -r sample
$ oras pull --oci-layout demo:v1
✓ Pulled      sample/mixed/README.md                                                     9/9  B 100.00%  793µs
  └─ sha256:b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5
✓ Pulled      sample/dirs-only/config/settings.json                                    13/13  B 100.00%  552µs
  └─ sha256:e9e1aabe0aae56cb1eba125e053b355aaf43680cf243ee7b518e1c897e5868c1
✓ Pulled      sample/dirs-only/assets/logo.png                                           8/8  B 100.00%    1ms
  └─ sha256:ab211233b6576dbb0f8b5826447eeac61e2a833a99ac5d788fbc1a174c3c6ce5
✓ Pulled      sample/mixed/tests/test.js                                                 7/7  B 100.00%  121µs
  └─ sha256:13876b4beb64b9f156474dc78f9c923952a7ca210d4507b6b3135bbe244f8a60
✓ Pulled      sample/mixed/package.json                                                12/12  B 100.00%  101µs
  └─ sha256:7ae45ad102eab3b6d7e7896acd08c427a9b25b346470d7bc6507b6481575d519
✓ Pulled      sample/mixed/src/main.js                                                   7/7  B 100.00%  195µs
  └─ sha256:58417e0f781b6656949d37258c8b9052ed266e2eb7a5163cad7b0863e6b2916a
✓ Pulled      sample/files-only/data.json                                                9/9  B 100.00%   57µs
  └─ sha256:99377c63fbe5425d01e6e128d8b6a4a9b3f2e18bb233155583875b8a65aef58f
✓ Pulled      sample/files-only/config.txt                                             10/10  B 100.00%  110µs
  └─ sha256:a9b5c214b62651a1af8e7f600485ee6b280c815745eabb52c06bbccb2397b5f8
✓ Pulled      sample/files-only/readme.md                                                9/9  B 100.00%   43µs
  └─ sha256:5a831ea67cf5cf8703b0de46901ab25bd191f56b320053be9332d9a3b0d01d15
✓ Pulled      application/vnd.oci.image.manifest.v1+json                           2.36/2.36 KB 100.00%   57µs
  └─ sha256:3a930f47096cabd6db3d9c9cac8bddb39fa68c5145cf5e28660b7c22f784baf9
Pulled [oci-layout] demo:v1
Digest: sha256:3a930f47096cabd6db3d9c9cac8bddb39fa68c5145cf5e28660b7c22f784baf9
$ tree sample
sample
├── dirs-only
│   ├── assets
│   │   └── logo.png
│   └── config
│       └── settings.json
├── files-only
│   ├── config.txt
│   ├── data.json
│   └── readme.md
└── mixed
    ├── README.md
    ├── package.json
    ├── src
    │   └── main.js
    └── tests
        └── test.js

8 directories, 9 files

shizhMSFT avatar Oct 15 '25 06:10 shizhMSFT

@shizhMSFT, thank you for your response. Your suggestion does work, and I should have acknowledged it in the issue description, but it doesn't address the challenge when dealing with thousands of files for a couple of reasons. First, passing every file path as a command-line argument will inevitably exceed ARG_MAX. Second, some registries will either time out or outright reject an Image Manifest with thousands of entries. The "sharding" aspect of the proposal is a way of keeping each manifest at a manageable size.
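
To make the first point concrete, the size of an argument list can be estimated against the platform limit. A rough sketch for POSIX systems; `argv_bytes` and `exceeds_arg_max` are hypothetical names, and the estimate ignores the environment block and per-argument pointer overhead, which also count toward the limit.

```python
import os
from typing import Optional


def argv_bytes(paths: list) -> int:
    """Approximate bytes needed to pass paths as NUL-terminated arguments."""
    return sum(len(os.fsencode(p)) + 1 for p in paths)


def exceeds_arg_max(paths: list, limit: Optional[int] = None) -> bool:
    """Report whether the paths alone would overflow the argument limit."""
    if limit is None:
        # POSIX only: query the limit of the running system.
        limit = os.sysconf("SC_ARG_MAX")
    return argv_bytes(paths) > limit
```

Because the real limit also covers the environment, a list that passes this check can still fail in practice; the sketch only gives a lower bound on the space needed.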

lcarva avatar Oct 15 '25 13:10 lcarva

> dealing with thousands of files for a couple of reasons

That's interesting. I'm not sure whether storing complex file structures as OCI artifacts, rather than in object storage, is a good idea, but I'd like to leave that question to you.

> The current approach makes it impossible to list the contents for a directory tarball or download a single file from the directory tarball without downloading the tarball in full.

I assume that your core requirements are

  1. List the files without downloading the entire tarball
  2. Fetch one file without downloading the entire tarball

In fact, both are also achievable by archiving everything in an uncompressed tarball, which supports random-access reads.

An uncompressed tar file is a concatenation of a bunch of files.

graph LR
    A[File A<br>Header Block] --> B[File A<br>Data Block]
    B --> C[File B<br>Header Block]
    C --> D[File B<br>Data Block]
    D --> E[...]
    E --> F[End-of-Archive<br>Marker]

If we save the header positions somewhere, we can easily list the files and fetch individual files without downloading the entire tarball, using an implementation similar to the tarfs package in oras-go. What do you think?
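
The idea can be sketched with Python's standard `tarfile` module, which exposes each member's data offset. This is an illustration only and not how tarfs in oras-go is implemented; `build_tar`, `build_index`, and `read_range` are hypothetical helpers, and slicing an in-memory byte string stands in for a ranged read against the registry.

```python
import io
import tarfile


def build_tar(files: dict) -> bytes:
    """Create an uncompressed tar archive in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def build_index(archive: bytes) -> dict:
    """Map each regular file name to (data offset, size) within the archive."""
    index = {}
    # "r:" opens the archive strictly as uncompressed tar.
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_range(archive: bytes, offset: int, size: int) -> bytes:
    """Stand-in for a ranged read of the blob (assumption for this sketch)."""
    return archive[offset:offset + size]
```

The index would be built once at push time and stored alongside the tarball; listing files then needs only the index, and fetching one file needs only its byte range.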

shizhMSFT avatar Oct 16 '25 08:10 shizhMSFT

There's one more core requirement:

  1. Leverage storage de-duplication on the registry.

If one of the files has been previously uploaded to the OCI repository, that file shouldn't have to be uploaded again to the same OCI repository. A concrete use case, and my motivation for pursuing this, is a YUM repo where a single package (RPM) may exist in multiple repositories. Consider the case of a YUM repo update. If I'm adding a single package to the YUM repo, I shouldn't have to upload all the other packages as well. The uncompressed tar file approach wouldn't meet this requirement IIUC.
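
A small sketch of why per-file blobs allow this: blobs are addressed by content digest, so an unchanged package keeps its digest and can be skipped. `digest` and `blobs_to_upload` are hypothetical helpers, and the `existing` set stands in for whatever existence check the client performs against the repository.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the sha256 content digest of a blob in OCI notation."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def blobs_to_upload(files: dict, existing: set) -> dict:
    """Return {path: digest} for files whose digest is not already stored."""
    pending = {}
    for path, data in files.items():
        d = digest(data)
        if d not in existing:
            pending[path] = d
    return pending
```

When one package is added to a repo, only that package's digest is missing from `existing`, so only one blob needs uploading. A single tarball, by contrast, gets a new digest whenever any member changes.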

> I'm not sure whether storing complex file structures as OCI artifacts, rather than in object storage, is a good idea, but I'd like to leave that question to you.

I think it's a good idea given the ubiquity of OCI storage. Being able to represent such structure would significantly lower the entry barrier for creating YUM repos, for example.

lcarva avatar Oct 16 '25 18:10 lcarva

@sabre1041 do you have more context about this scenario?

FeynmanZhou avatar Oct 29 '25 00:10 FeynmanZhou

> @sabre1041 do you have more context about this scenario?

After a brief conversation with @lcarva, I understand that one of the key drivers behind this feature request is described in this discussion. @lcarva can certainly provide more details as well.

sabre1041 avatar Nov 11 '25 22:11 sabre1041

What would help drive this conversation forward?

lcarva avatar Dec 16 '25 21:12 lcarva