
Should `oras.Copy()` follow manifests specified in the `layers` or the `blobs` field?

Open Wwwsylvia opened this issue 2 years ago • 5 comments

Currently, oras.Copy() follows all the successors of a node to copy all the sub-DAGs.

For manifests specified in the layers field of OCI Image manifests or the blobs field of OCI Artifact manifests, should oras.Copy() treat them as leaf nodes or follow them to copy the sub-DAGs?

Wwwsylvia avatar Jan 12 '23 14:01 Wwwsylvia

Scenario 1: Repository

Consider the following DAG, where manifest B references manifest A as one of its layers, and manifest list C references A* (identical in content to A) as one of its manifests. When copying this DAG to a remote repository, how should oras.Copy() handle manifests A and A*?

Clearly, manifest A* should be treated as a non-leaf node and pushed to the repository via the manifest endpoint. But what about manifest A? If it is treated as a leaf node, should it be pushed to the repository via the blob endpoint?

  • If so, the same content is pushed twice (A via the blob endpoint and A* via the manifest endpoint) and may be stored separately in the blob storage and the manifest storage of the remote repository.

  • If not, manifests A and A* are pushed just once, via the manifest endpoint. However, if manifest A is copied first as a leaf node, its successors are not followed, so blob E is not copied along with it. Blob E is then never copied, since manifest A* will be skipped as already copied.

graph TD

A[Manifest A]
AS[Manifest A*]
B[Manifest B]
C[Manifest List C]
D[Manifest List D]
E[Blob E]


A -.-> E
AS --> E
B -- layers --> A
C -- manifests --> AS
D --> B
D --> C
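
The first option above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `registry` type, the `copyAandAStar` helper, and the digest are hypothetical and not part of oras-go, and a real registry may deduplicate content across its endpoints.

```go
package main

import "fmt"

// registry is a toy model (not an oras-go type) of a remote repository whose
// blob and manifest endpoints are assumed to be backed by separate storage.
type registry struct {
	blobs     map[string]bool
	manifests map[string]bool
	pushes    int // number of uploads actually performed
}

func newRegistry() *registry {
	return &registry{blobs: map[string]bool{}, manifests: map[string]bool{}}
}

// push uploads digest into the given store unless it is already there.
func (r *registry) push(store map[string]bool, digest string) {
	if store[digest] {
		return
	}
	store[digest] = true
	r.pushes++
}

// copyAandAStar pushes the content shared by A and A* once per reference.
func copyAandAStar(layerViaBlobEndpoint bool) *registry {
	const digestA = "sha256:aaaa" // hypothetical digest shared by A and A*
	r := newRegistry()
	// B references A in its layers field.
	if layerViaBlobEndpoint {
		r.push(r.blobs, digestA)
	} else {
		r.push(r.manifests, digestA)
	}
	// C references A* in its manifests field.
	r.push(r.manifests, digestA)
	return r
}

func main() {
	fmt.Println("uploads when A goes to the blob endpoint:", copyAandAStar(true).pushes)
	fmt.Println("uploads when A goes to the manifest endpoint:", copyAandAStar(false).pushes)
}
```

In this model, routing A through the blob endpoint costs two uploads of the same bytes, while routing it through the manifest endpoint costs one but leaves the question of blob E open.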

Wwwsylvia avatar Jan 13 '23 14:01 Wwwsylvia

Scenario 2: OCI Layout

Consider the following DAG, where manifest A is referenced by manifest B as a layer and by manifest list C as a manifest. When copying this DAG to an OCI layout, oras.Copy() copies manifest A only once, whether or not it treats manifest A as a leaf node (to be copied along with manifest B), since an OCI layout stores manifests and blobs in the same storage. But if manifest A is copied as a leaf node along with manifest B, and this happens before manifest list C is copied, blob E will never get copied.

graph TD

A[Manifest A]
B[Manifest B]
C[Manifest List C]
D[Manifest List D]
E[Blob E]


A --> E
B -- layers --> A
C -- manifests --> A
D --> B
D --> C
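
The ordering problem in this scenario can be reproduced with a small simulation. This is a sketch, not the oras-go implementation: `edge`, `copyGraph`, and `scenario` are hypothetical names, the traversal is sequential (the real copy may run concurrently, which makes the order nondeterministic), and it assumes a node that already exists in the store is skipped without its successors being examined.

```go
package main

import "fmt"

// edge is a toy reference from one node to another (not an oras-go type).
type edge struct {
	to   string
	leaf bool // true when the reference comes from a layers/blobs field
}

// copyGraph copies the DAG rooted at root into a single content-addressable
// store. Nodes reached through a leaf edge are stored without following
// their successors, and nodes already in the store are skipped entirely.
func copyGraph(graph map[string][]edge, root string) map[string]bool {
	store := map[string]bool{}
	var visit func(name string, asLeaf bool)
	visit = func(name string, asLeaf bool) {
		if store[name] {
			return // already copied: successors are never examined
		}
		if !asLeaf {
			for _, e := range graph[name] {
				visit(e.to, e.leaf)
			}
		}
		store[name] = true
	}
	visit(root, false)
	return store
}

// scenario builds the Scenario 2 DAG; first and second fix the order in
// which manifest list D's successors are visited.
func scenario(first, second string) map[string][]edge {
	return map[string][]edge{
		"D": {{to: first}, {to: second}},
		"B": {{to: "A", leaf: true}}, // B -- layers --> A
		"C": {{to: "A"}},             // C -- manifests --> A
		"A": {{to: "E"}},
	}
}

func main() {
	fmt.Println("B before C, E copied:", copyGraph(scenario("B", "C"), "D")["E"])
	fmt.Println("C before B, E copied:", copyGraph(scenario("C", "B"), "D")["E"])
}
```

When B is visited first, A lands in the store as a leaf and the later visit through C is skipped, so E is missing from the result. When C is visited first, E is copied.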

Wwwsylvia avatar Jan 13 '23 14:01 Wwwsylvia

Scenario 3: Repository Double CASs

Suppose the DAG below is being copied to a remote repository. Should manifest A be pushed via the manifest endpoint or via the blob endpoint? Or should it be pushed twice, via both endpoints?

graph TD

A[Manifest A]
B[Manifest B]

B -- layers --> A
B -- subject --> A

Wwwsylvia avatar Jan 13 '23 15:01 Wwwsylvia

Interestingly, the docker buildx build command generates build caches like the one below, putting layers in the manifests field of an OCI image index. When copying such a structure to a remote repository, should oras.Copy() push these layers (specified as manifests) via the manifest endpoint or the blob endpoint? 🤔

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1",
      "size": 32,
      "annotations": {
        "buildkit/createdat": "2023-01-13T07:49:09.921545067Z",
        "containerd.io/uncompressed": "sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:d74d7d17ce90514c5eed8068791ab9b1d58f355a367c6a87bd3e0e1dc8113500",
      "size": 105,
      "annotations": {
        "buildkit/createdat": "2023-01-13T07:49:09.864832789Z",
        "containerd.io/uncompressed": "sha256:601bb128dc20e9b8a296510b1c840d58dfd7d596ae1396d52e886753423c052c"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:df9b9388f04ad6279a7410b85cedfdcb2208c0a003da7ab5613af71079148139",
      "size": 2814559,
      "annotations": {
        "buildkit/createdat": "2023-01-13T07:48:28.219213701Z",
        "containerd.io/uncompressed": "sha256:4fc242d58285699eca05db3cc7c7122a2b8e014d9481f323bd9277baacfa0628"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:eb630b592770ba0b3982595e566c1027966cf6b9733c5fc1bf0794bf6bc2c9cd",
      "size": 3578366,
      "annotations": {
        "buildkit/createdat": "2023-01-13T07:49:09.693226029Z",
        "containerd.io/uncompressed": "sha256:3cb741a610a6253327467f4bb4e3de9397c36846b2407dc56992c04475ced968"
      }
    },
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:ed0f0d4a18721d4dc5d5d8ffb7eaeb0df00ab5d1001bfa594419a1b8dd5ffc09",
      "size": 2581904,
      "annotations": {
        "buildkit/createdat": "2023-01-13T07:48:34.097896893Z",
        "containerd.io/uncompressed": "sha256:6d69e1b372ea8a2e13783b213f4cf108be422a05740625a209a054b48c9a76cd"
      }
    },
    {
      "mediaType": "application/vnd.buildkit.cacheconfig.v0",
      "digest": "sha256:f06f3ad8ce85bcc973c15a11f419e7601e74db8db0e7af8d05587d24d77ffc83",
      "size": 2407
    }
  ]
}
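
One possible answer, sketched here purely as an assumption and not as current oras-go behavior, is to choose the endpoint from each descriptor's media type rather than from the field it appears in. The `classify` and `isManifestMediaType` helpers are hypothetical, and `sampleIndex` is a trimmed copy of the index above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// descriptor and index model only the fields needed for this sketch.
type descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

type index struct {
	Manifests []descriptor `json:"manifests"`
}

// sampleIndex is a trimmed version of the buildx cache index shown above.
const sampleIndex = `{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1",
      "size": 32
    },
    {
      "mediaType": "application/vnd.buildkit.cacheconfig.v0",
      "digest": "sha256:f06f3ad8ce85bcc973c15a11f419e7601e74db8db0e7af8d05587d24d77ffc83",
      "size": 2407
    }
  ]
}`

// isManifestMediaType reports whether mt is one of the well-known manifest
// or index media types. The list is an assumption and not exhaustive.
func isManifestMediaType(mt string) bool {
	switch mt {
	case "application/vnd.oci.image.manifest.v1+json",
		"application/vnd.oci.image.index.v1+json",
		"application/vnd.docker.distribution.manifest.v2+json",
		"application/vnd.docker.distribution.manifest.list.v2+json":
		return true
	}
	return false
}

// classify splits the entries of an index into those that would go to the
// manifest endpoint and those that would go to the blob endpoint.
func classify(indexJSON []byte) (manifests, blobs []descriptor, err error) {
	var idx index
	if err := json.Unmarshal(indexJSON, &idx); err != nil {
		return nil, nil, err
	}
	for _, d := range idx.Manifests {
		if isManifestMediaType(d.MediaType) {
			manifests = append(manifests, d)
		} else {
			blobs = append(blobs, d)
		}
	}
	return manifests, blobs, nil
}

func main() {
	manifests, blobs, err := classify([]byte(sampleIndex))
	if err != nil {
		panic(err)
	}
	fmt.Println("manifest endpoint:", len(manifests), "blob endpoint:", len(blobs))
}
```

Under this policy every entry of the buildx cache index would be pushed as a blob. The weakness is that it depends on a fixed list of media types, so a manifest with a custom media type would be misrouted.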

Wwwsylvia avatar Jan 13 '23 15:01 Wwwsylvia

We may need to introduce a new method to return leaf successors and non-leaf successors separately, as a complement to content.Successors().

https://github.com/oras-project/oras-go/blob/76382aaa94873ad14fddacdbff0f5ed32f43c3aa/content/graph.go#L47-L106
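
A rough sketch of what such a method could look like for an OCI image manifest follows. Everything here is hypothetical: `splitSuccessors` is not an existing oras-go function, the types are simplified stand-ins, the digest is made up, and the policy (config and layers are leaves, subject is a non-leaf) is only one of the options discussed above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// descriptor and manifest model only the fields needed for this sketch.
type descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
}

type manifest struct {
	Config  *descriptor  `json:"config,omitempty"`
	Layers  []descriptor `json:"layers,omitempty"`
	Subject *descriptor  `json:"subject,omitempty"`
}

// manifestB mirrors Scenario 3: B references A both as a layer and as its
// subject. The digest is a placeholder, not a real one.
const manifestB = `{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "layers": [
    {"mediaType": "application/vnd.oci.image.manifest.v1+json", "digest": "sha256:aaaa"}
  ],
  "subject": {"mediaType": "application/vnd.oci.image.manifest.v1+json", "digest": "sha256:aaaa"}
}`

// splitSuccessors is a hypothetical helper that returns the successors of an
// image manifest in two groups: leaves (config and layers, not followed) and
// non-leaves (subject, followed as a sub-DAG).
func splitSuccessors(manifestJSON []byte) (leaves, nonLeaves []descriptor, err error) {
	var m manifest
	if err := json.Unmarshal(manifestJSON, &m); err != nil {
		return nil, nil, err
	}
	if m.Subject != nil {
		nonLeaves = append(nonLeaves, *m.Subject)
	}
	if m.Config != nil {
		leaves = append(leaves, *m.Config)
	}
	leaves = append(leaves, m.Layers...)
	return leaves, nonLeaves, nil
}

func main() {
	leaves, nonLeaves, err := splitSuccessors([]byte(manifestB))
	if err != nil {
		panic(err)
	}
	fmt.Println("leaf successors:", leaves)
	fmt.Println("non-leaf successors:", nonLeaves)
}
```

Run against the Scenario 3 manifest, the same digest shows up in both groups, so a split like this does not by itself settle which endpoint manifest A should go to.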

Wwwsylvia avatar Jan 13 '23 15:01 Wwwsylvia