
Expose blobs on URLs with proper mime type for browser consumption

Open awakecoding opened this issue 3 years ago • 3 comments

I want to use OCI Artifacts to store video files (remote desktop session recordings) efficiently, and one thing I would need is a way to play the video files directly in a browser, without downloading them locally first. Since we can already declare a proper mime type for the artifacts (mediaType field), all that would be required is a blob URL that responds with the proper mime type, allowing the browser to recognize the content and play it directly as it is being downloaded.

Using the proper mime type should make video files playable in the browser either by opening the URL directly, or by embedding the blob URL inside the HTML5 video tag.

My primary use case here is video files, but the same applies to audio files and image files, or anything that can be opened directly in a browser given the correct mime type.

awakecoding · Mar 18 '21 21:03

Riffing a bit more: one of the values of an OCI distribution-based service is its named, content-addressable storage. The blobs are details, not the “human” interaction model, which is the named reference: registry.io/namespace/artifact:tag

What would be interesting is how these named elements can be combined.

If a curl client wants to pull a binary, can it pass registry.io/namespace/binary:version?

If the notation client is invoked as a gate to a curl, it uses the same named reference: notation verify registry.io/namespace/binary:version

If a remote desktop video client wishes to get metadata, it can also use the same named reference.

The registry would serve a redirect to the content (blob) based on the type of request.

This model could preserve the common named reference, pointing at a manifest to describe the artifact, while also supporting the other scenarios like signatures, SBOMs and metadata.

Thoughts?

SteveLasker · Oct 30 '21 20:10

The human-readable tag notation is not flexible enough for this. It's probably better if we simply parse the manifests to find the blob URLs, and then construct associative "file URLs" that combine the content address (digest) with a "human address" (file name + type) to pull the blob as a regular file that will be recognized by the browser.

The spec uses the following URL structure to pull blobs: /v2/<name>/blobs/<digest>

Here is what I suggest: /v2/<name>/files/<digest>/<filename>

So let's say you have a manifest that refers to a PDF presentation for which you now have the digest. You could pull the blob and open it in a PDF viewer, but if you try reading the blob directly in a browser, it will likely just download it instead of launching the browser's built-in PDF viewer, because the response doesn't carry the correct mime type (application/pdf). Let's fix this with my suggestion:

/v2/<name>/files/<digest>/presentation.pdf

The OCI registry would use the last element of the URL as the file name, but serve the contents of the corresponding blob. With automatic mime types, ".pdf" can be served with "application/pdf" as the mime type, which should make it work inside the built-in PDF viewer of most browsers.
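As a rough illustration of the lookup described above, here is a minimal sketch. It assumes a hypothetical helper named `parse_file_url` (the "files" route is only a proposal in this thread, not part of the distribution spec) and uses Python's standard `mimetypes` table for the extension-to-type mapping; a real registry might use a different table.

```python
import mimetypes


def parse_file_url(path):
    """Split a proposed /v2/<name>/files/<digest>/<filename> path.

    Hypothetical helper: the "files" route is a suggestion from this
    thread, not an endpoint defined by the distribution spec.
    """
    # Repository names may contain slashes, so split on the LAST "/files/".
    prefix, sep, rest = path.rpartition("/files/")
    if not sep or not prefix.startswith("/v2/"):
        raise ValueError("not a file URL: " + path)
    name = prefix[len("/v2/"):]
    digest, slash, filename = rest.partition("/")
    if not name or not digest or not slash or not filename or "/" in filename:
        raise ValueError("malformed file URL: " + path)
    return name, digest, filename


def content_type_for(filename):
    """Guess a mime type from the file name, with a generic fallback."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"
```

The registry would still look up the blob by digest only; the file name influences nothing but the Content-Type of the response.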

The last improvement to discuss is how to explicitly specify a different mime type instead of leaving it to default mime type detection based on the file name. We could add a query parameter, or a request header for this.
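One possible precedence order for such an override is sketched below. The query parameter name `mediaType` and the header name `X-Blob-Media-Type` are made up for illustration; nothing in the thread or the spec defines them.

```python
import mimetypes
from urllib.parse import parse_qs, urlsplit


def resolve_content_type(url, headers=None):
    """Pick the Content-Type for a proposed file URL.

    Precedence (an assumption, not a spec rule):
    query parameter, then request header, then file-name detection.
    """
    headers = headers or {}
    parts = urlsplit(url)
    query = parse_qs(parts.query)

    # 1. Explicit query parameter (hypothetical name "mediaType").
    if query.get("mediaType"):
        return query["mediaType"][0]

    # 2. Explicit request header (hypothetical name "X-Blob-Media-Type").
    override = headers.get("X-Blob-Media-Type")
    if override:
        return override

    # 3. Default detection based on the last path element.
    filename = parts.path.rsplit("/", 1)[-1]
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"
```

A real implementation would probably also want to validate the override against an allow-list, since the client is choosing how the browser interprets the content.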

What do you think?

awakecoding · Oct 31 '21 15:10

It’s getting closer.

We do have this general request for curling urls from a few folks, including some internal Azure teams. I’m trying to find a way to meet the url requirements while staying aligned to some principles around the distribution spec. Some of them aren’t written in the specs but are standard implementation details.

How far the distribution spec should diverge from these core concepts has been the source of much discussion. I do want to recognize these challenges and keep expanding the capabilities, continuing toward the vision that distribution can be the base for most new package managers while maintaining some core principles.

Blob URLs are neither fixed over time nor tied to the same domain or URL as the artifact reference.

A distribution instance has two endpoints:

  • A REST endpoint for auth and discovery
  • A data endpoint for blob content delivery.

A user references wabbit-networks.io/net-monitor:v1. The blob content (layers for container images) may be served from 1234567.blobs.core.cloud.io

See an example of ACR Dedicated Data endpoints

Distribution clients know how to negotiate this series of requests. A standard and simplified “happy path” would be:

  1. A client requests an artifact (by tag or digest)
  2. The registry responds with a manifest
  3. The client evaluates the blobs defined in the manifest. The blobs are wrapped in descriptors.
  4. Based on the digests (in the descriptors), the client checks whether it already has any of the blobs locally.
  5. The client identifies the missing blobs and sends a list of requests to the server for urls for each blob.
  6. Based on various factors, different blob urls are returned. Two requests for the same manifest, or different manifests, even in the same repo, may return different blob urls.
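Steps 3 to 5 of the happy path above can be sketched as follows. This is a simplified illustration, not a real client: the manifest is a plain dict shaped like an OCI image manifest (a `config` descriptor plus `layers`), the function name `missing_blob_requests` is made up, and the request path follows the /v2/<name>/blobs/<digest> structure quoted earlier in the thread. Step 6, where the registry answers with whatever blob URL it chooses, is deliberately left out.

```python
def missing_blob_requests(name, manifest, local_digests):
    """Return blob request paths for descriptors not already on the client.

    name          -- repository name, e.g. "net-monitor"
    manifest      -- dict with an optional "config" descriptor and "layers"
    local_digests -- set of digests the client already holds
    """
    descriptors = []
    if "config" in manifest:
        descriptors.append(manifest["config"])
    descriptors.extend(manifest.get("layers", []))

    paths = []
    seen = set()
    for descriptor in descriptors:
        digest = descriptor["digest"]
        # Skip blobs the client already has, and duplicates in the manifest.
        if digest in local_digests or digest in seen:
            continue
        seen.add(digest)
        paths.append(f"/v2/{name}/blobs/{digest}")
    return paths
```

The client only ever asks by digest; where each response redirects to is up to the registry, which is why the same digest can resolve to different URLs over time.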

Reasons for differences:

  • A geo-replicated registry may return regionalized data endpoints for a traffic-routed registry endpoint: [registry].[region].data.azurecr.io. (The wabbitnetworks.azurecr.io registry endpoint may have wabbitnetworks.eastus.data.azurecr.io and wabbitnetworks.westeu.data.azurecr.io data endpoints.)
  • Similar to geo/region replication, zone routing is also done.
  • For tax & trade compliance requirements, a global request must be served within that geo-fenced region. (This was one of the requirements that led to MCR.)
    • When a paying customer purchases software from Australia, it must be billed and delivered from within that region.
  • Same as above, but some customers need their data limited to sovereign boundaries
  • A cdn registry may return a cdn backed blob url, where another registry or even a repo on the same registry may serve cloud blob urls directly. Docker Hub and other cloud providers have made cdn changes over time
  • Some cross-cloud registries have used blob url re-routing to deliver the expensive (network egress) blobs from within that cloud's data center/region
  • Windows foreign layers use this model to serve windows layers from mcr.microsoft.com, regardless of where the manifest is located. (this is actually a problematic one we’re working to undo)
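To make the first reason concrete, here is a toy sketch of how a geo-replicated registry might choose a data endpoint host per request. The host pattern [registry].[region].data.azurecr.io comes from the list above; the function name, the fallback to a "home" region, and the routing rule itself are assumptions for illustration only.

```python
def pick_data_endpoint(registry, client_region, replicated_regions, home_region):
    """Return the data endpoint host a request might be routed to.

    Hypothetical routing rule: serve from the client's region when the
    registry is replicated there, otherwise from the home region.
    """
    if client_region in replicated_regions:
        region = client_region
    else:
        region = home_region
    return f"{registry}.{region}.data.azurecr.io"
```

Two clients pulling the same digest from wabbitnetworks.azurecr.io can therefore be handed different hosts, which is why a blob URL cannot be treated as a stable, shareable link.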

What you're asking for is something I'm hoping we can solve. I’m just searching for a solution that gives the benefits above for using the same url to get supporting artifact types (signatures, SBOMs, scan results) and stays true to the core capabilities of the distribution spec that has scaled to many scenarios.

What you, and others, are asking for is a way for the registry to redirect a request, rather than the client having to negotiate the manifest content.

Perhaps @sajayantony, @stevvooe or @jonjohnsonjr might have some ideas on how we can redirect requests, based on the mediaType in the header.
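One way to picture a redirect driven by the requested mediaType is sketched below. It is not an existing registry behavior: the function names are hypothetical, the Accept parsing ignores q-values and wildcards, and the Location returned is the spec's /v2/<name>/blobs/<digest> path, whereas a real registry could redirect to any data endpoint it likes.

```python
def accepted_media_types(accept_header):
    """Return the media types listed in an Accept header, in order.

    Simplification: parameters such as q-values are dropped, not ranked.
    """
    types = []
    for item in accept_header.split(","):
        media_type = item.split(";", 1)[0].strip().lower()
        if media_type:
            types.append(media_type)
    return types


def redirect_for(name, manifest, accept_header):
    """Return (status, location) for the first blob matching the Accept header.

    Returns None when no descriptor in the manifest has a requested type,
    in which case the registry would serve the manifest as it does today.
    """
    for media_type in accepted_media_types(accept_header):
        for descriptor in manifest.get("layers", []):
            if descriptor.get("mediaType", "").lower() == media_type:
                return 302, f"/v2/{name}/blobs/{descriptor['digest']}"
    return None
```

With this shape, the client keeps using the single named reference, and the registry, not the client, walks the manifest to find the blob.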

SteveLasker · Nov 01 '21 16:11