oras-go Stop sending network requests in seek operation for fetching chunked blobs

If an oras-go user wants to get a blob chunk from a remote target, they can:

1. fetch the blob's io reader via a resolved descriptor
2. seek the io reader to a target offset
3. read out the chunked content

In the current implementation, calling the seek operation sends an HTTP request to the remote registry. So if a user seeks multiple times before reading out a single chunk, more than one network request will be sent.

We should change the implementation to maintain a position state during seek operations and only send the ranged network request when the chunked content is actually read.

Actually, this is the same design used by common file systems: when reading one chunk from a large file, fseek does not cause any I/O operation; only fread does.
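The proposal above can be sketched roughly as follows. This is a minimal illustration, not the oras-go implementation: `lazySeekReader`, `openAtFunc`, and `demoLazySeek` are hypothetical names, and the `open` callback stands in for whatever issues the ranged request (here it just slices an in-memory blob and counts how often it is called).

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// openAtFunc is a hypothetical callback that opens the blob at a byte offset,
// e.g. by sending a GET request with "Range: bytes=<offset>-".
type openAtFunc func(offset int64) (io.ReadCloser, error)

// lazySeekReader only records the position in Seek; the underlying stream is
// opened on the first Read after a position change.
type lazySeekReader struct {
	open   openAtFunc
	size   int64
	offset int64         // position requested by the caller
	rc     io.ReadCloser // current stream; nil until a Read needs it
}

func (r *lazySeekReader) Seek(offset int64, whence int) (int64, error) {
	switch whence {
	case io.SeekStart:
	case io.SeekCurrent:
		offset += r.offset
	case io.SeekEnd:
		offset += r.size
	default:
		return 0, errors.New("invalid whence")
	}
	if offset < 0 {
		return 0, errors.New("negative position")
	}
	if offset != r.offset && r.rc != nil {
		// The open stream no longer matches the position; drop it and
		// reopen lazily on the next Read.
		r.rc.Close()
		r.rc = nil
	}
	r.offset = offset
	return offset, nil
}

func (r *lazySeekReader) Read(p []byte) (int, error) {
	if r.offset >= r.size {
		return 0, io.EOF
	}
	if r.rc == nil {
		rc, err := r.open(r.offset) // the only place a request would be sent
		if err != nil {
			return 0, err
		}
		r.rc = rc
	}
	n, err := r.rc.Read(p)
	r.offset += int64(n)
	return n, err
}

func (r *lazySeekReader) Close() error {
	if r.rc == nil {
		return nil
	}
	return r.rc.Close()
}

// demoLazySeek seeks three times, reads one 4-byte chunk, and reports how
// many times the stream was opened.
func demoLazySeek() (opens int, chunk string, err error) {
	blob := []byte("0123456789abcdef")
	r := &lazySeekReader{
		size: int64(len(blob)),
		open: func(off int64) (io.ReadCloser, error) {
			opens++
			return io.NopCloser(bytes.NewReader(blob[off:])), nil
		},
	}
	defer r.Close()
	for _, off := range []int64{2, 8, 10} {
		if _, err = r.Seek(off, io.SeekStart); err != nil {
			return
		}
	}
	buf := make([]byte, 4)
	if _, err = io.ReadFull(r, buf); err != nil {
		return
	}
	return opens, string(buf), nil
}

func main() {
	opens, chunk, err := demoLazySeek()
	fmt.Println(opens, chunk, err)
}
```

In this sketch, three seeks followed by one read result in a single call to `open`, which is the behaviour the issue asks for.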

Apr 02 '22 10:04 qweeah

What is in that HTTP request? Can you link to the relevant parts of the code?

Jun 20 '23 16:06 sparr

> What is in that HTTP request? Can you link to the relevant parts of the code?

Thanks for the reminder. This refers to the implementation of the seekable reader utility in https://github.com/oras-project/oras-go/blob/2e7b65f1b60d11cd861ce6b68bc53e23a1fd9306/internal/httputil/seek.go#L65

It is used in the code below to let the blob store's fetch operation support resumable pulling: https://github.com/oras-project/oras-go/blob/2e7b65f1b60d11cd861ce6b68bc53e23a1fd9306/registry/remote/repository.go#L673-L679

Jun 26 '23 08:06 qweeah

Is this still on anyone's radar? I saw the behaviour as I'm working on resumable downloads and just removed it.

Mar 13 '24 14:03 dtroyer-salad

> I saw the behaviour as I'm working on resumable downloads and just removed it.

Hi @dtroyer-salad, we have not put effort into this perf enhancement because we were not aware of a real use case. Could you share your scenario for resumable downloads?

Mar 20 '24 12:03 Wwwsylvia

Pulling large images over networks that we do not control has led to a number of occasions where downloads get 'stuck' on a large layer, restarting the entire layer from scratch repeatedly. As a specific example, a ~20GB image had a single ~18GB layer that never completed downloading due to the restarts.

My current implementation has dropped using readSeekCloser altogether in the resume codepath in favor of just setting the Range header for the remaining blob bits not already on disk.
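A rough sketch of that approach, for illustration only: this is not the code from the comment above, and `resumeDownload`, `rangeFrom`, and `demoResume` are hypothetical names. It assumes the partial blob is a plain file on disk and that the server answers a Range request with 206 Partial Content; the demo uses a local `httptest` server rather than a real registry.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
	"path/filepath"
	"strings"
	"time"
)

// rangeFrom builds a Range header value asking for everything from offset on.
func rangeFrom(offset int64) string {
	return fmt.Sprintf("bytes=%d-", offset)
}

// resumeDownload appends the bytes missing from the partial file at path.
func resumeDownload(client *http.Client, url, path string) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return err
	}
	defer f.Close()
	info, err := f.Stat()
	if err != nil {
		return err
	}
	have := info.Size() // bytes already on disk

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	if have > 0 {
		req.Header.Set("Range", rangeFrom(have))
	}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	// Appending is only safe if the server honoured the range; a plain 200
	// after a Range request would mean the body starts at byte 0 again.
	switch {
	case have > 0 && resp.StatusCode == http.StatusPartialContent:
	case have == 0 && resp.StatusCode == http.StatusOK:
	default:
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	_, err = io.Copy(f, resp.Body)
	return err
}

// demoResume serves a small blob locally, pretends 7 bytes were already
// downloaded, resumes, and returns the final file content.
func demoResume() (string, error) {
	blob := "hello, resumable world"
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.ServeContent(w, r, "blob", time.Time{}, strings.NewReader(blob))
	}))
	defer srv.Close()

	dir, err := os.MkdirTemp("", "resume")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "blob.partial")
	if err := os.WriteFile(path, []byte(blob[:7]), 0o644); err != nil {
		return "", err
	}
	if err := resumeDownload(srv.Client(), srv.URL, path); err != nil {
		return "", err
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	fmt.Println(demoResume())
}
```

A real resume path would also need to verify the digest of the completed file, since the bytes already on disk may be stale or corrupted.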

Mar 20 '24 15:03 dtroyer-salad

Another (partial) use case for io.ReadSeekCloser is in a copy, @Wwwsylvia. oras-go currently does this:

https://github.com/oras-project/oras-go/blob/11d464f8432e77175bc9cda221ec2d797eac752c/copy.go#L347-L359

rc is an io.ReadSeekCloser if the server supports Range requests.

If the dst.Push() for a remote repository fails due to a network issue, it is never retried (even when using the retry client). It could be retried by seeking back to the start and retrying the HTTP request. In other words, we could set req.GetBody to seek to 0, and then the retry logic would work.

This same resiliency feature applies to #338
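The `req.GetBody` idea can be sketched as below. This is an assumption-laden illustration rather than oras-go code: `newRewindableRequest` and `demoRewind` are hypothetical helpers, the URL is a placeholder, and a `strings.Reader` stands in for the `rc` returned by the source fetch.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// newRewindableRequest builds a request whose body can be replayed by
// rewinding the seeker, so a retrying client can call GetBody after a
// failed attempt.
func newRewindableRequest(method, url string, body io.ReadSeeker, size int64) (*http.Request, error) {
	req, err := http.NewRequest(method, url, io.NopCloser(body))
	if err != nil {
		return nil, err
	}
	req.ContentLength = size
	req.GetBody = func() (io.ReadCloser, error) {
		// Rewind to the start before handing the body out again.
		if _, err := body.Seek(0, io.SeekStart); err != nil {
			return nil, err
		}
		return io.NopCloser(body), nil
	}
	return req, nil
}

// demoRewind consumes part of the body (as a failed attempt would), then
// asks for a fresh body and reads it in full.
func demoRewind() (string, error) {
	payload := "layer-bytes"
	req, err := newRewindableRequest(http.MethodPut,
		"https://registry.example/upload", // placeholder, never contacted
		strings.NewReader(payload), int64(len(payload)))
	if err != nil {
		return "", err
	}
	if _, err := io.CopyN(io.Discard, req.Body, 5); err != nil {
		return "", err
	}
	body, err := req.GetBody()
	if err != nil {
		return "", err
	}
	b, err := io.ReadAll(body)
	return string(b), err
}

func main() {
	fmt.Println(demoRewind())
}
```

With a remote `rc`, the rewind in `GetBody` is itself a seek, so whether it costs a network round trip depends on the seek behaviour discussed in this issue.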

Apr 15 '24 09:04 ktarplee
