
Stop sending network requests in seek operation for fetching chunked blobs

Open qweeah opened this issue 2 years ago • 6 comments

If an oras-go user wants to get a blob chunk from a remote target, they can:

  1. fetch the blob io reader via a resolved descriptor
  2. seek the io reader to a target offset
  3. read out the chunked content

In the current implementation, each call to the seek operation sends an HTTP request to the remote registry. So if a user seeks multiple times before reading a single chunk, more than one network request is sent.

We should change the implementation to maintain a position state during seek operations and send the ranged network request only when the chunked content is actually read.

This is the same design used by common file systems: when reading one chunk of a large file, fseek does not cause any I/O operation; only fread does.

qweeah avatar Apr 02 '22 10:04 qweeah

What is in that HTTP request? Can you link to the relevant parts of the code?

sparr avatar Jun 20 '23 16:06 sparr

What is in that HTTP request? Can you link to the relevant parts of the code?

Thanks for the reminder. This is about the implementation of the seekable reader utility in https://github.com/oras-project/oras-go/blob/2e7b65f1b60d11cd861ce6b68bc53e23a1fd9306/internal/httputil/seek.go#L65

It is used in the code below so that the blob store fetch operation supports resumable pulling: https://github.com/oras-project/oras-go/blob/2e7b65f1b60d11cd861ce6b68bc53e23a1fd9306/registry/remote/repository.go#L673-L679

qweeah avatar Jun 26 '23 08:06 qweeah

Is this still on anyone's radar? I saw the behaviour as I'm working on resumable downloads and just removed it.

dtroyer-salad avatar Mar 13 '24 14:03 dtroyer-salad

I saw the behaviour as I'm working on resumable downloads and just removed it.

Hi @dtroyer-salad, we have not put effort into this perf enhancement because we were not aware of a real use case. Could you share your scenario for resumable downloads?

Wwwsylvia avatar Mar 20 '24 12:03 Wwwsylvia

Pulling large images over networks that we do not control has led to a number of occasions where downloads get 'stuck' on a large layer, restarting the entire layer from scratch repeatedly. As a specific example, a ~20GB image had a single ~18GB layer that never completed downloading due to the restarts.

My current implementation has dropped using readSeekCloser altogether in the resume codepath in favor of just setting the Range header for the remaining blob bits not already on disk.

dtroyer-salad avatar Mar 20 '24 15:03 dtroyer-salad

Another (partial) use case for io.ReadSeekCloser is in a copy, @Wwwsylvia. oras-go currently does this...

https://github.com/oras-project/oras-go/blob/11d464f8432e77175bc9cda221ec2d797eac752c/copy.go#L347-L359

rc is an io.ReadSeekCloser if the server supports Range requests.

If the dst.Push() for a remote repository fails due to a network issue, it is never retried (even when using the retry client). It could be retried by seeking back to the start and resending the HTTP request. In other words, we could set req.GetBody to seek to 0, and then the retry logic would work.
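A small sketch of the GetBody idea, under the assumption that the push body is seekable and its size is known. The names `newRewindableRequest` and `demoRewind`, and the URL, are made up for illustration and are not part of oras-go.

```go
package main

import (
	"io"
	"net/http"
	"strings"
)

// newRewindableRequest wraps a seekable body so that a retrying client can
// call req.GetBody to obtain the payload again from offset 0.
func newRewindableRequest(method, url string, body io.ReadSeeker, size int64) (*http.Request, error) {
	req, err := http.NewRequest(method, url, io.NopCloser(body))
	if err != nil {
		return nil, err
	}
	// NewRequest cannot infer the length of an arbitrary reader.
	req.ContentLength = size
	req.GetBody = func() (io.ReadCloser, error) {
		// Rewind instead of buffering the whole blob in memory.
		if _, err := body.Seek(0, io.SeekStart); err != nil {
			return nil, err
		}
		return io.NopCloser(body), nil
	}
	return req, nil
}

// demoRewind simulates a partially sent body followed by a retry.
func demoRewind() (string, error) {
	const payload = "layer bytes"
	req, err := newRewindableRequest(http.MethodPut, "https://registry.example/upload",
		strings.NewReader(payload), int64(len(payload)))
	if err != nil {
		return "", err
	}
	// Consume part of the body, as a failed upload attempt would.
	if _, err := io.CopyN(io.Discard, req.Body, 5); err != nil {
		return "", err
	}
	// A retry obtains a fresh body that starts at the beginning again.
	body, err := req.GetBody()
	if err != nil {
		return "", err
	}
	b, err := io.ReadAll(body)
	return string(b), err
}
```

When the source is a remote blob, the seek back to 0 would itself trigger a new ranged fetch on the next read, which ties this back to the lazy seek behavior requested in this issue.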

This same resiliency feature applies to #338

ktarplee avatar Apr 15 '24 09:04 ktarplee