oras-go
Stop sending network requests in seek operation for fetching chunked blobs
If an oras-go user wants to get a blob chunk from a remote target, they can:
- fetch the blob's io reader via a resolved descriptor
- seek the io reader to a target offset
- read out the chunked content
In the current implementation, every call to the seek operation sends an HTTP request to the remote registry. So if a user seeks multiple times before reading out a single chunk, more than one network request is sent.
We should change the implementation to track the position during the seek operation and only send the ranged network request when the chunked content is actually read.
This is the same design used by common file systems: when reading one chunk of a large file, fseek does not cause any IO operation; only fread does.
What is in that HTTP request? Can you link to the relevant parts of the code?
Thanks for the reminder. This refers to the implementation of the seekable reader utility in https://github.com/oras-project/oras-go/blob/2e7b65f1b60d11cd861ce6b68bc53e23a1fd9306/internal/httputil/seek.go#L65
which is used in the code below so that the blob store's fetch operation supports resumable pulling: https://github.com/oras-project/oras-go/blob/2e7b65f1b60d11cd861ce6b68bc53e23a1fd9306/registry/remote/repository.go#L673-L679
Is this still on anyone's radar? I saw the behaviour as I'm working on resumable downloads and just removed it.
Hi @dtroyer-salad, we have not put effort into this performance enhancement because we were not aware of a real use case. Could you share your resumable download scenarios?
Pulling large images over networks that we do not control has led to a number of occasions where downloads get 'stuck' on a large layer, repeatedly restarting the entire layer from scratch. As a specific example, a ~20GB image had a single ~18GB layer that never finished downloading because of the restarts.
My current implementation drops readSeekCloser altogether in the resume code path in favor of simply setting the Range header for the remaining bytes of the blob that are not already on disk.
Another (partial) use case for io.ReadSeekCloser is in copy, @Wwwsylvia. oras-go currently does this:
https://github.com/oras-project/oras-go/blob/11d464f8432e77175bc9cda221ec2d797eac752c/copy.go#L347-L359
rc is an io.ReadSeekCloser if the server supports Range requests.
If dst.Push() for a remote repository fails due to a network issue, it is never retried (even when using the retry client). It could be retried by seeking back to the start and resending the HTTP request. In other words, we could set req.GetBody to seek to 0, and then the retry logic would work.
This same resiliency feature applies to #338