docker: support for requesting chunks without end offset
if the specified length for the range is set to -1, then request all the data possible by using the "Range: bytes=&lt;start&gt;-" form, which omits the end offset.
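As a rough illustration of that behaviour, here is a minimal, self-contained Go sketch of turning a list of chunks into a Range header value, where a length of -1 becomes an open-ended range. The ImageSourceChunk shape and the function name are stand-ins for this discussion, not the actual c/image code.

```go
package main

import "fmt"

type ImageSourceChunk struct {
	Offset uint64
	Length int64 // -1 means "from Offset to the end of the blob"
}

// rangeHeaderValue builds the value for an HTTP "Range" header; a chunk
// with Length == -1 is emitted without an end offset.
func rangeHeaderValue(chunks []ImageSourceChunk) string {
	value := "bytes="
	for i, c := range chunks {
		if i > 0 {
			value += ","
		}
		if c.Length < 0 {
			// No end offset: the server returns everything after Offset.
			value += fmt.Sprintf("%d-", c.Offset)
		} else {
			value += fmt.Sprintf("%d-%d", c.Offset, c.Offset+uint64(c.Length)-1)
		}
	}
	return value
}

func main() {
	fmt.Println(rangeHeaderValue([]ImageSourceChunk{
		{Offset: 0, Length: 100},
		{Offset: 1024, Length: -1},
	}))
	// prints: bytes=0-99,1024-
}
```

The open-ended bytes=&lt;start&gt;- form is standard HTTP range syntax, so registries that already support range requests should handle it without changes.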
the error I am seeing is that we pass the value -1 as the length of the requested chunk
- How is this happening? I can’t see any c/storage caller setting this to -1.
the problem is here https://github.com/containers/storage/blob/main/pkg/chunked/storage_linux.go#L1611-L1617
in some cases the blobSize is set to -1 by c/image, e.g. when pulling quay.io/libpod/alpine-with-seccomp:label.
I could change the code to not use any range (if there are no ranges, then request the whole file), but I thought this would be more flexible. We can restrict it so that only the last chunk can have len=-1.
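A possible shape for that restriction, as a sketch only (the type and function are illustrative, not the real c/image API):

```go
package main

import "fmt"

type ImageSourceChunk struct {
	Offset uint64
	Length int64 // -1: length unknown, read until the end of the blob
}

// validateChunks enforces the restriction proposed above: only the final
// chunk of a request may leave its length unknown.
func validateChunks(chunks []ImageSourceChunk) error {
	for i, c := range chunks {
		if c.Length < 0 && i != len(chunks)-1 {
			return fmt.Errorf("chunk %d: only the last chunk may have an unknown length", i)
		}
	}
	return nil
}

func main() {
	err := validateChunks([]ImageSourceChunk{
		{Offset: 0, Length: -1},
		{Offset: 1024, Length: 100},
	})
	fmt.Println(err) // chunk 0: only the last chunk may have an unknown length
}
```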
OK, so this is
- a schema1 image (unknown layer size)
- the transparent conversion path (needs to read the full file)
- ImageSourceSeekable not having a “read all” method at all.

Pedantically, when the size is unknown, while GetDiffer requires it (all three implementations currently do, and there is no “unknown” value documented), the locally correct thing to do is for PutBlobPartial to fail without even trying.

If you do want the conversion path to work on these images… yes, I think adding the “length unknown” case to ImageSourceSeekable / BlobChunkAccessor is the simplest approach. (Notably blobChunkAccessorProxy is simpler when there is only one API to call; having to add a whole separate “no-partial download” case to that would not be worth it.) So, document the field value, allow it only for the last chunk.

But, at a higher level, this is really another argument to say that the transparent conversion should not be via PutBlobPartial/GetDiffer at all: let it work like an ordinary non-chunked pull, letting c/image deal with the whole blob and digest verification, and then do the conversion all the way down in overlay’s ApplyDiff.
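To make the two options above concrete, here is a hedged sketch of the caller-side decision for a blob whose size is unknown; all identifiers are hypothetical and do not correspond to the actual storage_linux.go or c/image code.

```go
package main

import (
	"errors"
	"fmt"
)

type ImageSourceChunk struct {
	Offset uint64
	Length int64 // -1: length unknown (allowed only for the last chunk)
}

// tailChunk shows the two behaviours discussed above for a blob whose size
// c/image does not know (blobSize == -1): either give up on the partial
// path entirely, or emit an open-ended last chunk.
func tailChunk(tarOffset uint64, blobSize int64, allowUnknownLength bool) (ImageSourceChunk, error) {
	if blobSize < 0 {
		if !allowUnknownLength {
			// Option 1: fail early so the caller falls back to a normal,
			// non-partial pull of the whole blob.
			return ImageSourceChunk{}, errors.New("partial pull not possible: blob size unknown")
		}
		// Option 2: the "length unknown" case, an open-ended range to EOF.
		return ImageSourceChunk{Offset: tarOffset, Length: -1}, nil
	}
	return ImageSourceChunk{Offset: tarOffset, Length: blobSize - int64(tarOffset)}, nil
}

func main() {
	fmt.Println(tailChunk(512, -1, false)) // {0 0} partial pull not possible: blob size unknown
	fmt.Println(tailChunk(512, -1, true))  // {512 -1} <nil>
	fmt.Println(tailChunk(512, 4096, true)) // {512 3584} <nil>
}
```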
we need to be able to pull via PutBlobPartial when convert_images is enabled. composefs needs it to calculate the digests and enable fs-verity for each file.
As far as c/image is concerned, with UncompressedDigest known, the layer’s singleLayerIDComponent is going to be the DiffID, and the ComposeFS usage is ~invisible to c/image. I naively imagine that whether the call stack says GetDiffer or ApplyDiff does not really matter for fs-verity and other work.

One change that does make a difference, however, is that GetDiffer is parallelized over up to 6 layers concurrently, but layer commit is sequential. That might well be a very strong reason to do the conversion in GetDiffer, not in the commit phase; I didn’t try to measure that.
If this does help, arguably the layer extraction should be similarly made concurrent for non-chunked/non-ComposeFS layers as well. But that redesign would really be a very different conversation (and we might end up deprecating non-ComposeFS layers before that).
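For readers unfamiliar with the pipeline, a simplified sketch of the concurrency pattern described above: bounded-parallel differs, strictly sequential commits. The structure and the limit of 6 come from the comment above; everything else is illustrative and not the real copy/commit code.

```go
package main

import (
	"fmt"
	"sync"
)

const maxParallelDiffers = 6 // the comment above mentions up to 6 concurrent layers

func main() {
	layers := []string{"layer1", "layer2", "layer3", "layer4", "layer5", "layer6", "layer7", "layer8"}

	sem := make(chan struct{}, maxParallelDiffers) // bounds the parallel "differ" phase
	done := make([]chan struct{}, len(layers))     // enforces sequential commits
	var wg sync.WaitGroup

	for i := range layers {
		done[i] = make(chan struct{})
	}
	for i, layer := range layers {
		wg.Add(1)
		go func(i int, layer string) {
			defer wg.Done()

			sem <- struct{}{} // acquire one of the parallel slots
			fmt.Println("differ (parallel):", layer)
			<-sem // release the slot

			// Commit phase: wait until the previous layer is committed, so
			// commits happen strictly in order even though differs overlap.
			if i > 0 {
				<-done[i-1]
			}
			fmt.Println("commit (sequential):", layer)
			close(done[i])
		}(i, layer)
	}
	wg.Wait()
}
```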
I've tried adding a new method GetBlob to retrieve the entire blob, but I think the current API is more flexible, with a minimal change.
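To make the comparison concrete, a simplified sketch of the two API shapes; the method signatures are stand-ins, not the exact ImageSourceSeekable definition.

```go
// Package apisketch contrasts the two options mentioned above.
package apisketch

import (
	"context"
	"io"
)

type ImageSourceChunk struct {
	Offset uint64
	Length int64 // -1: read to the end of the blob
}

// Rejected option: a dedicated whole-blob method next to the chunked one.
type seekableWithGetBlob interface {
	GetBlobAt(ctx context.Context, chunks []ImageSourceChunk) (chan io.ReadCloser, chan error, error)
	GetBlob(ctx context.Context) (io.ReadCloser, error)
}

// Chosen option: keep a single entry point and document that the last chunk
// may have an unknown length, so existing callers need no second code path.
type seekableChunksOnly interface {
	GetBlobAt(ctx context.Context, chunks []ImageSourceChunk) (chan io.ReadCloser, chan error, error)
}
```

With a single entry point, wrappers such as blobChunkAccessorProxy only have one method to forward, which is the flexibility/minimal-change argument made above.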
I've updated the comments
thanks, addressed your comments in the last version
Seems like there is one comment by @mtrmac that was not addressed?
I thought I'd addressed them all; which one is still missing?
https://github.com/containers/image/pull/2391/files#r1589601556
Thanks. Fixed now