`Transfer-Encoding: chunked` responses fail
Hello,
I have been working on a project using the oras client (v2) against a repository hosted on ECR. We are storing an artifact with quite a lot of individual items (431 layers on the final manifest), which seems to cause ECR to chunk the response:
# REPO contains the API URL of the target repository, which is private in this case
❯ curl -X GET -I --user "AWS:$(aws ecr get-login-password)" "${REPO}/manifests/latest"
HTTP/1.1 200 OK
Content-Type: application/vnd.oci.image.manifest.v1+json
Docker-Distribution-Api-Version: registry/2.0
Sizes:
Date: Fri, 23 Sep 2022 09:14:00 GMT
Transfer-Encoding: chunked
The library currently checks net/http.Response.ContentLength explicitly when validating the response; for these chunked responses it is -1, so the call errors out with GET "${REPO}/manifests/latest": unknown response Content-Length, coming from registry/remote/repository.go.
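To illustrate the standard-library behavior outside of ECR, here is a minimal sketch of my own (not the library's actual code): Go's net/http reports ContentLength == -1 whenever the response length is unknown, e.g. for Transfer-Encoding: chunked, so any strict check on it rejects such responses.

package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

func main() {
	// Test server that forces a chunked response: writing and flushing
	// before the handler returns prevents Go from setting Content-Length.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/vnd.oci.image.manifest.v1+json")
		fmt.Fprint(w, `{"schemaVersion":2}`)
		w.(http.Flusher).Flush()
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	fmt.Println(resp.ContentLength)    // -1: the value the validation rejects
	fmt.Println(resp.TransferEncoding) // [chunked]
}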
I tried to replicate the behavior with the open source registry image, but even with a manifest of >10000 layers it responds without chunking, so this might unfortunately be a bit annoying to replicate fully locally.
@nima Could you take a look since it is an ECR issue?
Yes I can take a look at this; please assign it to me.
I've started working on this as of today and will post some updates in the coming week.
Testing suite is complete. Working on a repro now. The largest number of layers I was able to create via a Dockerfile was ~128; beyond that it hits a limit.
docker image inspect 080565210187.dkr.ecr.local:5000/utnubu:multilayer | jq '.[0].RootFS.Layers|length'
124
This unfortunately doesn't reproduce the issue...
curl -X GET -I --user AWS:${AWS_PWD} https://080565210187.dkr.ecr.local:5000/v2/utnubu/manifests/multilayer
HTTP/2 200
content-type: application/vnd.docker.distribution.manifest.v2+json
docker-distribution-api-version: registry/2.0
sizes:
date: Sun, 16 Oct 2022 05:52:42 GMT
Comparing this to a basic image, though, the returned headers differ, which I don't have an explanation for.
% curl -k -X GET -I --user AWS:${AWS_PWD} https://080565210187.dkr.ecr.local:5000/v2/utnubu/manifests/oras
HTTP/2 200
content-type: application/vnd.oci.image.manifest.v1+json
docker-distribution-api-version: registry/2.0
sizes:
content-length: 456
date: Sun, 16 Oct 2022 05:46:47 GMT
I'm hitting the same issue on ECR - this is what I get from curl:
HTTP/1.1 200 OK
Content-Type: application/vnd.docker.distribution.manifest.v2+json
Docker-Distribution-Api-Version: registry/2.0
Sizes:
Date: Sun, 16 Oct 2022 07:58:57 GMT
Transfer-Encoding: chunked
Testing suite is complete. Working on a repro now. The largest number of layers I was able to create via a Dockerfile was ~128; beyond that it hits a limit.
fyi we did not create the artifact via Dockerfiles, we used the oras SDK directly.
Re the imposed layer limit of 127: https://github.com/docker/docs/issues/8230#issuecomment-468630187
@lbjarre do you have a quick recipe to generate the artifact via the oras SDK that I can use? Or alternatively provide such an image that I can use.
@lbjarre do you have a quick recipe to generate the artifact via the oras SDK that I can use? Or alternatively provide such an image that I can use.
package main

import (
	"bytes"
	"context"
	"fmt"
	"log"

	v1 "github.com/opencontainers/image-spec/specs-go/v1"
	"oras.land/oras-go/v2"
	"oras.land/oras-go/v2/content"
	"oras.land/oras-go/v2/content/memory"
	"oras.land/oras-go/v2/registry/remote"
)

func main() {
	ctx := context.Background()
	mem := memory.New()

	manifest, err := makeArtifact(ctx, mem, 200)
	check(err)

	tag := "large-artifact"
	err = mem.Tag(ctx, manifest, tag)
	check(err)

	// TODO: setup auth to ECR.
	repo, err := remote.NewRepository("")
	check(err)

	_, err = oras.Copy(ctx, mem, tag, repo, "", oras.DefaultCopyOptions)
	check(err)
}

func makeArtifact(ctx context.Context, target content.Pusher, nLayers int) (v1.Descriptor, error) {
	var layers []v1.Descriptor
	for i := 0; i < nLayers; i++ {
		data := []byte(fmt.Sprintf("layer %d", i))
		desc := content.NewDescriptorFromBytes("application/text", data)
		if err := target.Push(ctx, desc, bytes.NewReader(data)); err != nil {
			return v1.Descriptor{}, err
		}
		layers = append(layers, desc)
	}
	return oras.Pack(ctx, target, layers, oras.PackOptions{})
}

func check(err error) {
	if err != nil {
		log.Fatal(err)
	}
}
This is roughly what I used for my testing setup.
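For the // TODO: setup auth to ECR line, something along these lines should work. This is an untested sketch that mirrors the AWS basic-auth scheme used by the curl commands in this thread; newECRRepository is a hypothetical helper and the registry reference is a placeholder.

package main

import (
	"context"
	"os/exec"
	"strings"

	"oras.land/oras-go/v2/registry/remote"
	"oras.land/oras-go/v2/registry/remote/auth"
)

// newECRRepository shells out to `aws ecr get-login-password` and uses the
// result as the password for the "AWS" user, the same scheme as the curl
// commands above. reference is e.g. "<account>.dkr.ecr.<region>.amazonaws.com/<repo>".
func newECRRepository(ctx context.Context, reference string) (*remote.Repository, error) {
	out, err := exec.CommandContext(ctx, "aws", "ecr", "get-login-password").Output()
	if err != nil {
		return nil, err
	}
	password := strings.TrimSpace(string(out))

	repo, err := remote.NewRepository(reference)
	if err != nil {
		return nil, err
	}
	repo.Client = &auth.Client{
		Credential: func(ctx context.Context, registry string) (auth.Credential, error) {
			return auth.Credential{Username: "AWS", Password: password}, nil
		},
	}
	return repo, nil
}

The returned *remote.Repository can then be passed to oras.Copy as in main above.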
Thanks @lbjarre; I've adapted this for our environment. I've set the number of layers to 400, and here are the results...
% cd ~/Development/github/oras && go build multilayer.go && mv multilayer /tmp/ && cd - && /tmp/multilayer
~/Development/brazil/DoodleContainers/src/DoodleContainers
Creating layer 10 of 400...
Creating layer 20 of 400...
Creating layer 30 of 400...
...
Creating layer 390 of 400...
Creating layer 400 of 400...
&{200 OK 200 HTTP/2.0 2 0 map[Content-Type:[application/vnd.oci.image.manifest.v1+json] Date:[Thu, 20 Oct 2022 05:14:33 GMT] Docker-Distribution-Api-Version:[registry/2.0] Sizes:[]] {0xc000112600} -1 [] false false map[] 0xc00052a200 0xc0001d4420}
200
%
With the 400-layer manifest pushed, now trying the repro...
% curl -I -X GET --basic --user "AWS:${AWS_SECRET}" "https://080565210187.dkr.ecr.local:5000/v2/utnubu/manifests/fatty"
HTTP/2 200
content-type: application/vnd.oci.image.manifest.v1+json
docker-distribution-api-version: registry/2.0
sizes:
date: Thu, 20 Oct 2022 05:16:50 GMT
That is, I'm still unable to reproduce what you've reported here. Anything else you can think of that I might be missing?
Here's some validation for the 400-layer object existing in the registry...
% skopeo inspect --raw docker://080565210187.dkr.ecr.local:5000/utnubu:fatty|jq .
{
"schemaVersion": 2,
"mediaType": "application/vnd.oci.image.manifest.v1+json",
"config": {
"mediaType": "application/vnd.unknown.config.v1+json",
"digest": "sha256:44136fa355b3678a1146ad16f7e8649e94fb4fc21fe77e8310c060f61caaff8a",
"size": 2
},
"layers": [
{
"mediaType": "application/text",
"digest": "sha256:3ef3bd6c80bd7c673acc76fde924749be5cb8c7553fd17c2c3979f37a754f75b",
"size": 7
},
{
"mediaType": "application/text",
"digest": "sha256:54db133a109fd7f0d6eb72da16df1af078f2bf86e917ae3c780a13d811d6aa6f",
"size": 7
},
{
"mediaType": "application/text",
"digest": "sha256:0c6eea24ed5f3274df121af2d789e43839b3bcfb328db8deb86b46896d57be67",
"size": 7
},
...
{
"mediaType": "application/text",
"digest": "sha256:b2f744a156137ae4a5bec0b9e91f91e3250b2492ff206606af9c967d2e393187",
"size": 9
},
{
"mediaType": "application/text",
"digest": "sha256:1da268e8e05ab78ebc086c2f6f4ac28cb73d4a621aa1f12f5612ea32c3d77b4f",
"size": 9
},
{
"mediaType": "application/text",
"digest": "sha256:c4c794002aded41f9abc6f592f8b6d40f113fb684311708a144a83c1fd911727",
"size": 9
}
]
}
%
% skopeo inspect --raw docker://080565210187.dkr.ecr.local:5000/utnubu:fatty|jq '.layers|length'
400
%
Trying this again with the reported 431 layers, in case that's the tipping point. I tried 500, but that was rejected for exceeding the maximum allowed.
It reproduces for me with a large (4.5 GB) image and not that many layers, @nima.
I guess I'm a bit confused about how the manifest response would become chunked when it is the data it references (and not the manifest itself) that grows in size. In your repro @yuvalturg, do you know whether you avoid the chunking when the layers are tiny?
Hmmm, seems like it does make a difference.
Here's a 4-layer, 32B/layer image...
% curl --http1.1 -X GET -I -u AWS:${password} https://080565210187.dkr.ecr.local:5000/v2/utnubu/manifests/l4s32
HTTP/1.1 200 OK
Connection: close
Content-Type: application/vnd.oci.image.manifest.v1+json
Docker-Distribution-Api-Version: registry/2.0
Sizes:
Date: Sat, 22 Oct 2022 01:39:33 GMT
Content-Length: 751
Here's a 43-layer, 1KiB/layer image...
% curl --http1.1 -X GET -I -u AWS:${password} https://080565210187.dkr.ecr.local:5000/v2/utnubu/manifests/l43s1024
HTTP/1.1 200 OK
Connection: close
Content-Type: application/vnd.oci.image.manifest.v1+json
Docker-Distribution-Api-Version: registry/2.0
Sizes:
Date: Sat, 22 Oct 2022 01:39:57 GMT
Transfer-Encoding: chunked
Note how the Content-Length drops off and in comes the Transfer-Encoding: chunked. The manifest itself grows with the layer count (each layer entry adds a mediaType, digest, and size to the JSON), so presumably ECR switches to a chunked response once the manifest body exceeds some size threshold.
As an aside, the reason the previous repro attempts above did not show this is that HTTP/2 does not support HTTP/1.1's chunked transfer encoding:
Note: HTTP/2 doesn’t support HTTP 1.1's chunked transfer encoding mechanism, as it provides its own, more efficient, mechanisms for data streaming.
@VishrutPatel's fix addresses the issue.
Next, re-running the same two tests from before, now with the fix in place. For the first one, l4s32, we expect no change:
% curl --http1.1 -X GET -I -u AWS:${password} https://080565210187.dkr.ecr.local:5000/v2/utnubu/manifests/l4s32
HTTP/1.1 200 OK
Connection: close
Content-Length: 751
Content-Type: application/vnd.oci.image.manifest.v1+json
Docker-Distribution-Api-Version: registry/2.0
Sizes:
Date: Sat, 22 Oct 2022 02:05:34 GMT
But for l43s1024 we do expect a change: the reappearance of Content-Length and the consequent disappearance of Transfer-Encoding:
% curl --http1.1 -X GET -I -u AWS:${password} https://080565210187.dkr.ecr.local:5000/v2/utnubu/manifests/l43s1024
HTTP/1.1 200 OK
Connection: close
Content-Length: 5747
Content-Type: application/vnd.oci.image.manifest.v1+json
Docker-Distribution-Api-Version: registry/2.0
Sizes:
Date: Sat, 22 Oct 2022 02:05:42 GMT
FYI, the above is the explanation of the root cause (why this happens for higher image layer counts and/or layer sizes). Thanks @VishrutPatel.
Testing of the fix is in progress, with no issues so far. This should be resolved by mid-week and then deployed worldwide.
Update: Fix is being deployed WW; will communicate here when that's complete.
The WW deployment is currently sitting at 44%; I'll update this again when we get closer to (or reach) 100%.
[INFO] Today we are at ~ 58% coverage.
[INFO] Okay, now at 69% WW coverage. Note that we will (for unrelated reasons) cease further deployments for the remainder of the week. I'll re-enable the deployment on Monday 11/14.
[INFO] Deployments will commence again tomorrow.
[INFO] Deployments are complete; @lbjarre can you confirm if the issue is resolved for you?
I actually don't have the same testing setup anymore; we solved our issue by reducing the manifest size. However, if your test cases are working, then I'm comfortable calling this resolved. Thanks for the help!