
Can't upload docker images larger than 2GB via traefik 3.0 proxy

Open · ieugen opened this issue 1 year ago · 7 comments

Welcome!

  • [X] Yes, I've searched similar issues on GitHub and didn't find any.
  • [X] Yes, I've searched similar issues on the Traefik community forum and didn't find any.

What did you do?

I'm using Traefik 3 with the Docker Swarm provider.

I've configured a Sonatype Nexus Docker registry behind the Traefik proxy. I'm trying to push a Docker image that has a layer of ~2.79 GB. Pushing smaller images works, but this one fails.

I configured a buffering middleware with a 4 GB limit and applied it to my service's router. It does not help.

I believe this might be a regression from Traefik 2.x line since we did not see this issue then.

Note that pushing to the Docker registry directly, bypassing the Traefik proxy, works regardless of image size. I did this by exposing the service on a host port, tagging the image with the new registry address and pushing to that port directly.

        # 4GB max body for docker images - must apply to each service
        traefik.http.middlewares.limit.buffering.maxRequestBodyBytes: "4000000000"
        traefik.http.middlewares.limit.buffering.maxResponseBodyBytes: "4000000000" 

applied to the router as:
        traefik.http.routers.registry_internal_https.middlewares: limit@swarm
        traefik.http.routers.registry_internal_https.service: registry_internal_https
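For reference, the labels above could be combined into a single Swarm stack definition roughly like this. It is a minimal sketch, not the reporter's actual file: the service name `registry`, the `sonatype/nexus3` image, the host name in the rule and the port `8083` (borrowed from the direct-push workaround later in the thread) are all assumptions. With the Swarm provider, Traefik labels have to sit under `deploy.labels`, and the backend port must be declared explicitly because Swarm gives Traefik no port information.

```yaml
services:
  registry:                          # hypothetical service name
    image: sonatype/nexus3           # assumption: stock Nexus 3 image
    deploy:
      labels:
        traefik.enable: "true"
        # Buffering middleware: reject request/response bodies above ~4 GB
        traefik.http.middlewares.limit.buffering.maxRequestBodyBytes: "4000000000"
        traefik.http.middlewares.limit.buffering.maxResponseBodyBytes: "4000000000"
        # Router on the "https" entry point seen in the logs; host is hypothetical
        traefik.http.routers.registry_internal_https.rule: "Host(`docker-internal.example.com`)"
        traefik.http.routers.registry_internal_https.entrypoints: https
        traefik.http.routers.registry_internal_https.tls: "true"
        traefik.http.routers.registry_internal_https.middlewares: limit@swarm
        traefik.http.routers.registry_internal_https.service: registry_internal_https
        # Swarm provides no port detection, so the backend port is set by hand
        traefik.http.services.registry_internal_https.loadbalancer.server.port: "8083"
```

The `@swarm` suffix refers to the provider that defines the middleware; in Traefik v3 the Swarm provider is separate from the Docker provider, which is why it is not `@docker`.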

The middleware is applied (screenshot omitted).

Docs: https://doc.traefik.io/traefik/middlewares/http/buffering/

Issues I believe are related to this:

  • https://github.com/vulcand/oxy/issues/113
  • https://github.com/traefik/traefik/pull/9750
  • https://github.com/traefik/traefik/pull/8945

What did you see instead?

In the Traefik logs I see these errors:

infra_traefik.1.d5k0zertx3qq@infra1    | {"level":"info","time":"2024-05-20T08:48:37Z","message":"Starting provider *acme.Provider"}
infra_traefik.1.d5k0zertx3qq@infra1    | {"level":"info","providerName":"le.acme","acmeCA":"https://acme-v02.api.letsencrypt.org/directory","time":"2024-05-20T08:48:37Z","message":"Testing certificate renew..."}
infra_traefik.1.d5k0zertx3qq@infra1    | {"level":"error","entryPointName":"https","routerName":"registry_internal_https@swarm","middlewareName":"limit@swarm","middlewareType":"Buffer","time":"2024-05-20T08:49:46Z","message":"vulcand/oxy/buffer: error when reading request body, err: context canceled"}
infra_traefik.1.d5k0zertx3qq@infra1    | {"level":"error","entryPointName":"https","routerName":"registry_internal_https@swarm","middlewareName":"limit@swarm","middlewareType":"Buffer","time":"2024-05-20T08:51:01Z","message":"vulcand/oxy/buffer: error when reading request body, err: context canceled"}
infra_traefik.1.d5k0zertx3qq@infra1    | {"level":"error","entryPointName":"https","routerName":"registry_internal_https@swarm","middlewareName":"limit@swarm","middlewareType":"Buffer","time":"2024-05-20T08:52:21Z","message":"vulcand/oxy/buffer: error when reading request body, err: context canceled"}

On the docker client side, when pushing the image I get:

The push refers to repository [docker-internal.REDACTED]
5f70bf18a086: Preparing
ac9645206b4c: Preparing
REDACTED
53ebc4e827bc: Waiting
c380a1355aca: Waiting
2db7720a8970: Preparing
41db8fe8bac3: Waiting
629ca62fb7c7: Preparing
2db7720a8970: Waiting
629ca62fb7c7: Waiting
5f70bf18a086: Layer already exists
e7db524bd9f3: Pushed
53b64133eb2b: Pushed
ac9645206b4c: Pushed
a99668287a5d: Pushed
cfb0006ffc0a: Pushed
53ebc4e827bc: Layer already exists
c380a1355aca: Layer already exists
41db8fe8bac3: Layer already exists
2db7720a8970: Layer already exists
629ca62fb7c7: Layer already exists
46130376c420: Pushed
d064743dc7f5: Pushed
3622c3223711: Retrying in 5 seconds
3622c3223711: Retrying in 4 seconds
3622c3223711: Retrying in 3 seconds
3622c3223711: Retrying in 2 seconds
9828468c3c7f: Pushed
3622c3223711: Retrying in 1 second
3622c3223711: Retrying in 10 seconds
3622c3223711: Retrying in 9 seconds
3622c3223711: Retrying in 8 seconds
3622c3223711: Retrying in 7 seconds
3622c3223711: Retrying in 6 seconds
REDACTED
3622c3223711: Retrying in 10 seconds
3622c3223711: Retrying in 9 seconds
REDACTED
3622c3223711: Retrying in 1 second
3622c3223711: Retrying in 20 seconds
REDACTED
3622c3223711: Retrying in 5 seconds
3622c3223711: Retrying in 4 seconds
3622c3223711: Retrying in 3 seconds
3622c3223711: Retrying in 2 seconds
3622c3223711: Retrying in 1 second
unknown: Client Closed Request

What version of Traefik are you using?

{"level":"info","version":"3.0.0","time":"2024-05-20T08:48:37Z","message":"Traefik version 3.0.0 built on 2024-04-29T14:25:59Z"}

What is your environment & configuration?

Running as a Docker container with the Docker Swarm provider (Docker 26.1.2).

If applicable, please paste the log output in DEBUG level

No response

ieugen, May 20 '24 09:05

I deployed Traefik 2.11.2 with the same configuration (adapted to match). It seems that Traefik 2.11 does not work either. I tried both with and without the buffering middleware.

I also tried:

        # 4GB max body for docker images - must apply to each service
        traefik.http.middlewares.limit.buffering.maxRequestBodyBytes: "4000000000"
        traefik.http.middlewares.limit.buffering.maxResponseBodyBytes: "4000000000" 
        traefik.http.middlewares.limit.buffering.memResponseBodyBytes: "2000000" 
        traefik.http.middlewares.limit.buffering.memRequestBodyBytes: "2000000"     

ieugen, May 20 '24 10:05

I can push directly to the Docker registry by exposing its port on the host network:

    ports:
      - published: 8083
        target: 8083
        protocol: tcp
        mode: host

and then pushing directly to the service, bypassing Traefik:

    docker tag docker-REDACTED/REDACTED:dev-latest localhost:8083/REDACTED:dev-latest
    docker push localhost:8083/REDACTED:dev-latest

ieugen, May 20 '24 10:05

> I deployed Traefik 2.11.2 with the same configuration (adapted the config to match). It seems that Traefik 2.11 does not work either.

@ieugen

Would you be able to test with Traefik 2.11.1? I ask because I've been seeing similar problems with large file uploads, but only with version 2.11.2 and later.

rickysarraf, May 20 '24 15:05

Hi @rickysarraf, yes, I can do that tomorrow outside business hours. Thanks for the hint.

ieugen, May 20 '24 17:05

Hi @rickysarraf: I can confirm that 2.10.0 works. With the buffer limit in place I don't believe it worked; dropping the 4 GB buffer limit seems to fix it. We can now push our large Docker images :) I do hope to be able to do the same with Traefik 3.

ieugen, May 21 '24 07:05

2.11.2 does not work. No buffer limit was used in any of these tests.

I tried 2.11.1 and it does not work either; I get the errors below:

infra_traefik.1.6c1ev49h0ui0@infra1    | {"level":"error","msg":"Error while Peeking first byte: read tcp 172.18.0.4:443-\u003e87.236.176.66:36949: i/o timeout","time":"2024-05-21T07:48:46Z"}
infra_traefik.1.6c1ev49h0ui0@infra1    | {"level":"error","msg":"Error while Peeking first byte: read tcp 172.18.0.4:443-\u003e188.166.26.88:57415: i/o timeout","time":"2024-05-21T07:48:49Z"}

I tried 2.11.0 and it does work.

ieugen, May 21 '24 08:05

Possibly related to https://github.com/traefik/traefik/issues/10596

ieugen, May 21 '24 13:05

Hey @ieugen,

Thanks for reaching out! At first glance, the problem you're experiencing is related to a change introduced recently: https://github.com/traefik/traefik/pull/10602
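The thread does not say what the linked change does. Assuming it is the entry point `readTimeout` default of 60 seconds introduced in the v2.11.2 / v3.0 timeframe (an assumption here, not something stated in this issue), a commonly suggested mitigation is to raise that timeout on the entry point serving the registry. A minimal sketch for a Swarm-deployed Traefik, assuming the `https` entry point name from the logs above and an arbitrary 600-second value:

```yaml
services:
  traefik:
    image: traefik:v3.0
    command:
      - --providers.swarm=true
      - --entryPoints.https.address=:443
      # Assumption: the 60s default read timeout cuts off multi-GB layer uploads.
      # readTimeout covers reading the whole request, body included.
      # 600s is an arbitrary example value; 0 disables the timeout entirely.
      - --entryPoints.https.transport.respondingTimeouts.readTimeout=600s
```

A finite value keeps some protection against slow or stalled clients, so it is probably safer than `0`; pick one that covers the slowest upload you expect.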

To keep the repository focused, we ask that all questions be asked in the community forum. It is pretty active, so you might find that your question has already been answered there.

If not, you can ask and get help from other community members pretty quickly.

I'm closing the issue.

sdelicata, May 23 '24 13:05