Resumable uploads to the registry

Open SamSaffron opened this issue 11 years ago • 17 comments

Ever since we moved to Docker as the official engine for deploying Discourse, I have noticed a few support issues where the registry hangs on large transfers.

The gist of the issue is that some crazy VPNs and weird connections will terminate large image downloads midway. This leaves users needing to transfer the entire payload a second time.

Proposed solution

Chunked upload and download.

  • Every image on the registry will contain a manifest. The manifest file contains a list of SHA1 hashes of image chunks, one per 512KB. The manifest itself will also be hashed to ensure you get a kosher manifest.
  • When a client downloads an image, it will pull chunks and validate them against the hashes in the manifest. Each correctly downloaded chunk should be stored in a tmp folder that is restricted in size (say 1GB). If the tmp folder grows beyond that size, it should remove the oldest chunks.
  • When downloading chunks, the client should always check the local tmp folder first. That way, if you terminate a download halfway, you will be able to pick up midway.
  • Docker, out of the box, will download 3 chunks concurrently (configurable either globally or per session).
  • A similar algorithm should be implemented on the registry side so uploads can be chunked and resumed.
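
A minimal Python sketch of the download half of this proposal, to make the manifest and resume logic concrete. Everything here is an assumption for illustration, not an existing Docker or registry API: the manifest layout, the function names, and `fetch_chunk`, a hypothetical callable that returns the bytes of one chunk from the registry. Size-limited eviction of the tmp folder is left out.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 512 * 1024  # 512KB per chunk, as in the proposal


def build_manifest(data: bytes, chunk_size: int = CHUNK_SIZE) -> dict:
    """Hash every chunk, then hash the list of chunk hashes (hypothetical layout)."""
    chunks = [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]
    manifest_sha1 = hashlib.sha1("\n".join(chunks).encode()).hexdigest()
    return {"chunk_size": chunk_size, "chunks": chunks, "manifest_sha1": manifest_sha1}


def verify_manifest(manifest: dict) -> bool:
    """Check that the chunk list still matches the manifest's own hash."""
    expected = hashlib.sha1("\n".join(manifest["chunks"]).encode()).hexdigest()
    return expected == manifest["manifest_sha1"]


def download(manifest: dict, fetch_chunk, cache_dir: Path) -> bytes:
    """Assemble an image, reusing verified chunks already in cache_dir.

    fetch_chunk(index) is a hypothetical callable that fetches one chunk.
    """
    if not verify_manifest(manifest):
        raise ValueError("manifest failed verification")
    cache_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    for index, expected in enumerate(manifest["chunks"]):
        cached = cache_dir / expected
        chunk = cached.read_bytes() if cached.exists() else None
        # Re-fetch when the chunk is missing or the cached copy is corrupt.
        if chunk is None or hashlib.sha1(chunk).hexdigest() != expected:
            chunk = fetch_chunk(index)
            if hashlib.sha1(chunk).hexdigest() != expected:
                raise ValueError(f"chunk {index} failed verification")
            cached.write_bytes(chunk)
        parts.append(chunk)
    return b"".join(parts)
```

If `fetch_chunk` raises partway through, the chunks already verified stay in `cache_dir`, so calling `download` again only fetches what is still missing.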

I feel such a solution will ease adoption of Docker in areas where internet connectivity is not spectacular, and heavily reduce load on registries.

The chunked solution also allows you to easily round robin on the registry side to increase reliability. It also makes it easier to handle mirroring through CDNs using origin pull.

Thoughts?

SamSaffron avatar Mar 26 '14 23:03 SamSaffron

I would propose we raise the image chunk size to around, say, 1-3 megabytes. Let's give the kernel a chance to slide the window far enough that it can actually push data as fast as possible.

Also, if we're going this way, we should likely thread the chunk pushing (3 seems small; some browsers do 4 up to 8; we don't want to DDoS ourselves or slow ourselves down either).

damm avatar Mar 26 '14 23:03 damm

1-3MB would be fine, but keep in mind that with TCP/HTTP keepalive you will keep growing the TCP window size across multiple chunks.

Concurrency should be determined solely based on capacity. I guess the server should be allowed to override it and tell clients to back down.
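
A rough Python sketch of that idea, assuming the server can advertise a limit somehow. The `server_limit` value is hypothetical (no such registry hint is specified in this thread), as are the function names and `fetch_chunk`.

```python
from concurrent.futures import ThreadPoolExecutor

DEFAULT_CONCURRENCY = 3  # client default from the original proposal


def effective_concurrency(client_setting: int, server_limit=None) -> int:
    """Use the client setting unless the server asks for fewer workers."""
    workers = max(1, client_setting)
    if server_limit is not None:
        # Hypothetical server hint: the client backs down but never goes below 1.
        workers = min(workers, max(1, server_limit))
    return workers


def fetch_all(indices, fetch_chunk, client_setting=DEFAULT_CONCURRENCY, server_limit=None):
    """Fetch chunks with a bounded thread pool; results keep the input order."""
    workers = effective_concurrency(client_setting, server_limit)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_chunk, indices))
```

The server limit only ever lowers the client's concurrency here, which matches "tell clients to back down"; it never raises it.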

SamSaffron avatar Mar 26 '14 23:03 SamSaffron

Right, but capacity is hard; that's where flow control comes in.

damm avatar Mar 26 '14 23:03 damm

Which app do you use for fetching images in Docker? Here, we don't have problems downloading large files using well-known download managers such as wget, aria2c, xdman, etc.

perpi avatar Mar 28 '14 14:03 perpi

Image pulling is being fixed as part of #2461.

unclejack avatar Mar 28 '14 14:03 unclejack

@unclejack So why do I get this error:

WARNING: No swap limit support
Unable to find image 'samsaffron/discourse:0.1.2' locally
Pulling repository samsaffron/discourse
9dfbb44c55ff: Error pulling image (0.1.2) from samsaffron/discourse, read tcp 198.41.189.230:443: connection reset by peer 
8dbd9e392a96: Download complete 
21a54dd8e905: Download complete 
535e9f84ec37: Error downloading dependent layers 
2014/03/30 23:14:13 Could not find repository on any of the indexed registries.
Your Docker installation is not working correctly
See: https://meta.discourse.org/t/docker-error-on-bootstrap/13657/18?u=sam

?

perpi avatar Mar 30 '14 18:03 perpi

Yep, I'm still seeing this in 0.9.1. If you keep trying, it will download fine, but you chew through a bit of bandwidth trying repeatedly until it works.

vagrant@gentoo ~ $ docker pull d11wtq/redis
Pulling repository d11wtq/redis
ea76bcf23770: Error pulling image (latest) from d11wtq/redis, unexpected EOF
pected EOF 6: Download complete
fb65bcbb3dfd: Download complete
7181e4a9197f: Download complete
63c411d0656d: Download complete
c270a1a4f4db: Download complete
f5730325c9da: Download complete
ffd8bd48f3cf: Download complete
65277b5346cc: Error downloading dependent layers
2014/03/31 23:45:48 Could not find repository on any of the indexed registries.

d11wtq avatar Mar 31 '14 23:03 d11wtq

I like the proposed solution to this problem. It seems a lot like how BitTorrent downloads files in chunks.

d11wtq avatar Mar 31 '14 23:03 d11wtq

@diff- That problem is being worked on for issue #2461.

All pull-related issues with errors containing "EOF" should be discussed in #2461. I'll change the title of this topic to make it clear this issue is going to be just for push.

unclejack avatar Apr 01 '14 00:04 unclejack

Please implement this if possible. I've got a ~660MB image that is failing around 50-100MB through the upload every time. Resumable uploads would be a huge help.

scarolan avatar Apr 04 '14 16:04 scarolan

Proposed solution seems good to me. I'd love to see this make it into Docker soon.

davidcelis avatar Apr 09 '14 00:04 davidcelis

+1

defender avatar Dec 30 '14 09:12 defender

@aaronlehmann can we close this now that we have resumable upload/download?

runcom avatar Apr 01 '16 09:04 runcom

@runcom: We only have resumable download at present, not resumable upload.

aaronlehmann avatar Apr 01 '16 16:04 aaronlehmann

Right :)

runcom avatar Apr 01 '16 16:04 runcom

I'm having lots of problems with this - it takes me several days to complete a docker push.

It would be a great fix if it was possible to just split the upload files.

I have a timeout every 19 minutes. If I could split the bigger files, they could complete before the timeout.

ghost avatar Oct 21 '16 09:10 ghost

Has anyone made any progress on this?

LiuShuaiyi avatar Sep 14 '22 18:09 LiuShuaiyi