Resumable uploads to the registry
Ever since we moved to Docker as the official engine for deploying Discourse, I have noticed a few support issues where the registry hangs up on large transfers.
The gist of the issue is that some crazy VPNs and weird connections will terminate large image downloads midway, which leaves users needing to transfer the entire payload a second time.
Proposed solution
Chunked upload and download.
- Every image on the registry will have a manifest; the manifest file contains a list of SHA1 hashes of image chunks, one per 512KB. The manifest itself will also be hashed to ensure you get a kosher manifest.
- When a client downloads an image it will pull chunks and validate each chunk against the hashes in the manifest. Each correctly downloaded chunk should be stored in a tmp folder that is restricted in size (say 1GB). If the tmp folder grows beyond that size it should remove the oldest chunks.
- When downloading chunks the client should always check the local tmp folder first. That way, if a download terminates halfway, it can pick up where it left off.
- Docker, out of the box, will download 3 chunks concurrently (configurable either globally or per session).
- A similar algorithm to be implemented on the registry side so uploads can be chunked and resumed. (A rough sketch of the chunking and verification logic follows this list.)
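To make the idea concrete, here is a minimal sketch in Go (the language Docker itself is written in) of the manifest and chunk-verification logic described above. All names here (ChunkManifest, BuildManifest, getChunk, fetchChunk) are hypothetical, not existing Docker or registry APIs, and the 1GB oldest-first eviction of the tmp folder is left out for brevity.

```go
package chunked

import (
	"crypto/sha1"
	"encoding/hex"
	"io"
	"os"
	"path/filepath"
)

const chunkSize = 512 * 1024 // 512KB per chunk, as proposed

// ChunkManifest lists one SHA1 hash per chunk of an image layer.
type ChunkManifest struct {
	ChunkHashes []string
}

// BuildManifest reads a layer and records one SHA1 hash per 512KB chunk.
func BuildManifest(r io.Reader) (*ChunkManifest, error) {
	m := &ChunkManifest{}
	buf := make([]byte, chunkSize)
	for {
		n, err := io.ReadFull(r, buf)
		if n > 0 {
			sum := sha1.Sum(buf[:n])
			m.ChunkHashes = append(m.ChunkHashes, hex.EncodeToString(sum[:]))
		}
		if err == io.EOF || err == io.ErrUnexpectedEOF {
			return m, nil
		}
		if err != nil {
			return nil, err
		}
	}
}

// getChunk returns chunk i, preferring the local tmp cache and falling back
// to fetchChunk (a placeholder for the registry request). Corrupt cached
// chunks are discarded and re-fetched; good chunks are cached so an
// interrupted download can resume where it left off.
func getChunk(tmpDir string, i int, wantHash string, fetchChunk func(int) ([]byte, error)) ([]byte, error) {
	cachePath := filepath.Join(tmpDir, wantHash)
	if data, err := os.ReadFile(cachePath); err == nil {
		sum := sha1.Sum(data)
		if hex.EncodeToString(sum[:]) == wantHash {
			return data, nil // resume: chunk already downloaded and valid
		}
		os.Remove(cachePath) // corrupt cache entry, fetch again
	}
	data, err := fetchChunk(i)
	if err != nil {
		return nil, err
	}
	sum := sha1.Sum(data)
	if hex.EncodeToString(sum[:]) != wantHash {
		return nil, os.ErrInvalid // chunk failed verification; caller retries
	}
	_ = os.WriteFile(cachePath, data, 0o644) // best-effort cache for resumes
	return data, nil
}
```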
I feel such a solution will ease adoption of Docker in areas where internet connectivity is not spectacular, and heavily reduce load on registries.
The chunked solution also allows you to easily round-robin on the registry side to increase reliability, and makes it easier to handle mirroring via CDNs using origin pull.
Thoughts?
I would propose we raise the image chunk size to around, say, 1-3 megabytes? Let's give the kernel a chance to slide the window far enough that it can actually push data as fast as possible.
Also, if we're going this way we should likely thread the chunk pushing (3 seems small; some browsers do 4 up to 8, but we don't want to DDoS ourselves or slow ourselves down either).
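For the threading part, a bounded-worker approach like the sketch below would cap the number of in-flight pushes at a configurable value (e.g. 3 or 4) without flooding the registry. This is only an illustration: pushChunk is a hypothetical stand-in for the actual per-chunk registry upload call, not an existing API.

```go
package chunked

import "sync"

// pushAllChunks uploads chunks with at most `concurrency` pushes in flight.
// pushChunk is a placeholder for the real per-chunk registry upload call.
func pushAllChunks(chunks [][]byte, concurrency int, pushChunk func(i int, data []byte) error) error {
	sem := make(chan struct{}, concurrency) // semaphore capping in-flight pushes
	errs := make(chan error, len(chunks))
	var wg sync.WaitGroup
	for i, c := range chunks {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot before starting the push
		go func(i int, c []byte) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when done
			if err := pushChunk(i, c); err != nil {
				errs <- err
			}
		}(i, c)
	}
	wg.Wait()
	close(errs)
	return <-errs // first error if any push failed, nil otherwise
}
```

In practice the concurrency value would come from the global or per-session configuration mentioned in the proposal, and the server could lower it for a given client.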
1-3MB would be fine, but keep in mind that with TCP/HTTP keepalive the TCP window size will keep growing across multiple chunks.
Concurrency should be determined solely based on capacity; I guess the server should be allowed to override it and tell clients to back off.
Right, but capacity is hard to know; that's where flow control comes in.
Which tool does Docker use for fetching images? Here, we don't have problems downloading large files using well-known download managers such as wget, aria2c, xdman, etc.
Image pulling is being fixed as part of #2461.
@unclejack So why do I get this error:
WARNING: No swap limit support
Unable to find image 'samsaffron/discourse:0.1.2' locally
Pulling repository samsaffron/discourse
9dfbb44c55ff: Error pulling image (0.1.2) from samsaffron/discourse, read tcp 198.41.189.230:443: connection reset by peer
8dbd9e392a96: Download complete
21a54dd8e905: Download complete
535e9f84ec37: Error downloading dependent layers
2014/03/30 23:14:13 Could not find repository on any of the indexed registries.
Your Docker installation is not working correctly
See: https://meta.discourse.org/t/docker-error-on-bootstrap/13657/18?u=sam
?
Yep, I'm still seeing this in 0.9.1. If you keep trying, it will download fine, but you chew through a bit of bandwidth trying repeatedly until it works.
vagrant@gentoo ~ $ docker pull d11wtq/redis
Pulling repository d11wtq/redis
ea76bcf23770: Error pulling image (latest) from d11wtq/redis, unexpected EOF
fb65bcbb3dfd: Download complete
7181e4a9197f: Download complete
63c411d0656d: Download complete
c270a1a4f4db: Download complete
f5730325c9da: Download complete
ffd8bd48f3cf: Download complete
65277b5346cc: Error downloading dependent layers
2014/03/31 23:45:48 Could not find repository on any of the indexed registries.
I like the proposed solution to this problem. It seems a lot like how BitTorrent downloads files in chunks.
@diff- That problem is being worked on for issue #2461.
All pull-related issues with errors which contain "EOF" in them should be discussed in #2461. I'll change the title of this topic to make it clear this issue is going to be just for push.
Please implement this if possible. I've got a ~660MB image that is failing around 50-100MB through the upload every time. Resumable uploads would be a huge help.
Proposed solution seems good to me. I'd love to see this make it into Docker soon.
+1
@aaronlehmann can we close this now that we have resumable upload/download?
@runcom: We only have resumable download at present, not resumable upload.
Right :)
I'm having lots of problems with this - it takes me several days to complete a docker push.
It would be a great fix if it were possible to just split up the uploaded files.
I have a timeout every 19 minutes. If I could split the bigger files, they could complete before the timeout.
Has anyone made any progress on this?