
How does s5cmd handle large downloads if object changes?

Open gudmundur opened this issue 2 years ago • 2 comments

Hi there, 👋🏻

I bumped into s5cmd recently, and I've been running a few experiments that are very promising. What I haven't been able to figure out from reading a little into the code is what happens if an object in S3 is updated while s5cmd is downloading it in multiple parts. To give an example: let's say a download of object X in 10 parts is halfway done, and a separate upload to S3 updates X. What will happen with the remaining 5 parts of the original X? Will s5cmd detect this? Will it silently download the 5 parts from the updated X? Or what are the semantics here?

Thanks!

gudmundur, Mar 22 '22 15:03

Hey 👋

We don't have any code to prevent corruption mid-transfer. In fact, I'm not sure whether AWS offers a way to prevent this behaviour. Please see https://github.com/aws/aws-cli/issues/2321

igungor, Mar 22 '22 17:03

Thanks for the quick response @igungor. 🙇🏻‍♂️

I suspect this can be done with a versioned bucket by using the version ID from the first request as a reference point for the subsequent ones. Otherwise, I'm curious to see if object integrity can somehow be used to detect the update.
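
To make the idea concrete, here is a rough sketch of what I have in mind. It is not s5cmd code and it has not been run against S3. It relies on S3's GetObject accepting a `Range` header, an `If-Match` header (answered with 412 Precondition Failed when the ETag no longer matches), and a `versionId` query parameter. The names `Pin`, `part_request`, `download_pinned` and the `fetch` callback are all made up for illustration; `fetch` stands in for whatever actually sends the request. The ETag is used only as a change marker here, not as a content checksum.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


class ObjectChanged(Exception):
    """Raised when the object no longer matches the pinned ETag."""


@dataclass(frozen=True)
class Pin:
    """Identity of the object as seen by the first (HEAD or GET) request."""
    etag: str
    version_id: Optional[str]  # None on an unversioned bucket
    size: int


def part_ranges(size: int, part_size: int) -> List[Tuple[int, int]]:
    """Split [0, size) into inclusive (start, end) byte ranges."""
    return [(start, min(start + part_size, size) - 1)
            for start in range(0, size, part_size)]


def part_request(pin: Pin, start: int, end: int) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Build headers and query parameters for one ranged GET."""
    headers = {"Range": f"bytes={start}-{end}", "If-Match": pin.etag}
    # On a versioned bucket, pinning the version keeps every part on the
    # same object even if a newer version is uploaded in the meantime.
    params = {"versionId": pin.version_id} if pin.version_id else {}
    return headers, params


# Hypothetical transport: takes (headers, params), returns (status, body).
Fetch = Callable[[Dict[str, str], Dict[str, str]], Tuple[int, bytes]]


def download_pinned(pin: Pin, part_size: int, fetch: Fetch) -> bytes:
    """Download all parts sequentially, failing if the object changes."""
    chunks = []
    for start, end in part_ranges(pin.size, part_size):
        headers, params = part_request(pin, start, end)
        status, body = fetch(headers, params)
        if status == 412:
            raise ObjectChanged(f"object changed before bytes {start}-{end}")
        if status != 206:
            raise RuntimeError(f"unexpected status {status}")
        chunks.append(body)
    return b"".join(chunks)
```

With a version ID the download should simply finish against the original version. Without one, the best this can do is notice the change and fail instead of silently stitching together parts of two different objects.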

gudmundur, Mar 23 '22 16:03