s5cmd
File writes/syncs to local filesystems are not atomic; original file data is deleted on download failure
Version: 2.0.0 (and earlier)
I have an application that is running sync in a loop looking for changes and syncing changed files from an S3 bucket to an EFS filesystem. Running into two issues related to this:
- The S3 bucket contains several large compressed objects. If a change is detected and these files are downloaded, each file is in an inconsistent state for the duration of its download. Because of the way downloads are done, the original file is clobbered and the new data is written directly into it. Any application that tries to read the data while the download is occurring fails (because the partial data cannot be uncompressed).
- If, as in the case above, one of the compressed files is updated and the download fails, the original file is no longer present, and applications are unable to read the correct data until the sync runs again and the file is re-downloaded.
I worked around this locally by changing the doDownload method in command/cp.go to create and write to a temporary file in the same directory, and then rename it over the destination once the file is completely written. Happy to open a PR if this sounds like a reasonable approach.
Probably related to https://github.com/peak/s5cmd/issues/479
PR would be welcome, I've run into this too
This issue should be resolved with https://github.com/peak/s5cmd/pull/582.
If you see the same problem again on a new build, please feel free to re-open. Thank you.