Strange behaviour of sync
Hi,
I recently noticed that my backup process, running s5cmd v2.2.2, is skipping some directories that are not covered by the exclude patterns.
Below is the s5cmd command that syncs the contents of the parent directory to the S3 bucket:
s5cmd --stat sync --include "*" --exclude "work/*" --exclude "test/*" --exclude "Partial/*" --exclude "*screen*" --exclude "*.pileup" "/storageA/*" s3://storageA/
An example folder and its contents at source:
ls /storageA/Omics/ready/
INFO.md
After the sync, I don't see the above directory in the destination:
s5cmd --stat ls s3://storageA/Omics/ready/
ERROR "ls s3://storageA/Omics/ready/": no object found
But it works if we specify one more directory level in the command, as below, and this copies many files that the first command does not:
s5cmd --stat sync --include "*" --exclude "work/*" --exclude "test/*" --exclude "Partial/*" --exclude "*screen*" --exclude "*.pileup" "/storageA/Omics/*" s3://storageA/Omics/
This looks quite strange! What could be the problem with the first sync command?
Any suggestions?
I've also noticed this partial-copy behavior with the sync command when transferring many files (200k+) with the latest release, v2.2.2-48f7e59: whole folders are simply not copied to the destination. The s5cmd command seems to stop before all the work is done, with return code 0 and no visible error messages (even in trace or debug mode).
We are facing the same problem. The bug is here:
https://github.com/peak/s5cmd/blob/v2.2.2/storage/s3.go#L336-L350
Not sure why the tool skips files that were modified after the listing started, but this could at least be a configurable option so that users can opt in or out.
What's Happening
- Time capture: now = time.Now().UTC() is set during the first page of S3 listing
- Aggressive filtering: Any object with LastModified > now gets skipped with continue
- Missing objects: Recently modified objects/prefixes get filtered out
Root Cause
The intent was to avoid race conditions with objects created during listing, but the implementation is too aggressive. It skips legitimate existing objects that happen to have recent timestamps, causing the sync command to miss checking certain prefixes.
The bug is in storage/s3.go in all three listing functions: listObjectsV2(), listObjects(), and listObjectVersions().