Strange behaviour of sync

Open praveenraj2018 opened this issue 1 year ago • 2 comments

Hi,

I recently noticed that my backup process, running s5cmd v2.2.2, is skipping some directories that are not covered by any of the exclude patterns.

Below is the s5cmd command that syncs the contents of the parent directory to an S3 bucket:

s5cmd --stat sync --include "*" --exclude "work/*" --exclude "test/*" --exclude "Partial/*" --exclude "*screen*" --exclude "*.pileup" "/storageA/*" s3://storageA/

An example folder and its contents at source:

ls /storageA/Omics/ready/
INFO.md

After the sync, I don't see the above directory in the destination:

s5cmd --stat ls s3://storageA/Omics/ready/
ERROR "ls s3://storageA/Omics/ready/": no object found

But it works if we specify one more directory level in the command, as below, and this copies many files that the first command did not:

s5cmd --stat sync --include "*" --exclude "work/*" --exclude "test/*" --exclude "Partial/*" --exclude "*screen*" --exclude "*.pileup" "/storageA/Omics/*" s3://storageA/Omics/

This looks quite strange! What could be the problem with the first sync command?

Any suggestions?

praveenraj2018 avatar May 13 '24 20:05 praveenraj2018

I've also noticed this partial-copy behavior with the sync command when transferring many files (200k+) using the latest release, v2.2.2-48f7e59: whole folders are simply not copied to the destination. The s5cmd command seems to stop before all the work is done, with return code 0 and no visible error messages (even in trace or debug mode).

thiell avatar Jun 14 '24 04:06 thiell

We are facing the same problem. The bug is here:

https://github.com/peak/s5cmd/blob/v2.2.2/storage/s3.go#L336-L350

I'm not sure why the tool skips objects that were modified after the listing started, but this should at least be a configurable option that users can opt in to or out of.

What's Happening

  1. Time capture: now = time.Now().UTC() is set during the first page of S3 listing
  2. Aggressive filtering: Any object with LastModified > now gets skipped with continue
  3. Missing objects: Recently modified objects/prefixes get filtered out

Root Cause

The intent was to avoid race conditions with objects created during listing, but the implementation is too aggressive. It skips legitimate existing objects that happen to have recent timestamps, causing the sync command to miss checking certain prefixes.

The bug is in storage/s3.go in all three listing functions: listObjectsV2(), listObjects(), and listObjectVersions().

arminnajafi avatar Jul 25 '25 01:07 arminnajafi