aws-cli
AWS S3 sync operations do not work with S3 directory buckets (S3 Express One Zone)
Describe the bug
If you run aws s3 sync with the source from a standard bucket and the destination from a directory bucket, the comparison does not work properly. Some files that do exist in the source are not recognized and some files that do exist in the destination are also not recognized.
Expected Behavior
Only the source objects that changed since the last sync are copied to the destination.
Current Behavior
Some objects that did not change since the last sync are copied to the destination. Those objects are also deleted from the destination before copying from source if the --delete flag is supplied, indicating that it actually does see the object at the destination, but doesn't consider it the same object as what is in the source.
Reproduction Steps
Test scenario: a source prefix in a standard bucket is successfully synced to the same prefix in a directory bucket. The prefix has 5 objects. A subsequent sync performed immediately afterwards copies one of those objects again, even though nothing has changed. The debug output includes the message "file does not exist at destination" even though the file does exist at the destination. Interestingly, if the --delete flag is supplied, that file is deleted from the destination and then recopied.
Possible Solution
I believe this bug stems from an inconsistency in the response order in the list-objects-v2 API. For whatever reason the file that was considered missing in the test scenario is listed last in the response for the directory bucket, while it is listed first in the response for the standard bucket.
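A quick way to confirm the ordering difference is to check whether each listing's keys are in lexicographic order. The sketch below is only illustrative: the key names are hypothetical stand-ins for the 5 objects in the test scenario, and the lists would in practice come from the Key fields of the two list-objects-v2 responses.

```python
def first_out_of_order(keys):
    """Return the index of the first key that sorts before its
    predecessor, or None if the listing is in ascending order."""
    for i in range(1, len(keys)):
        if keys[i] < keys[i - 1]:
            return i
    return None


# Hypothetical key names; real values would come from list-objects-v2.
standard_listing = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]
directory_listing = ["b.txt", "c.txt", "d.txt", "e.txt", "a.txt"]

standard_result = first_out_of_order(standard_listing)    # None: sorted
directory_result = first_out_of_order(directory_listing)  # 4: "a.txt" is listed last
```

If the directory bucket listing reports an out-of-order index while the standard bucket listing does not, the two responses cannot be compared position by position.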
Additional Information/Context
No response
CLI version used
2.14.4
Environment details (OS name and version, etc.)
macOS Ventura 13.4.1
Hi @jeffgardnerdev, thanks for reaching out. Could you provide debug logs of this behavior? You can get debug logs by adding --debug to your command, and redacting any sensitive information. Logs with --delete and without --delete would both be appreciated. Thanks!
Here are the two debug outputs, one without the --delete option and one with the --delete option. In this case the two prefixes are identical but the sync command copies/deletes 4 of the 5 files every time. Bucket names are redacted. ListObjectsV2 responses for the two prefixes return the objects in a different order.
s3-sync-standard-to-directory-with-delete-debug-output.txt
s3-sync-standard-to-directory-debug-output.txt
@jeffgardnerdev, thanks for your patience. I was able to reproduce this behavior, and your theory that this is related to ListObjectsV2 not sorting directory buckets is likely correct. We've reached out to the service team, and I'll leave any updates in this issue.
If other people are experiencing this issue, providing details and any impact in this issue would be appreciated.
Ticket # for internal use : P114641353
@RyanFitzSimmonsAK Could you share some more information about the decision to disable the sync command for directory buckets entirely? This is useful functionality that would be nice to keep, provided the ordering issue could be fixed.
Hello,
Thank you for reporting this issue. We have released AWS CLI v1.32.25 and v2.15.13 and strongly recommend that you upgrade to address this issue. If you are unable to upgrade, do not run the aws s3 sync command with an S3 Express One Zone directory bucket.
@jeffgardnerdev I did see your question about further information, which I will post here if I am able.
Here are some additional technical details on this issue if you are interested.
The reason we’ve removed the sync command for directory buckets is what @jeffgardnerdev already noticed: there is an incompatibility between the way that the CLI compares a list of objects in a bucket and the results of the ListObjectsV2 API call on directory buckets[1]. Operations with S3 are threaded, and we’re comparing the list of objects from the S3 API as that list is being populated. This isn’t compatible with directory buckets, because there’s no way to ensure the ordering of objects coming from S3. Because of the incompatibility, the sync command will not work properly on a directory bucket. There is no workaround for directory buckets and sync at this time, except to refrain from using sync and instead use cp or similar.
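To illustrate why ordering matters, here is a simplified model of comparing two listings in lockstep as they arrive. It is a sketch under assumptions, not the CLI's actual code: merge_compare and the key names are hypothetical, and the real comparison also looks at size and modification time.

```python
def merge_compare(src_keys, dst_keys):
    """Walk two key listings in lockstep, assuming both are sorted.

    Returns (to_copy, to_delete): keys that look missing at the
    destination, and keys that look missing at the source.
    """
    to_copy, to_delete = [], []
    src_iter, dst_iter = iter(src_keys), iter(dst_keys)
    src, dst = next(src_iter, None), next(dst_iter, None)
    while src is not None or dst is not None:
        if dst is None or (src is not None and src < dst):
            to_copy.append(src)      # appears absent from destination
            src = next(src_iter, None)
        elif src is None or dst < src:
            to_delete.append(dst)    # appears absent from source
            dst = next(dst_iter, None)
        else:
            # Same key on both sides: treated as already in sync here.
            src, dst = next(src_iter, None), next(dst_iter, None)
    return to_copy, to_delete


# Hypothetical keys: identical contents, but the second listing is unsorted.
source = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]
unsorted_dest = ["b.txt", "c.txt", "d.txt", "e.txt", "a.txt"]

sorted_result = merge_compare(source, source)           # ([], [])
unsorted_result = merge_compare(source, unsorted_dest)  # (["a.txt"], ["a.txt"])
```

With the unsorted destination listing, the same key ends up in both lists: it is reported as missing at the destination and, with --delete, as an extra object to remove, which matches the delete-then-recopy behavior described in the report.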
In the versions of the CLI I noted above, v1.32.25 and v2.15.13 and later, we removed sync with directory bucket destinations or sources to prevent anyone from using this command and getting inconsistent or incorrect results.
1: Specifically, "Sorting order of returned objects" on the linked documentation page.
This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.
Isn't it a bit weird to close this issue, @kellertk? I would assume that people would want sync to work with S3 Express One Zone, for natural API compatibility.