AWS S3 sync operations do not work with S3 directory buckets (S3 Express One Zone)

Open jeffgardnerdev opened this issue 1 year ago • 4 comments

Describe the bug

If you run aws s3 sync with a standard bucket as the source and a directory bucket as the destination, the comparison does not work properly. Some files that exist in the source are not recognized, and some files that exist in the destination are also not recognized.

Expected Behavior

Only the source objects that changed since the last sync are copied to the destination.

Current Behavior

Some objects that did not change since the last sync are copied to the destination. Those objects are also deleted from the destination before copying from source if the --delete flag is supplied, indicating that it actually does see the object at the destination, but doesn't consider it the same object as what is in the source.

Reproduction Steps

Test scenario: a source prefix in a standard bucket is successfully synced to the same prefix in a directory bucket. The prefix has 5 objects. A subsequent sync performed immediately after results in copying one of those objects again, even though nothing has changed. The debug messages contain a message that "file does not exist at destination" even though it does exist at the destination. Interestingly, if the --delete flag is supplied, that file is deleted from the destination and then recopied.

Possible Solution

I believe this bug stems from an inconsistency in the response order in the list-objects-v2 API. For whatever reason the file that was considered missing in the test scenario is listed last in the response for the directory bucket, while it is listed first in the response for the standard bucket.

Additional Information/Context

No response

CLI version used

2.14.4

Environment details (OS name and version, etc.)

macOS Ventura 13.4.1

jeffgardnerdev avatar Jan 08 '24 17:01 jeffgardnerdev

Hi @jeffgardnerdev, thanks for reaching out. Could you provide debug logs of this behavior? You can get debug logs by adding --debug to your command, and redacting any sensitive information. Logs with --delete and without --delete would both be appreciated. Thanks!

RyanFitzSimmonsAK avatar Jan 09 '24 23:01 RyanFitzSimmonsAK

Here are the two debug outputs, one without the --delete option and one with the --delete option. In this case the two prefixes are identical but the sync command copies/deletes 4 of the 5 files every time. Bucket names are redacted. ListObjectsV2 responses for the two prefixes return the objects in a different order.

Attachments: s3-sync-standard-to-directory-with-delete-debug-output.txt, s3-sync-standard-to-directory-debug-output.txt

jeffgardnerdev avatar Jan 10 '24 15:01 jeffgardnerdev

@jeffgardnerdev, thanks for your patience. I was able to reproduce this behavior, and your theory that this is related to ListObjectsV2 not sorting directory buckets is likely correct. We've reached out to the service team, and I'll leave any updates in this issue.

If other people are experiencing this issue, providing details and any impact in this issue would be appreciated.

Ticket # for internal use : P114641353

RyanFitzSimmonsAK avatar Jan 22 '24 19:01 RyanFitzSimmonsAK

@RyanFitzSimmonsAK Could you share some more information about the decision to disable the sync command for directory buckets entirely? This is useful functionality that would be nice to keep, provided the ordering issue could be fixed.

jeffgardnerdev avatar Feb 14 '24 21:02 jeffgardnerdev

Hello,

Thank you for reporting this issue. We have released AWS CLI v1.32.25 and v2.15.13 and strongly recommend that you upgrade to address this issue. If you are unable to upgrade, do not run the aws s3 sync command with an S3 Express One Zone directory bucket.

kellertk avatar Feb 28 '24 00:02 kellertk

@jeffgardnerdev I did see your question about further information, which I will post here if I am able.

kellertk avatar Feb 28 '24 00:02 kellertk

Here are some additional technical details on this issue if you are interested.

The reason we’ve removed the sync command for directory buckets is what @jeffgardnerdev already noticed: there is an incompatibility between the way the CLI compares lists of objects and the results of the ListObjectsV2 API call on directory buckets[1]. Operations with S3 are threaded, and we’re comparing the list of objects from the S3 API as that list is being populated. This isn’t compatible with directory buckets, because there’s no way to ensure the ordering of objects coming from S3. Because of this incompatibility, the sync command will not work properly on a directory bucket. There is no workaround for directory buckets and sync at this time, except to refrain from using sync and instead use cp or similar.

In the versions of the CLI I noted above, v1.32.25 and v2.15.13, and in all later versions, we removed support for sync when either the source or the destination is a directory bucket, to prevent anyone from using this command and getting inconsistent or incorrect results.

1: Specifically, "Sorting order of returned objects" on the linked documentation page.
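
To illustrate why ordering matters, here is a minimal sketch of a merge-style comparison that walks two listings in lockstep. This is not the CLI's actual code: the function name `merge_compare` and the object keys are hypothetical, and the real comparison also looks at attributes such as size and modification time. It only shows how a comparison that assumes sorted input misjudges a key when one listing arrives in a different order.

```python
from typing import Iterable, Iterator, Tuple


def merge_compare(
    src_keys: Iterable[str], dst_keys: Iterable[str]
) -> Iterator[Tuple[str, str]]:
    """Compare two key listings in lockstep, assuming BOTH are sorted.

    Yields ("copy", key) for keys judged missing at the destination and
    ("delete", key) for keys judged missing at the source.
    (Hypothetical helper for illustration only.)
    """
    src_iter, dst_iter = iter(src_keys), iter(dst_keys)
    src = next(src_iter, None)
    dst = next(dst_iter, None)
    while src is not None or dst is not None:
        if dst is None or (src is not None and src < dst):
            # Source key sorts before the current destination key, so a
            # sorted destination listing could not contain it later.
            yield ("copy", src)
            src = next(src_iter, None)
        elif src is None or dst < src:
            # Destination key sorts before the current source key.
            yield ("delete", dst)
            dst = next(dst_iter, None)
        else:
            # Same key on both sides: nothing to do, advance both.
            src = next(src_iter, None)
            dst = next(dst_iter, None)


# Hypothetical keys. The standard bucket listing is sorted; the directory
# bucket listing holds the same keys but returns "a.txt" last.
standard_listing = ["a.txt", "b.txt", "c.txt", "d.txt", "e.txt"]
directory_listing = ["b.txt", "c.txt", "d.txt", "e.txt", "a.txt"]

print(list(merge_compare(standard_listing, directory_listing)))
# -> [('copy', 'a.txt'), ('delete', 'a.txt')]

print(list(merge_compare(standard_listing, sorted(directory_listing))))
# -> []
```

In this sketch the same key is reported both as missing at the destination and as an extra object to delete, which matches the symptom described in the report: the object is recopied, and with --delete it is removed from the destination first. Sorting the full listing up front avoids the problem here, but it requires holding the whole listing in memory rather than comparing it as it streams in.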

kellertk avatar Feb 29 '24 01:02 kellertk

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.

github-actions[bot] avatar Feb 29 '24 01:02 github-actions[bot]

Isn't it a bit weird to close this issue, @kellertk? I would assume that people would want sync to work with S3 Express One Zone, for natural API compatibility.

Froskekongen avatar Jun 02 '24 12:06 Froskekongen