aws-cli
s3 sync does not delete excluded files
The --delete flag on the s3 sync command does exactly what it says in the manual:
Files that exist in the destination but not in the source are deleted during sync.
This means it will not delete files in the destination that still exist in the source, even if they were excluded. To reproduce:
aws s3 mb s3://sync-delete-exclude
mkdir /tmp/sync-delete-exclude
touch /tmp/sync-delete-exclude/{1,2,3}
aws s3 sync /tmp/sync-delete-exclude/ s3://sync-delete-exclude
# we expect this to delete the file 3
aws s3 sync /tmp/sync-delete-exclude/ s3://sync-delete-exclude --exclude=3 --delete
Our use case is to sync only the last week/month/year of files out of an S3 bucket using exclude and include filters. Files that are excluded are not deleted, so we must invoke another step afterwards to delete them. This seems far off from the intent of the command, which is "make the destination look like the source after filtering".
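The reproduction above can be modelled in a few lines of Python. This is a simplified sketch, not the CLI's implementation: it assumes that --exclude patterns are applied to both the source and the destination listing before the two are compared, which is consistent with the behaviour reported here. The function name `deleted_by_sync` is hypothetical.

```python
from fnmatch import fnmatch


def deleted_by_sync(source, dest, excludes):
    """Model which destination keys `sync --delete` removes.

    Assumption: excluded names are dropped from BOTH listings before the
    comparison, so an excluded key is never a deletion candidate.
    """
    def kept(name):
        return not any(fnmatch(name, pattern) for pattern in excludes)

    visible_source = {name for name in source if kept(name)}
    visible_dest = {name for name in dest if kept(name)}
    # --delete removes whatever is visible in the destination but not the source.
    return visible_dest - visible_source


local = {"1", "2", "3"}
bucket = {"1", "2", "3"}
print(sorted(deleted_by_sync(local, bucket, excludes=["3"])))  # []
```

Under this model file 3 is invisible on both sides, so it survives, which matches what the reproduction shows.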
I think we should treat this as a feature request but make some adjustments to avoid breaking changes. In my opinion, the current documentation is valid, if somewhat confusing. It specifies that the exclude operation operates at the command level, so one could reasonably argue that flags such as delete shouldn't apply to excluded files.
--exclude (string) Exclude all files or objects from the command that matches the specified pattern.
I think a good alternative might be to add a --delete-excluded flag that meets your use case. Thoughts? Without an additional option, I don't think we can make the change without breaking others. For example:
rm /tmp/sync-delete-exclude/2
# we expect this to delete the file 2 and now 3
aws s3 sync /tmp/sync-delete-exclude/ s3://sync-delete-exclude --exclude=3 --delete --delete-excluded
Given the close parallel to rsync, --delete-excluded is the best option. I would suggest borrowing the copy from their documentation, as it makes the interaction of the --delete and --exclude flags explicit which is currently only implied.
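To make the proposed semantics concrete, here is a rough Python model of how the two flags could interact. The --delete-excluded flag does not exist in the AWS CLI today; `plan_deletes` and its `delete_excluded` parameter are hypothetical names, and the logic only mirrors the rsync-style behaviour suggested above.

```python
from fnmatch import fnmatch


def plan_deletes(source, dest, excludes, delete_excluded=False):
    """Return the destination keys a sync with --delete would remove."""
    def excluded(name):
        return any(fnmatch(name, pattern) for pattern in excludes)

    # Plain --delete: only non-excluded keys that are missing from the source.
    deletes = {n for n in dest if not excluded(n) and n not in source}
    if delete_excluded:
        # Hypothetical --delete-excluded: excluded destination keys go too.
        deletes |= {n for n in dest if excluded(n)}
    return deletes


local = {"1", "3"}        # file 2 was removed locally
bucket = {"1", "2", "3"}
print(sorted(plan_deletes(local, bucket, ["3"])))                        # ['2']
print(sorted(plan_deletes(local, bucket, ["3"], delete_excluded=True)))  # ['2', '3']
```

Keeping the new behaviour behind an opt-in flag leaves the first call unchanged, which is the point of avoiding a breaking change.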
We have a slightly similar problem. Our command looks like this:
aws s3 sync s3://<bucket> /<folder> --delete --exclude "*" --include "*.py"
This is running in Kubernetes in a sidecar container, where the main container adds *.pyc files in the same folder that are then deleted by the sync command.
We'd want a --delete-only-included flag that only deletes files in the target folder that match the include filters of the sync command. (Hope that makes sense.)
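A small Python sketch of what such a flag might mean. It assumes filters are evaluated in order, with later ones taking precedence, which is how the CLI documents --exclude/--include. The --delete-only-included flag is only the suggestion made above, and `is_included` and `deletes_only_included` are hypothetical helper names.

```python
from fnmatch import fnmatch


def is_included(name, filters):
    """Apply ordered ("exclude" | "include", pattern) filters; the last match wins."""
    included = True  # a name that no filter matches is included
    for kind, pattern in filters:
        if fnmatch(name, pattern):
            included = (kind == "include")
    return included


def deletes_only_included(source, dest, filters):
    # Hypothetical --delete-only-included: a destination file is removed only
    # if it passes the filters and is missing from the source.
    return {n for n in dest if is_included(n, filters) and n not in source}


filters = [("exclude", "*"), ("include", "*.py")]
bucket = {"app.py"}
folder = {"app.py", "stale.py", "app.pyc"}
print(sorted(deletes_only_included(bucket, folder, filters)))  # ['stale.py']
```

Here app.pyc never becomes a deletion candidate because it does not match "*.py", while stale.py is removed because it matches the include filter and no longer exists in the bucket.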
hi,
what's up with this request?
I also need this feature: I want "--delete" to apply only to a specific folder, so that folders other than the one given in aws s3 sync /<folder1> s3://<bucket>/<folder1> --delete are not affected:
s3://<bucket>/<folder1> synced with --delete flag
s3://<bucket>/<folder2> not touched
s3://<bucket>/<folder3> not touched
I agree with the need for a new explicit option. We rely on the non-deleting behaviour of --exclude so that distributed processes that aggregate data back to the same directory do not obliterate each other's output.
Bumping the issue. The impact of --exclude on --delete is very confusing and, in various cases, unexpected.
I agree with this. Can't believe how little attention it has had after 5 years.