aws-cli

s3 sync does not delete excluded files

Open stevenkaras opened this issue 5 years ago • 7 comments

The --delete flag on the s3 sync command does exactly what it says in the manual:

Files that exist in the destination but not in the source are deleted during sync.

This means it will not delete files in the destination that also exist in the source, even if they were excluded. To reproduce:

aws s3 mb s3://sync-delete-exclude
mkdir /tmp/sync-delete-exclude
touch /tmp/sync-delete-exclude/{1,2,3}
aws s3 sync /tmp/sync-delete-exclude/ s3://sync-delete-exclude
# we expect this to delete file 3 from the bucket
aws s3 sync /tmp/sync-delete-exclude/ s3://sync-delete-exclude --exclude=3 --delete

Our use case is to sync only the last week/month/year of files out of an S3 bucket using exclude and include filters. Files that are excluded are not deleted, so we must run another step afterwards to delete them. This seems far off from the intent of the command, which is "make the destination look like the source after filtering".
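
A minimal sketch of that extra step, assuming the excluded files can be described by a single pattern. The helper name `sync_then_purge` is hypothetical; the bucket and path in the usage note are the ones from the reproduction above. It only chains two existing commands, `aws s3 sync` and `aws s3 rm --recursive`, and is not a replacement for a built-in option.

```shell
#!/usr/bin/env bash
# Sketch only: sync_then_purge is a hypothetical helper, not part of the AWS CLI.
# Step 1 syncs with --delete; files matching the pattern are skipped entirely,
# so they are neither copied nor deleted.
# Step 2 removes whatever matches that same pattern from the destination.
# Step 2 runs only if step 1 succeeded.
sync_then_purge() {
  local src="$1" dest="$2" pattern="$3"
  aws s3 sync "$src" "$dest" --delete --exclude "$pattern" &&
    aws s3 rm "$dest" --recursive --exclude "*" --include "$pattern"
}
```

For the reproduction above this would be called as `sync_then_purge /tmp/sync-delete-exclude/ s3://sync-delete-exclude 3`. Both underlying commands accept `--dryrun`, which is worth running by hand before letting a script delete anything.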

stevenkaras, Feb 05 '20 15:02

I think we should treat this as a feature request but make some adjustments to avoid breaking changes. In my opinion, the current documentation is valid, if somewhat confusing. It specifies that the exclude operation operates at the command level, so one could reasonably argue that flags such as delete shouldn't apply to excluded files.

--exclude (string) Exclude all files or objects from the command that matches the specified pattern.

I think a good alternative might be to add a --delete-excluded flag that meets your use case. Thoughts? Without an additional option, I don't think we can make the change without breaking others. e.g.

rm /tmp/sync-delete-exclude/2
# we expect this to delete file 2 and, with the proposed flag, file 3 as well
aws s3 sync /tmp/sync-delete-exclude/ s3://sync-delete-exclude --exclude=3 --delete --delete-excluded

klaytaybai, Feb 06 '20 00:02

Given the close parallel to rsync, --delete-excluded is the best option. I would suggest borrowing the wording from their documentation, as it makes the interaction of the --delete and --exclude flags explicit, whereas it is currently only implied.
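
For readers who have not used the rsync flags, here is a local-only sketch of how they interact. It assumes rsync is installed; `rsync_delete_demo` is a hypothetical helper that works in a throwaway temp directory and touches nothing in S3.

```shell
#!/usr/bin/env bash
# Local-only sketch; requires rsync. rsync_delete_demo is a hypothetical helper.
# It builds a throwaway source (files 1 and 3) and destination (files 1, 2, 3),
# runs rsync with --delete and --exclude=3 plus any extra flags passed in,
# then prints the file names left in the destination.
rsync_delete_demo() {
  local work
  work="$(mktemp -d)" || return 1
  mkdir "$work/src" "$work/dst"
  touch "$work/src/1" "$work/src/3"
  touch "$work/dst/1" "$work/dst/2" "$work/dst/3"
  rsync -a --delete "$@" --exclude=3 "$work/src/" "$work/dst/"
  (cd "$work/dst" && echo *)
  rm -rf "$work"
}
```

Calling `rsync_delete_demo` should leave files 1 and 3 in the destination, because the excluded file is protected from deletion. Calling `rsync_delete_demo --delete-excluded` should leave only file 1. That second behaviour is what the proposed aws flag would mirror.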

stevenkaras, Feb 09 '20 08:02

We have a slightly similar problem. Our command looks like this:

aws s3 sync s3://<bucket> /<folder> --delete --exclude "*" --include "*.py"

This runs in a Kubernetes sidecar container. The main container writes *.pyc files into the same folder, and the sync command then deletes them. We'd want a --delete-only-included flag so that the sync only deletes files in the target folder that match the --include patterns of the command. (Hope that makes sense.)
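
One way to check what a given combination of --delete, --exclude and --include would remove is a dry run. The sketch below is a hypothetical wrapper, `preview_deletes`, around the existing `--dryrun` flag. It assumes that dry-run output marks deletions with the text "delete:", which should be verified against your CLI version.

```shell
#!/usr/bin/env bash
# Sketch only: preview_deletes is a hypothetical helper.
# It runs the sync with --dryrun (nothing is changed) and keeps only the
# lines that announce a deletion.
# Assumption: those lines contain the text "delete:".
preview_deletes() {
  aws s3 sync "$@" --dryrun | grep 'delete:'
}
```

Example call: `preview_deletes "s3://<bucket>" "/<folder>" --delete --exclude "*" --include "*.py"`. When nothing would be deleted, grep finds no match and the function returns a non-zero status.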

zappallot, Oct 12 '20 12:10

Hi, what's the status of this request? I also need this feature: I want --delete to apply only to a specific folder, so that folders other than the one given in the command are not touched:

aws s3 sync /<folder1> s3://<bucket>/<folder1> --delete

s3://<bucket>/<folder1> synced with --delete flag
s3://<bucket>/<folder2> not touched
s3://<bucket>/<folder3> not touched

ticteam, Dec 13 '21 04:12

I agree with the need for a new explicit option. We rely on the current non-deleting behavior of --exclude so that distributed processes aggregating data back into the same directory do not obliterate each other's output.

atz, Dec 20 '21 17:12

Bumping the issue. The impact of --exclude on --delete is very confusing and, in various cases, unexpected.

jaklan, Nov 08 '24 00:11

I agree with this. I can't believe how little attention it has had after 5 years.

martinrw, Jun 12 '25 13:06