aws-cli
s3 sync: Read include/exclude patterns from file
Problem
When using s3 sync, we can include or exclude objects using --exclude <pattern> and --include <pattern>. This can be cumbersome when dealing with many patterns at once.
Proposed solution
I'd suggest implementing two additional options, --exclude-file <filename> and --include-file <filename>, so we would be able to specify files that contain lists of patterns. See how rsync does it here. Implementing it the same way rsync does would have the benefit of letting people maintain one list for both tools.
Additional selling point: the exclude file can be managed in git without tampering with CI scripts.
Example usage
Now
aws s3 sync . s3://bucket/html/ --exclude=".*" --exclude "license/*" --exclude "LICENSE.md" --exclude "*.map" --exclude "*.scss" --exclude "customVariables.css" --exclude "css/bootstrap/*"
With proposed feature
aws s3 sync . s3://bucket/html/ --exclude-file .syncignore
Alternatives I have considered
None really. The only alternative I know of is using many single statements.
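Until such an option exists, one way to avoid hand-maintaining many single statements is to generate the repeated --exclude flags from a pattern file. A minimal sketch, assuming a hypothetical .syncignore file with one pattern per line (this helper is illustrative, not part of the AWS CLI):

```python
# Workaround sketch: expand a pattern file into repeated --exclude
# flags for aws s3 sync. The file name ".syncignore" and this helper
# are hypothetical. Blank lines and "#" comments are skipped.
def build_sync_command(src, dest, ignore_file):
    cmd = ["aws", "s3", "sync", src, dest]
    with open(ignore_file) as f:
        for line in f:
            pattern = line.strip()
            if pattern and not pattern.startswith("#"):
                cmd += ["--exclude", pattern]
    return cmd
```

The resulting argument list can then be executed with subprocess.run(cmd), keeping the pattern list in version control separately from the CI script.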
I have the exact same issue at the moment. My solution is just adding hundreds of '--include' flags automatically to my sync command. It seems this takes much more time than just syncing all files.
'aws --debug' shows some output that indicates that aws cli iterates over all files of my local s3 target checking whether each file matches my include. Of course this happens because aws cli thinks I passed a wildcard and so it's trying to find matches. But if you just pass a path without wildcards this is not needed. I will create another feature request for an argument which just takes paths.
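To illustrate why every local file gets checked: include/exclude filters are glob patterns applied in order, with the last matching rule winning and unmatched paths included by default. A simplified model of that filtering (not the actual awscli code):

```python
from fnmatch import fnmatch

def should_sync(path, filters):
    """Ordered include/exclude filters: the last matching rule wins,
    and paths matching no rule are included by default."""
    decision = True
    for kind, pattern in filters:
        if fnmatch(path, pattern):
            decision = (kind == "include")
    return decision

# Every path is tested against every pattern, so the cost grows with
# files x patterns, even when a "pattern" is just a plain path.
filters = [("exclude", "*"), ("include", "docs/*.html")]
synced = [p for p in ["docs/index.html", "src/main.py"]
          if should_sync(p, filters)]
# synced == ["docs/index.html"]
```

This is why passing plain paths as patterns still forces a match attempt per file, as the --debug output suggests.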
When you already know the files to sync, you could simply call aws s3 sync for each file. Not sure about the under-the-hood efficiency, though.
@schw4rzlicht unfortunately appending the file path to the bucket name does not work. I can use the 'cp' command for this instead, but calling cp for each file might not be as performant as a bulk sync.
Edit: I've tried the cp command and it is really slow. Additionally, there is no check whether the file has actually changed and needs to be downloaded; it just downloads the file. This makes the script even slower.
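For context, sync decides whether to transfer an object by comparing file size and last-modified time, which is exactly the check cp skips. A simplified sketch of that decision (an assumed helper, not awscli internals):

```python
import os

def needs_download(local_path, remote_size, remote_mtime):
    """Simplified version of the check sync performs and cp skips:
    transfer only if the local file is missing, differs in size,
    or is older than the remote object."""
    if not os.path.exists(local_path):
        return True
    st = os.stat(local_path)
    return st.st_size != remote_size or st.st_mtime < remote_mtime
```

Replicating this check around per-file cp calls would avoid redundant downloads, at the cost of one metadata lookup per object.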
Hey Amazon,
I would be happy to build the feature myself, I would just like to discuss it with you first :)
@schw4rzlicht it seems amazon is not interested :(
Since there are 168 PRs open, I think it's wasted time to contribute here.
By the way, you can find more details about my issue for sync speed up here: https://github.com/aws/aws-cli/issues/5167
Hi @schw4rzlicht, thanks for requesting, and I agree that this could be a useful feature. It was previously raised here: #3520
I'm sorry that I can't provide an ETA on when this could be addressed. Let us know more about your use cases on the other issue.
@kdaily I think this issue is a little different and should be prioritized higher than the one you linked, because @schw4rzlicht proposed a file for includes and a file for excludes, not only excludes. Anyway, it would be great to have such a feature! :)
Hi. The request mentions rsync as an example. It's also supported by s3cmd with --include-from and --exclude-from. However, I'd rather use the official AWS CLI tool.
@rprieto but s3cmd does not support other commands the AWS CLI supports. I also think the AWS CLI is better maintained than s3cmd, so it would be better to add --include-from and --exclude-from to the AWS CLI. I think that's exactly what we want.
Fully agree, I just wanted to share another example of a tool that supports this.
When you already know the files to sync, you could simply call aws s3 sync for each file. Not sure about the under-the-hood efficiency, though.
This isn't very performant when you have a very large number of files (which is when this feature would be most useful).
Can we revisit the priority estimate, get a progress update, or discuss an alternative?