
s3 sync: Read include/exclude patterns from file

Open schw4rzlicht opened this issue 4 years ago • 13 comments

Problem

When using s3 sync, we can include/exclude objects for the sync using --exclude <pattern> and --include <pattern>. This becomes cumbersome when dealing with many patterns at once.

Proposed solution

I'd suggest implementing two additional options, --exclude-file <filename> and --include-file <filename>, so that we could specify files containing lists of patterns. See how rsync does it here. Implementing it the same way rsync does would have the benefit of maintaining one list for both.
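For illustration, here is roughly what the referenced rsync mechanism looks like (the file name and patterns are made up for this example; rsync reads one pattern per line and ignores blank lines and lines starting with # or ;):

# .rsyncignore
*.map
*.scss
license/*

rsync -a --exclude-from=.rsyncignore ./ backup/html/

rsync's filter "merge" rules additionally let lines start with "+ " or "- ", so a single file can hold both include and exclude rules, which is what would allow maintaining one list for both.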

Additional selling point: the exclude file can be managed in git without tampering with CI scripts.

Example usage

Now

aws s3 sync . s3://bucket/html/ --exclude ".*" --exclude "license/*" --exclude "LICENSE.md" --exclude "*.map" --exclude "*.scss" --exclude "customVariables.css" --exclude "css/bootstrap/*"

With proposed feature

aws s3 sync . s3://bucket/html/ --exclude-file .syncignore
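In this example, .syncignore would simply hold the same patterns as the long command above, one per line (the exact file format is of course up for discussion):

.*
license/*
LICENSE.md
*.map
*.scss
customVariables.css
css/bootstrap/*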

Alternatives I have considered

None really. The only alternative I know of is passing many individual --exclude/--include options.

schw4rzlicht avatar Apr 27 '20 20:04 schw4rzlicht

I have the exact same issue at the moment. My workaround is to automatically append hundreds of '--include' flags to my sync command. This seems to take much more time than just syncing all files.
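As a rough sketch, the workaround looks something like this in bash, assuming one pattern per line in a plain text file (the file name include-patterns.txt and the bucket path are only illustrative; excluding "*" and then adding --include flags is the documented way to restrict a sync to matching files):

# build --include arguments from a pattern file
args=()
while IFS= read -r pattern; do
  [ -n "$pattern" ] && args+=(--include "$pattern")
done < include-patterns.txt

# exclude everything, then include only the listed patterns
aws s3 sync . s3://bucket/html/ --exclude "*" "${args[@]}"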

'aws --debug' shows output indicating that the aws cli iterates over every file in my local sync target, checking whether each one matches my include patterns. This happens because the aws cli treats each pattern as a potential wildcard and tries to find matches, which isn't necessary if you only pass plain paths without wildcards. I will create a separate feature request for an argument that just takes paths.

bes1002t avatar Apr 29 '20 12:04 bes1002t

When you already know the files to sync, you could simply call aws s3 sync for each file. Not sure about the under-the-hood efficiency, though.

schw4rzlicht avatar Apr 30 '20 04:04 schw4rzlicht

@schw4rzlicht unfortunately appending the file path to the bucket name does not work. I can use the 'cp' command for this instead, but calling cp for each file is probably not as performant as a bulk sync.

Edit: I've tried the cp command and it is really slow. Additionally, there is no check whether the file actually changed and needs to be downloaded; it just downloads every file, which makes the script even slower.
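For clarity, the per-file approach looks roughly like this (the bucket path and list file are illustrative); it issues one request per file and never does the changed-file check that sync does:

while IFS= read -r key; do
  aws s3 cp "s3://bucket/html/$key" "./$key"
done < files-to-download.txt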

bes1002t avatar Apr 30 '20 08:04 bes1002t

Hey Amazon,

I would be happy to build the feature myself; I would just like to discuss it with you first :)

schw4rzlicht avatar Jun 06 '20 15:06 schw4rzlicht

@schw4rzlicht it seems Amazon is not interested :(

Since there are 168 PRs open, I think it's a waste of time to contribute here.

bes1002t avatar Jun 09 '20 17:06 bes1002t

By the way, you can find more details about my issue for sync speed up here: https://github.com/aws/aws-cli/issues/5167

bes1002t avatar Aug 13 '20 17:08 bes1002t

Hi @schw4rzlicht, thanks for the request, and I agree that this could be a useful feature. It was previously raised here: #3520

I'm sorry that I can't provide an ETA on when this could be addressed. Let us know more about your use cases on the other issue.

kdaily avatar Oct 19 '20 21:10 kdaily

@kdaily I think this issue is a little different and should be prioritized higher than the one you linked, because @schw4rzlicht proposed a file for includes and a file for excludes, not only excludes. Anyway, it would be great to have such a feature! :)

bes1002t avatar Oct 19 '20 22:10 bes1002t

Hi. The request mentions rsync as an example. This is also supported by s3cmd with --include-from and --exclude-from. However, I'd rather use the official AWS CLI tool.
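For comparison, the s3cmd form looks roughly like this (bucket and file names are illustrative):

s3cmd sync --exclude-from=.syncignore . s3://bucket/html/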

rprieto avatar Nov 09 '20 14:11 rprieto

@rprieto but s3cmd does not support other commands that the aws cli supports. I also think the aws cli is better maintained than s3cmd, so it would be better to add --include-from and --exclude-from to the aws cli. I think that's exactly what we want.

bes1002t avatar Nov 10 '20 09:11 bes1002t

Fully agree, I just wanted to share another example of a tool that supports this.

rprieto avatar Nov 10 '20 09:11 rprieto

When you already know the files to sync, you could simply call aws s3 sync for each file. Not sure about the under-the-hood efficiency, though.

This isn't very performant when you have a very large number of files (which is when this feature would be most useful).

tmccombs avatar Jun 28 '22 18:06 tmccombs

Can we revisit the priority of this, get an update on progress, or discuss an alternative?

masonlouchart avatar May 07 '24 01:05 masonlouchart