grunt-s3

Sync does not remove deleted files from s3

Open coen-hyde opened this issue 12 years ago • 11 comments

Sync will upload new and changed files, but it will not delete files that were previously uploaded to s3 and have since been removed locally.

coen-hyde, Aug 18 '13 10:08

Do we really need/want something that will automatically delete files? I for one know that I would stop using that code entirely if that were in place. Web files are cheap, not having them there for consumption is expensive. Are there good use cases for it? Can we make sure it has an off switch when using sync?

geedew, Aug 18 '13 12:08

It's certainly a bit of a dangerous feature and it should be off by default. I am maintaining a large number of assets on s3. Changes are mostly additions but sometimes they are also deletes. Having a unified way to maintain the assets would be good.

coen-hyde, Aug 18 '13 19:08

I'll take a stab at it. My interest is piqued. I'm wondering if it would be better to do a whitelist with rules, rather than a delete:on option.

For instance:

    del: {
      files: [ '/only/these/*/files/.js' ], // can be deleted
      between: [ null, new Date(new Date().setDate(new Date().getDate() - 10)) ]
    }

So only delete things matched by files, and only if their date falls between date 0 and 10 days ago (don't delete anything from the last 10 days). The delete functionality would obviously be updated to use this, so that sync can take advantage? The hardest part is really just knowing which files are on S3 to actually delete. I don't think that's actually possible with Knox, and it will require the AWS lib.

http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/frames.html

geedew, Aug 18 '13 19:08
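A minimal sketch of the rule-based filter proposed above, not anything grunt-s3 implements. It assumes each remote object is described by `Key` and `LastModified` fields, and that `*` in a pattern matches within a single path segment. The names `globToRegExp` and `selectDeletable` are hypothetical.

```javascript
// Convert a simple glob to a RegExp: '*' matches any run of characters except '/'.
// (Assumption: this is a simplified matcher, not a full glob implementation.)
function globToRegExp(glob) {
  var escaped = glob
    .replace(/[.+^${}()|[\]\\?]/g, '\\$&') // escape regex metacharacters
    .replace(/\*/g, '[^/]*');              // expand the glob wildcard
  return new RegExp('^' + escaped + '$');
}

// Return the keys that the rule allows to be deleted.
// rule.files:   array of glob patterns (the whitelist)
// rule.between: [from, to] as Date objects; null means "no bound on that side"
function selectDeletable(objects, rule) {
  var patterns = rule.files.map(globToRegExp);
  var from = rule.between[0];
  var to = rule.between[1];

  return objects
    .filter(function (obj) {
      var modified = new Date(obj.LastModified);
      var matches = patterns.some(function (re) { return re.test(obj.Key); });
      var afterFrom = from === null || modified >= from;
      var beforeTo = to === null || modified <= to;
      return matches && afterFrom && beforeTo;
    })
    .map(function (obj) { return obj.Key; });
}
```

With `between: [ null, tenDaysAgo ]`, anything modified in the last 10 days is left alone, which matches the intent of the proposal.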

By the way, thanks for implementing the initial sync functionality. For my use case the above filters would complicate the solution, though they may be useful to someone else. I would be interested to hear the opinions of other people using this project. To me, 'sync' implies that the sync functionality will make whatever changes are necessary to the files stored on s3 to reflect the current state of the local files (PUT'ing and DELETE'ing).

We could use https://github.com/segmentio/s3-lister. It uses a knox client to implement a streaming interface to listing a bucket. Though it probably makes sense to build this on top of AWS's node sdk.

coen-hyde, Aug 18 '13 21:08
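The "reflect the current state of the local files" reading of sync can be sketched as a diff between two listings. This is only an illustration under stated assumptions: both sides are plain objects mapping a key to a content hash, and deletion is gated behind a hypothetical `removeDeleted` option that is off by default, as suggested earlier in the thread. `planSync` is not a grunt-s3 function.

```javascript
// Decide which keys to PUT and which to DELETE so that the bucket mirrors local state.
// localHashes / remoteHashes: { key: contentHash } (assumed shape)
// options.removeDeleted: hypothetical flag, defaults to false
function planSync(localHashes, remoteHashes, options) {
  var removeDeleted = Boolean(options && options.removeDeleted);
  var has = function (obj, key) {
    return Object.prototype.hasOwnProperty.call(obj, key);
  };

  // Upload anything that is new, or whose hash differs from the remote copy.
  var upload = Object.keys(localHashes).filter(function (key) {
    return !has(remoteHashes, key) || remoteHashes[key] !== localHashes[key];
  });

  // Remove remote keys with no local counterpart, but only when explicitly enabled.
  var remove = removeDeleted
    ? Object.keys(remoteHashes).filter(function (key) {
        return !has(localHashes, key);
      })
    : [];

  return { upload: upload, remove: remove };
}
```

Keeping the plan separate from the S3 calls would also make a dry run easy: print the `remove` list before anything is actually deleted.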

It is actually possible to get a list of files in a bucket with knox, so it should be very possible to add this in.

client.list({ prefix: 'my-prefix' }, function(err, data){
  // data.Contents holds the objects whose keys start with the prefix
});

geedew, Aug 18 '13 21:08

The only problem with that is that it can only list 1000 files at a time, so some paging functionality would have to be implemented.

coen-hyde, Aug 18 '13 21:08
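One possible shape for that paging, sketched under assumptions: each page is fetched by a `listPage(params, callback)` function with the same signature as knox's `client.list`, the response carries `Contents` and `IsTruncated` as in the S3 bucket listing, and the last key of a page is passed back as `marker` to request the next one. `listAllKeys` is a hypothetical helper name.

```javascript
// Collect every key under a prefix by following truncated listings page by page.
// listPage: function (params, callback), e.g. client.list.bind(client) with knox
function listAllKeys(listPage, prefix, callback) {
  var keys = [];

  function next(marker) {
    var params = { prefix: prefix };
    if (marker) params.marker = marker; // resume after the last key already seen

    listPage(params, function (err, data) {
      if (err) return callback(err);

      var contents = data.Contents || [];
      contents.forEach(function (obj) { keys.push(obj.Key); });

      if (data.IsTruncated && contents.length > 0) {
        next(contents[contents.length - 1].Key);
      } else {
        callback(null, keys);
      }
    });
  }

  next(null);
}
```

Usage with a knox client might look like `listAllKeys(client.list.bind(client), 'my-prefix', function (err, keys) { ... })`. For very large buckets the recursion would need a real network round trip between pages (as it has with knox) so the call stack does not grow.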

One thing at a time. Getting the first 1000 to work first would be a great step forward :)

geedew, Aug 18 '13 21:08

Yes, it would :)

coen-hyde, Aug 18 '13 21:08

+1. I think this feature should be added so that a whole folder can be synced: upload files that are not in the bucket yet, and delete objects that no longer exist on the file system. An s3.list helper would also be useful for composing custom tasks.

wclr, Jan 17 '14 04:01

+1. It would be great to delete objects that no longer exist on the filesystem.

dgil, Apr 10 '14 15:04

+1 for this as well

andrewboni, May 22 '14 08:05