vfs: add ability to exclude files from being uploaded (eg for temporary files)

Open zenjabba opened this issue 6 years ago • 41 comments

Issue 1: rclone uploads ~partial files (which are not ready yet), and I want rclone mount to "ignore" these files and NOT upload them to the cloud.

Issue 2: I want rclone to ignore whole directories and NOT upload them through its mount, e.g. the .grab directories that Plex DVR creates.

I've tried --exclude and couldn't get it to work as expected (or at all). rclone mount does not respect exclude parameters.

Thanks!

zenjabba avatar Feb 28 '18 17:02 zenjabba

Are you using the cache with the mount?

I don't have this issue, as I have a ten-minute delay timer on the cache upload. On top of that, the partials are written locally to a cache, and it won't upload or even queue the upload until the file is unlocked (read: no longer being written to). Usually, by the time it's unlocked, it has been moved/renamed anyway prior to upload.

danielloader avatar Mar 04 '18 23:03 danielloader

Still having the problems. Basically, Plex writes to the .grab directory until it's finished with the DVR recording, then moves the file to the correct place. I simply don't want rclone to upload anything that is in .grab.

zenjabba avatar Mar 04 '18 23:03 zenjabba

@daniel-loader do you have the delay for vfs-cache? What's the trigger for this?

ErAzOr2k avatar Mar 05 '18 06:03 ErAzOr2k

This isn't a vfs cache, but the actual cache remote that wraps the Google drive remote.

danielloader avatar Mar 05 '18 07:03 danielloader

--exclude works for me

$ rclone mount /tmp/big /tmp/mnt/
$ ls /tmp/mnt/
100M  1G    directory  new-file   new-file3  new-file5    potato3.txt
120M  200M  hello.txt  new-file2  new-file4  potato2.txt  potato.txt
$ rclone mount --exclude '*new*' /tmp/big /tmp/mnt/
$ ls /tmp/mnt/
100M  1G    directory  potato2.txt  potato.txt
120M  200M  hello.txt  potato3.txt

What are you trying?

ncw avatar Mar 05 '18 21:03 ncw

I think I may know what the OP is trying and I might be having the same issue.

I have a gdrive->cache->crypt remote mounted using rclone mount like this:

rclone mount gdrive-cache-crypt: /home/user/GDrive --exclude="*.tmp" --allow-other --cache-dir="/home/user/.cache/rclone/vfs-cache" --vfs-cache-poll-interval="1h" --vfs-cache-mode="writes" --vfs-cache-max-age="72h"

find /home/user/GDrive -name "*.tmp" finds nothing. So, --exclude is working for the rclone mount command. But .tmp files are still being uploaded to the remote.

Same result using backend cache only, like this: rclone mount gdrive-cache-crypt: /home/user/Gdrive --allow-other --exclude=*.tmp --cache-tmp-upload-path=/home/user/.cache/rclone/tmp-cache --cache-tmp-wait-time=5m

How do I tell rclone that those files must be ignored and not uploaded?

Thanks!

sbono avatar Apr 03 '18 17:04 sbono

@sbono - thanks for explaining - I see the problem now.

--exclude works on the listings that mount uses to see the remote files.

However, nothing stops you uploading a filename that is excluded.

This probably requires fixing in two places

  • the vfs layer so excluded file names are never uploaded, they are just kept in the local cache
  • the cache backend, so it does the same thing.

Or maybe it could be fixed just in the VFS layer... This would require rclone ignoring direct uploads of files which had the wrong names
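To make the gap concrete, here is a toy Python model (not rclone's code; the `FakeMount` class and its methods are invented for illustration). It assumes simple `fnmatch` globs rather than rclone's real filter rules. The exclude patterns are consulted when listing, but the write path never checks them:

```python
from fnmatch import fnmatch


class FakeMount:
    """Toy model of a mount: a remote dict plus listing-only exclude patterns."""

    def __init__(self, remote, exclude):
        self.remote = remote      # name -> bytes stored on the "cloud"
        self.exclude = exclude    # glob patterns, applied to listings only

    def listdir(self):
        # Filtering happens here, when remote names are listed.
        return sorted(n for n in self.remote
                      if not any(fnmatch(n, p) for p in self.exclude))

    def write(self, name, data):
        # No filter check on the write path, so excluded names still upload.
        self.remote[name] = data


mount = FakeMount({"hello.txt": b"hi"}, exclude=["*.tmp"])
mount.write("scratch.tmp", b"partial")
visible = mount.listdir()        # scratch.tmp is hidden from the listing
uploaded = sorted(mount.remote)  # but it has reached the remote
print(visible)
print(uploaded)
```

This matches what was reported above: find sees no .tmp files in the mount, yet they still end up on the remote.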

ncw avatar Apr 05 '18 16:04 ncw

How's this issue looking? Any sort of workaround to prevent the vfs rclone mount from automatically uploading partial~ files? Thanks! :)

zjpleau avatar Dec 28 '18 18:12 zjpleau

This wouldn't be too tricky to implement - does anyone fancy having a go with help from me?

ncw avatar Dec 29 '18 16:12 ncw

Hello,

Has there been any progress with this or is there any other alternative to have rclone mount keep certain files on the local disk only?

xyphosbz avatar Jul 18 '19 18:07 xyphosbz

I think the way to implement this would be to have a flag (or perhaps several flags), maybe --vfs-upload-exclude glob, where glob is as in the filtering rules, e.g. *~

If a file matched this then the VFS would never upload it, though it would still download such files and keep them in the VFS cache.

How does that sound?
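A minimal sketch of the check such a flag might perform, in Python for brevity. The flag is only a proposal in this thread, `should_upload` is a hypothetical helper, and `fnmatch` is a stand-in: rclone's real filter rules support more syntax (for example `**` and rules anchored with a leading `/`).

```python
from fnmatch import fnmatch
from posixpath import basename


def should_upload(path, upload_exclude):
    """Return False if the file's base name matches any exclude glob."""
    name = basename(path)
    return not any(fnmatch(name, pattern) for pattern in upload_exclude)


# Hypothetical patterns a user might pass to the proposed flag.
patterns = ["*~", "*.tmp"]
for candidate in ["docs/report.odt", "docs/report.odt~", "a/b.tmp"]:
    print(candidate, should_upload(candidate, patterns))
```

A file for which this returns False would stay in the VFS cache and simply never be queued for upload.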

ncw avatar Jul 30 '19 18:07 ncw

I'd think the best route to keep it simple is to follow the other filtering flags, or even a simple flag to apply the same filtering to vfs upload?

Of course my method of "keep it simple" may apply more to the user than the code, so maybe your method is best. But if it's not too bad, maybe let the existing filter flags apply to vfs upload also? Maybe a --vfs-upload-filter that just enables that same filter, with the default set to Y?

Dulanic avatar Aug 24 '19 15:08 Dulanic

I'd think the best route to keep it simple is to follow the other filtering flags, or even a simple flag to apply the same filtering to vfs upload?

So that would mean using the existing filter commands.

Let's say you were using --exclude *.tmp. This means that if there are any .tmp files on the remote they will not appear in the mount (this works now). With this extension, we would filter uploads to the mount and not upload anything with that extension - that seems straightforward too.

However the potential problem is what happens when a directory goes out of the directory cache. Let's say you made a file.tmp - this is visible in the mount only because it was created locally. When the directory cache disappears then this file will disappear, and since it isn't on the remote cloud storage it won't re-appear when the directory is re-read.

Maybe this is acceptable? (The --vfs-upload-filter flag has this problem too.) Or maybe this needs a bit more logic to unify what is in the cache with what is read from the remote when the directory is re-read after being dropped from the cache.
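The "bit more logic" could look roughly like the following Python sketch. It is an illustration only: `reread_directory` and its arguments are made-up names, and the real directory cache holds far more state than a list of names.

```python
def reread_directory(remote_names, cached_local_only, merge):
    """Rebuild a directory listing after it dropped out of the directory cache.

    remote_names: names the remote reports for this directory.
    cached_local_only: names held in the local cache that were never uploaded.
    merge: if True, union both sources; if False, trust the remote only.
    """
    names = set(remote_names)
    if merge:
        names |= set(cached_local_only)
    return sorted(names)


# Without merging, the never-uploaded file.tmp vanishes on re-read.
print(reread_directory(["a.txt"], ["file.tmp"], merge=False))
# With merging, it stays visible alongside the remote entries.
print(reread_directory(["a.txt"], ["file.tmp"], merge=True))
```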

ncw avatar Sep 03 '19 13:09 ncw

Another thought, what would happen if a file that is filtered from being uploaded is renamed to something that is not filtered? Would that be uploaded after the rename?

And what would happen in the reverse scenario? Something that is not filtered is renamed to something that is filtered?

darthShadow avatar Sep 03 '19 15:09 darthShadow

Another thought, what would happen if a file that is filtered from being uploaded is renamed to something that is not filtered? Would that be uploaded after the rename?

Hmm, that is another corner case. At the moment the .tmp file gets uploaded, then renamed on the cloud storage. This would need another code path.

And what would happen in the reverse scenario? Something that is not filtered is renamed to something that is filtered?

...then there would be a file on the remote that we need to delete.

There are a lot of corner cases here :-(
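The four rename cases can be written out as a small decision table. This Python sketch is hypothetical (the function names and the returned action labels are invented here, and `fnmatch` stands in for the real filter rules); it only enumerates what would have to happen, not how rclone would do it.

```python
from fnmatch import fnmatch


def is_excluded(name, patterns):
    return any(fnmatch(name, p) for p in patterns)


def rename_action(old, new, patterns):
    """Decide what a rename implies for the remote under an upload filter."""
    old_excluded = is_excluded(old, patterns)
    new_excluded = is_excluded(new, patterns)
    if old_excluded and not new_excluded:
        return "upload"         # was local-only, must now be uploaded as `new`
    if not old_excluded and new_excluded:
        return "delete-remote"  # remote copy of `old` must go; keep data locally
    if not old_excluded and not new_excluded:
        return "rename-remote"  # ordinary rename on the cloud storage
    return "local-only"         # both names filtered; the remote is untouched


patterns = ["*.tmp"]
print(rename_action("video.tmp", "video.mkv", patterns))
print(rename_action("video.mkv", "video.tmp", patterns))
```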

ncw avatar Sep 03 '19 16:09 ncw

The specific issue with partials being excluded until they are renamed can probably be solved by introducing a delay before the upload as discussed in #3186.

It won't solve the actual issue but it should satisfy everyone who has shown interest regarding that.
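A rough Python model of why a delay helps with partials: if the upload only becomes due some time after the last change, a file that is renamed within that window is uploaded once, under its final name. The `DelayedUploader` class is invented for illustration, and treating a rename as something that restarts the timer is an assumption, not a statement about rclone's behaviour.

```python
class DelayedUploader:
    """Toy write-back queue: a file uploads `delay` seconds after its last change."""

    def __init__(self, delay):
        self.delay = delay
        self.pending = {}    # name -> time at which the upload becomes due
        self.uploaded = []   # names uploaded so far, in order

    def touch(self, name, now):
        # Every modification pushes the due time back.
        self.pending[name] = now + self.delay

    def rename(self, old, new, now):
        # Assumption: a rename carries the pending upload to the new name.
        if old in self.pending:
            del self.pending[old]
            self.pending[new] = now + self.delay

    def tick(self, now):
        due = sorted(n for n, t in self.pending.items() if t <= now)
        for name in due:
            del self.pending[name]
            self.uploaded.append(name)
        return due


up = DelayedUploader(delay=5)
up.touch("show.ts~partial", now=0)              # recording starts
early = up.tick(now=3)                          # nothing is due yet
up.rename("show.ts~partial", "show.ts", now=4)  # recording finishes
late = up.tick(now=9)                           # only the final name is due
print(early, late, up.uploaded)
```

As noted above, this only covers files that are renamed or removed before the delay expires; anything that lingers under a temporary name is still uploaded.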

darthShadow avatar Sep 03 '19 17:09 darthShadow

Any update here? Would be a really handy feature to be able to ignore files to upload

nicam avatar May 21 '20 09:05 nicam

The specific issue with partials being excluded until they are renamed can probably be solved by introducing a delay before the upload as discussed in #3186.

This will be in the VFS revision which will go into 1.53 hopefully

ncw avatar May 25 '20 14:05 ncw

Is there a way to exclude folders with the new VFS revision, @ncw ?

agneevX avatar Oct 12 '20 17:10 agneevX

The specific issue with partials being excluded until they are renamed can probably be solved by introducing a delay before the upload as discussed in #3186.

This will be in the VFS revision which will go into 1.53 hopefully

This did go into 1.53 as the --vfs-write-back parameter

Is there a way to exclude folders with the new VFS revision

You can use the filter commands on a mount if you want to exclude a directory, however this may not do what you want (see above).

ncw avatar Oct 13 '20 14:10 ncw

Yes but say we didn't want a file to LEAVE cache based on a regex (say it's being worked on by a piece of software).

I have the cache set on an SSD and I constantly see log messages about the file queuing for upload in XX minutes every time I modify it. I'd like to just ignore any files based on a regex so I don't constantly get those verbose messages.

ellisonpatterson avatar Oct 28 '20 19:10 ellisonpatterson

Another thought, what would happen if a file that is filtered from being uploaded is renamed to something that is not filtered? Would that be uploaded after the rename?

Hmm, that is another corner case. At the moment the .tmp file gets uploaded, then renamed on the cloud storage. This would need another code path.

And what would happen in the reverse scenario? Something that is not filtered is renamed to something that is filtered?

...then there would be a file on the remote that we need to delete.

There are a lot of corner cases here :-(

How about offloading the logic into a separate backend? It does not have to be a backend, but the concept might help reduce undefined behavior. For different use cases:

  • A permission backend with a transparency option defining the behavior when writing to a file without permission. The options could be error, persist, and blackhole. The persist option means keeping the file in the vfs cache, while blackhole means dumping the data silently. (There might not be many use cases for blackhole, but it is added for completeness.)
    • This is tricky for the vfs lifecycle. Should rclone dump the cache for persist after a restart?
    • Possibly separate the read filtering and the write filtering to reuse the filtering syntax.
  • A diverge backend that is similar to the union, but allows complex policies based on filtering. We can manually pick a cache location (even :memory:) and manage the lifecycle. It's probably sufficient to support merging two remotes only.
    • I don't know, but does vfs mount support direct write to local? If not, the files might be moving around all the time.
  • A pin backend that keeps some of the files available offline, and some of them never uploaded to the server. Some programs work as long as the files do not leave the cache, and do not mind the files being updated on the remote from time to time, e.g. backup programs.
    • This effectively implements a selective sync client.

It may seem that these backends are bringing back the cache remote - IMHO they are not. Cache is elementary for all writable mounts, but these backends provide some extra functionality.
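For the permission backend idea, the three options could behave roughly as in this Python sketch. None of this exists in rclone; `Policy`, `PermissionLayer` and `denied` are hypothetical names, and the remote and cache are plain dicts.

```python
from enum import Enum


class Policy(Enum):
    ERROR = "error"          # refuse the write
    PERSIST = "persist"      # keep the data in the local cache only
    BLACKHOLE = "blackhole"  # accept the write and silently discard it


class PermissionLayer:
    """Toy wrapper deciding what happens to writes that are not permitted."""

    def __init__(self, policy, is_denied):
        self.policy = policy
        self.is_denied = is_denied  # callable: name -> bool
        self.remote = {}
        self.cache = {}

    def write(self, name, data):
        if not self.is_denied(name):
            self.remote[name] = data
            return "uploaded"
        if self.policy is Policy.ERROR:
            raise PermissionError(f"write to {name} is not permitted")
        if self.policy is Policy.PERSIST:
            self.cache[name] = data
            return "cached"
        return "dropped"


def denied(name):
    # Hypothetical rule: temporary files may not reach the remote.
    return name.endswith(".tmp")


layer = PermissionLayer(Policy.PERSIST, denied)
print(layer.write("notes.txt", b"keep"))
print(layer.write("notes.tmp", b"scratch"))
```

The lifecycle question raised in the list (what happens to persisted data after a restart) is not modelled here.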

sshockwave avatar Jan 10 '21 05:01 sshockwave

This feature would be really useful: some kind of exclude patterns for folders and files. I have already searched for workarounds, but none of them work. E.g. I tested mergerfs in front of a host directory and an rclone mount to split out working directories (to prevent uploading folders to an rclone remote). But this won't work, since most software uses the filesystem's "rename" feature, and this triggers an EXDEV error because you can't rename files across two different devices/filesystems. So, for example, if you have software that stores files in /data/files and has a /data/tmp directory where it puts currently running uploads and renames them to /data/files/uploadedFile.ending when finished, this won't work.

I would be really glad if rclone could get an exclude-from-upload filter feature, and also a feature to let the VFS cache keep some directories or files that match a pattern indefinitely.

So it would be possible to say: don't upload /data/tmp, and cache /data/tmp indefinitely on my hard disk (to prevent deletion during cache cleanup).
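For reference, the EXDEV failure mentioned above is what a program sees when it calls rename across filesystems, and the only way around it is for the program itself to fall back to copy and delete. A Python sketch of that fallback follows; `move_file` is an illustrative helper, and the `rename` parameter exists only so the cross-device case can be simulated without two real filesystems.

```python
import errno
import os
import shutil


def move_file(src, dst, rename=os.rename):
    """Rename src to dst, falling back to copy+delete on a cross-device error."""
    try:
        rename(src, dst)
        return "renamed"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
    # Different filesystem: rename cannot work, so copy the data instead.
    shutil.copy2(src, dst)
    os.remove(src)
    return "copied"
```

Software that calls rename directly and does not implement such a fallback simply fails, which is why splitting the tmp directory onto another filesystem is not a usable workaround.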

PatrickHuetter avatar May 03 '21 21:05 PatrickHuetter

For us, the feature would also be very important to exclude files that change too often or are only temporary.

dennisoderwald avatar May 07 '21 10:05 dennisoderwald

Would anyone like to work on this feature - happy to talk it through?

Or alternatively maybe one of your companies would like to sponsor me to implement it?

ncw avatar May 14 '21 14:05 ncw

I see a lot of errors from Dropbox temp files like:

2021/07/28 00:07:25 ERROR : .~lock.features.odt#: Failed to copy: upload failed: batch upload failed: path/disallowed_name

I would really appreciate this feature

Yanpas avatar Jul 27 '21 21:07 Yanpas

any news?

fidodone avatar Nov 09 '21 18:11 fidodone

any news?

forrestsocool avatar Mar 31 '22 13:03 forrestsocool

Got a plan yet? I'm currently taking notes in Typora, and each time I save, a temporary file is created and uploaded to the WebDAV server before being deleted, which doesn't seem very efficient. It would be great if a filter could be added.

xieliuhao avatar Jul 09 '22 01:07 xieliuhao

+1

I'm just coming to this, but I would find it useful to be able to exclude temporary files by a pattern to avoid them being uploaded and wasting ops/bandwidth.

dbotwinick avatar Sep 10 '22 19:09 dbotwinick