Delete specific file from backups

Open 13dimitar opened this issue 7 years ago • 8 comments

Is it possible to delete a specific file from all backups? If not, do you plan to implement such a feature?

13dimitar avatar Nov 14 '17 14:11 13dimitar

This would be tricky, as that specific file is unlikely to be stored as independent chunks and more likely to have its bits mingled with those of other files.

It would technically be possible to identify the relevant chunks, download them, excise the targeted bits, re-upload the altered chunks with new hash names, and rewrite all the affected snapshots, but that is going to rapidly expand out to a huge number of changes. That sounds complicated and fragile to me.

Your best bet would be to exclude that file from future backups and potentially destroy all historical snapshots that do have the file.

Is there a use case you're targeting? Or just a one-off change you want to make to your backups?

fracai avatar Nov 14 '17 14:11 fracai

Having that would let you completely delete certain information, even from your backups. A classic use case is a client asking you to delete their data completely. I think people will look for this feature more and more, especially given the changes made in Europe over the past few years around the protection of personal data.

13dimitar avatar Nov 14 '17 14:11 13dimitar

I think that specific use case would be better served by multiple configurations. Deleting user data would simply be deleting that configuration and all of the snapshots.

I would think data protection laws would require segregating user data anyway, rather than allowing it to be mixed in with other users. (Just my guesses)

fracai avatar Nov 14 '17 14:11 fracai

Still, it's a feature some backup solutions have, and it would be nice to see it in duplicacy as well.

13dimitar avatar Nov 14 '17 14:11 13dimitar

In my case, when I started backing up my repository, I didn't have any filters. Everything got backed up. Now I have filters which exclude about 30G of the 90G repository, and I'd love to remove those 30G of useless files from earlier backup revisions.

It's not a matter of deleting sensitive data for me, so I don't mind if a few of the unwanted files stay in the storage because they're in the same chunk as an "included" file. But there are probably lots of chunks composed entirely of unwanted files. Couldn't those be marked as fossils?
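The distinction between chunks made up entirely of unwanted files and chunks shared with files you want to keep can be illustrated with a toy model. This is only a sketch under loud assumptions: files are packed back to back and cut into fixed-size chunks, and the file names, sizes, and the helper functions `chunk_owners` and `removable_chunks` are all hypothetical. It does not reflect duplicacy's actual chunking or storage format.

```python
def chunk_owners(files, chunk_size):
    """Map each chunk index to the set of file names that have bytes in it.

    `files` is an ordered list of (name, size_in_bytes) pairs, packed back to
    back into one stream that is cut every `chunk_size` bytes (a simplification).
    """
    owners = {}
    offset = 0
    for name, size in files:
        if size > 0:
            first = offset // chunk_size
            last = (offset + size - 1) // chunk_size
            for index in range(first, last + 1):
                owners.setdefault(index, set()).add(name)
        offset += size
    return owners


def removable_chunks(owners, unwanted):
    """Return the chunk indexes whose bytes all belong to unwanted files."""
    return sorted(i for i, names in owners.items() if names <= unwanted)


# Hypothetical repository: one unwanted file sitting between two wanted ones.
files = [("keep.txt", 6), ("cache.bin", 10), ("notes.md", 4)]
owners = chunk_owners(files, chunk_size=4)
print(removable_chunks(owners, unwanted={"cache.bin"}))  # prints [2, 3]
```

In this toy layout, chunk 1 holds bytes from both keep.txt and cache.bin, so it has to stay, while chunks 2 and 3 contain only cache.bin and could in principle be dropped.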

@gilbertchen has said that he partly modeled duplicacy on git. In git, you can run git filter-branch or BFG Repo-Cleaner to rewrite history. Maybe this idea is a bit similar.

highfalutin avatar Nov 20 '18 08:11 highfalutin

@highfalutin one thing you can do to easily solve your particular problem is:

  1. delete all duplicacy revisions made before you added the filters file
  2. when no other backups are running, run duplicacy prune -exclusive -exhaustive, which will clean up everything except the chunks that belong to the new revisions (those made with the filters file in place)
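The two steps above can be sketched as a small Python helper that builds the command lines. Treat it as an illustration rather than a tested recipe: the revision numbers 1 and 41 are hypothetical, the helper names `build_prune_commands` and `run_all` are made up here, and the range form of `-r` and the `-dry-run` flag are assumptions that should be checked against `duplicacy prune -help` for your version.

```python
import subprocess


def build_prune_commands(first_revision, last_revision, dry_run=True):
    """Return the two prune invocations as argv lists, in the order to run them."""
    # Step 1: delete the revisions made before the filters file existed.
    # Assumes -r accepts a range such as 1-41; verify for your version.
    delete_old = ["duplicacy", "prune", "-r", f"{first_revision}-{last_revision}"]
    # Step 2: remove every chunk no longer referenced by a remaining revision.
    # -exclusive assumes no other client is using the storage at the same time.
    clean_up = ["duplicacy", "prune", "-exclusive", "-exhaustive"]
    if dry_run:
        delete_old.append("-dry-run")
        clean_up.append("-dry-run")
    return [delete_old, clean_up]


def run_all(commands, repository_dir):
    """Run each command from inside the repository, stopping on the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=repository_dir, check=True)


# Print the commands for review instead of executing them.
for argv in build_prune_commands(1, 41):
    print(" ".join(argv))
```

Note that a dry run of the first command leaves the old revisions in place, so a dry run of the second command will not yet report their chunks as unreferenced. Once the printed commands look right, calling `run_all(build_prune_commands(1, 41, dry_run=False), repository_dir)` would execute them for real.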

TheBestPessimist avatar Nov 20 '18 09:11 TheBestPessimist

Thanks @TheBestPessimist. I did as you said, and my storage size decreased significantly. There was a small typo -- the correct command is duplicacy prune -exclusive -exhaustive (x instead of n in the last word).

Of course this is only a solution if you're willing to delete all revisions prior to the filter change! In my case that was OK.

highfalutin avatar Nov 20 '18 21:11 highfalutin

@TheBestPessimist hi, how do I use the CLI commands on a Docker installation?

geek-baba avatar Feb 13 '23 18:02 geek-baba