Implement 'rewrite' command to exclude files from existing snapshots
What is the purpose of this change? What does it change?
This adds a 'rewrite' command that provides a way to remove data from existing snapshots.
This is a preliminary pull request. It actually works here on a test repo, but it lacks tests, etc. It is mostly meant to start a discussion.
Was the change discussed in an issue or in the forum before?
[ replaces #2720 because it was created from master branch ] closes #14
Checklist
- [x] I have read the Contribution Guidelines
- [x] I have enabled maintainer edits for this PR
- [ ] I have added tests for all changes in this PR
- [ ] I have added documentation for the changes (in the manual)
- [ ] There's a new file in changelog/unreleased/ that describes the changes for our users (template here)
- [x] I have run gofmt on the code in all commits
- [ ] All commit messages are formatted in the same style as the other commits in the repo
- [ ] I'm done, this Pull Request is ready for review
Thanks a lot for review. I'll try to address everything soon
I just mirrored my minio and used this branch to rewrite a few different scenarios. Looks good so far. Restoring files worked as expected and matched the versions restored from the original repo. :) The repo is several GB smaller though.
I would like to help move this feature forward.
Hi, I was busy with personal things. Will try to catch up this weekend and at least update/rebase it.
Is supporting such a feature for a whole repository planned?
> Is supporting such a feature for a whole repository planned?
This PR already allows giving multiple snapshots or even multiple snapshot criteria. E.g. you can run it with all snapshots or give multiple -H options to cover all hosts.
We could add a possibility to specify all, which would resolve to all snapshots, but I would leave that to a future PR as it is pretty independent from this change.
Good stuff, thank you. 👍
What's the status of this PR? Any chance that it gets merged soon? I'd very much like to use the feature :-D
Good work 👍🏼
@bertbesser This PR should be considered work in progress. No decision has been made about whether this should be merged or not, the PR is not yet marked as done by the author, and there's a bunch of unaddressed comments. Clearly it's not ready for merging, and the status is simply what you see here in the GitHub GUI where you posted a comment. That said, there's a lot of other stuff going on if you look in this repository, so consider this WIP. Feel free to try it yourself though, just don't consider it production ready (at least not officially).
Hi @bertbesser,
It was working pretty well on my machine at the time it was created (but yes, it's very experimental). The good news is that it doesn't touch any data pack files at all. Basically, it just adds a new snapshot as a copy of an existing one with a few files removed (plus optionally removes the old snapshot). I don't think that it can corrupt the repository (except for the new snapshots, which can just be deleted manually).
I think that the 'rewrite + check + prune' scenario should be safe enough. I've already used it on my own 2.5TB repository without any problems. I removed most of the garbage from old snapshots.
But I'm not sure that it'll work as is right now. I'll rebase it on top of the latest git and try to address most of the comments next week.
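For readers following along, here is a minimal sketch of the idea described above: a rewrite only writes new tree objects and never touches file data. It is not the code from this PR. `Node`, `Store` and `Rewrite` are hypothetical names, and the JSON layout is simplified for illustration.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"strings"
)

// Node is a simplified, hypothetical stand-in for a tree entry.
type Node struct {
	Name    string `json:"name"`
	Content string `json:"content,omitempty"` // ID of a data blob (files only)
	Subtree string `json:"subtree,omitempty"` // ID of a child tree (directories only)
}

// Store maps tree IDs to serialized trees. Data blobs are only referenced
// by ID and are never read or written here.
type Store map[string][]byte

// Save serializes a tree and stores it under the SHA-256 of its content.
func (s Store) Save(nodes []Node) string {
	buf, err := json.Marshal(nodes)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(buf)
	id := hex.EncodeToString(sum[:])
	s[id] = buf
	return id
}

// Load decodes the tree stored under id.
func (s Store) Load(id string) []Node {
	var nodes []Node
	if err := json.Unmarshal(s[id], &nodes); err != nil {
		panic(err)
	}
	return nodes
}

// Rewrite returns the ID of a tree that equals tree id minus all excluded
// paths. Subtrees without excluded entries hash to their old ID again.
func Rewrite(s Store, id, prefix string, exclude func(path string) bool) string {
	kept := []Node{}
	for _, n := range s.Load(id) {
		path := prefix + "/" + n.Name
		if exclude(path) {
			continue
		}
		if n.Subtree != "" {
			n.Subtree = Rewrite(s, n.Subtree, path, exclude)
		}
		kept = append(kept, n)
	}
	return s.Save(kept)
}

func main() {
	s := Store{}
	cache := s.Save([]Node{{Name: "junk.bin", Content: "blob-2"}})
	home := s.Save([]Node{{Name: "cache", Subtree: cache}, {Name: "notes.txt", Content: "blob-1"}})
	etc := s.Save([]Node{{Name: "hosts", Content: "blob-3"}})
	root := s.Save([]Node{{Name: "etc", Subtree: etc}, {Name: "home", Subtree: home}})

	exclude := func(path string) bool { return strings.HasPrefix(path, "/home/cache") }
	newRoot := Rewrite(s, root, "", exclude)

	fmt.Println("root changed:", newRoot != root)
	fmt.Println("etc subtree reused:", s.Load(newRoot)[0].Subtree == etc)
}
```

In this sketch the old trees stay in the store until something removes them, which is why a prune afterwards is what actually frees space.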
This looks like a fantastic feature I sorely miss in Restic. Is there anything I, as a user, can do to help with this PR ?
Absolutely! I've also used it to clean up 30 GB of junk from early backups. I didn't know how to filter back then. My repo was 40 GB and now consistently stays under 10 GB with about 2 years of backups. Really helped!!! But that was also some time ago.
It's good news that it seems to work fine for those who tried it, and the author's description of how it works suggests it might not be very intrusive and have limited ability to cause any harm if not working as intended. But it's still WIP: it needs rebasing, the comments need addressing, and other polishing is required, not to mention tests and documentation. Let's conclude that this is the case, and unless someone does those parts it's not even close to further review, sorry.
Hi @dionorgua, is there any chance you'd have an update regarding your work on this PR? I may or may not have oopsied a B2 backup again. 🤠
I've rebased the PR and made quite a few cleanups (see the commit messages for more details). The main tasks left are:
- [x] Writing some documentation
- [x] More tests. The current test coverage is just the absolute minimum
- [x] Decide what to do with the --inplace flag, I'll probably rename it to --forget
- [x] Make filter option handling and descriptions work as in #3912
- ~~[ ] Performance optimizations.~~ -> Can be done later on.
- [x] Finalize the CLI options
I'm so thrilled that this PR is making progress! Many thanks to the team members for all the latest commits. I will do some tests on my mostly "local" repos.
I support @MichaelEischer on naming the flag --forget instead of --inplace. Of course, an "are you sure" prompt could follow before execution, just like prune. As discussed with @dionorgua before, a clear --tag-old-snapshots-with <tag> and --tag-new-snapshots-with <tag> would work great for managing a two-step forget process.
Keep up the great work!
@NovacomExperts The --tag-xxx-snapshots-with options would just be a welcome addition to this, correct? Not a fundamentally different approach, if I understand you correctly. Cheers
I've added enough tests to achieve over 80% test coverage for the new code. There's now also a new documentation section. I'll probably split some of the preparatory refactoring commits into separate PRs to reduce the amount of code a bit.
> As discussed with @dionorgua before, a clear --tag-old-snapshots-with <tag> and
--tag-old-snapshots is impossible to implement. We cannot change the tag of an existing snapshot without creating a new one and removing the old snapshot.
Instead of using tags, using the printed snapshot id should also work.
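To illustrate why retagging in place cannot work when a snapshot's ID is the SHA-256 hash of its serialized content, here is a minimal sketch. The `Snapshot` struct is a reduced, hypothetical version of the real record, and `ID` and `Retag` are illustrative helpers, not functions from the restic code base.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Snapshot is a reduced, hypothetical snapshot record.
type Snapshot struct {
	Tree     string   `json:"tree"`
	Hostname string   `json:"hostname"`
	Tags     []string `json:"tags,omitempty"`
	Original string   `json:"original,omitempty"` // ID of the snapshot this one replaces
}

// ID derives the snapshot ID from the serialized content, so any change
// to a field necessarily yields a different ID.
func ID(sn Snapshot) string {
	buf, err := json.Marshal(sn)
	if err != nil {
		panic(err)
	}
	sum := sha256.Sum256(buf)
	return hex.EncodeToString(sum[:])
}

// Retag returns a new snapshot with the given tags. The old snapshot is
// left as it is and would have to be removed separately.
func Retag(sn Snapshot, tags ...string) Snapshot {
	out := sn
	out.Tags = append([]string(nil), tags...)
	out.Original = ID(sn)
	return out
}

func main() {
	old := Snapshot{Tree: "tree-1", Hostname: "myhost"}
	tagged := Retag(old, "rewritten")
	fmt.Println("old id:", ID(old))
	fmt.Println("new id:", ID(tagged))
}
```

The two printed IDs differ, so "changing a tag" is really "write a new snapshot and delete the old one".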
I've added a commit to let the rewrite command reject rewriting anything which would result in losing data in a tree. This ensures that we can add new fields to the Tree struct without having to increase the repository version every time. Without the check, the rewrite command could just lose data.
The check is not completely optimal, but it should prevent rewriting for all problematic situations. If the need arises to handle this in a more targeted way, we can still adapt the implementation later on.
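One way such a check could look is a serialization round trip: decode the stored tree, encode it again, and refuse to continue if the bytes differ. The sketch below is an assumption about the general technique, not the code from the commit. `Tree`, `Node`, `CheckRoundTrip` and `ErrLossy` are hypothetical, simplified names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// Node and Tree only know the fields of the "current" format.
type Node struct {
	Name string `json:"name"`
	Size uint64 `json:"size"`
}

type Tree struct {
	Nodes []Node `json:"nodes"`
}

// ErrLossy signals that re-encoding the tree would drop information.
var ErrLossy = errors.New("tree cannot be re-encoded without losing information")

// CheckRoundTrip decodes raw into the known struct and encodes it again.
// Fields unknown to this version are dropped while decoding, so the
// output no longer matches the input and the tree is rejected.
func CheckRoundTrip(raw []byte) error {
	var t Tree
	if err := json.Unmarshal(raw, &t); err != nil {
		return err
	}
	out, err := json.Marshal(t)
	if err != nil {
		return err
	}
	// Strip insignificant whitespace so only real differences count.
	var compact bytes.Buffer
	if err := json.Compact(&compact, raw); err != nil {
		return err
	}
	if !bytes.Equal(compact.Bytes(), out) {
		return ErrLossy
	}
	return nil
}

func main() {
	known := []byte(`{"nodes":[{"name":"a","size":1}]}`)
	future := []byte(`{"nodes":[{"name":"a","size":1,"future_field":true}]}`)
	fmt.Println("known format:", CheckRoundTrip(known))
	fmt.Println("with unknown field:", CheckRoundTrip(future))
}
```

A byte comparison like this is deliberately conservative: it also rejects harmless differences such as a different field order, which matches the "not completely optimal" caveat above.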
@MichaelEischer This is such a great addition to Restic. Again, thank you a thousand times for making this possible. At one point I was almost ready to learn Go and give it a shot, but seeing your commits, and the level of expertise and care needed... Man, this is waaaay out of my league :)
--tag-old-snapshots was merely a brainstorming idea. I think that the --forget flag will do the job perfectly in one go.
I'll have some free time in the next few days to recompile with the latest commits. I'll test a few of my "large" repos.
Hello!
I've been playing around with 72196e5 and I had to use it in production on a version 1 repo. It was a local repo with 3 hosts totaling 830 GB, and it was correctly shrunk to 319 GB. Everything went very well and all checks are ok (including read-data). Restore tests were OK too. I have kept a backup of the original repo just in case, but so far so good! I wanted to exclude a specific folder from multiple snapshots except the last 30 days on a specific host. Here's what I ended up doing:
restic forget --dry-run --host myhost --keep-last 30
Copied the list of snapshots listed at the end, without the brackets
restic rewrite --forget --exclude /home/test/foldertoexclude <pastesnapshots>
Take a coffee, or two
restic prune
enjoy !
LGTM. I've made one final change before merging this PR: when passing just --iexclude-file, the rewrite command would complain about missing excludes.
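As an illustration of the kind of validation involved, here is a minimal sketch of an "are any excludes given?" check that considers all four exclude options. The struct and function names are hypothetical and only mirror the CLI flags. The error text is made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// ExcludeOptions is a hypothetical container for the exclude-related flags.
type ExcludeOptions struct {
	Excludes                []string // --exclude
	InsensitiveExcludes     []string // --iexclude
	ExcludeFiles            []string // --exclude-file
	InsensitiveExcludeFiles []string // --iexclude-file
}

// Empty reports whether none of the exclude options was set. Forgetting
// one of the four fields here would cause the kind of bug described above.
func (o ExcludeOptions) Empty() bool {
	return len(o.Excludes) == 0 && len(o.InsensitiveExcludes) == 0 &&
		len(o.ExcludeFiles) == 0 && len(o.InsensitiveExcludeFiles) == 0
}

// Validate rejects an invocation that would not exclude anything.
func Validate(o ExcludeOptions) error {
	if o.Empty() {
		return errors.New("nothing to do: no excludes provided")
	}
	return nil
}

func main() {
	onlyFile := ExcludeOptions{InsensitiveExcludeFiles: []string{"excludes.txt"}}
	fmt.Println("only --iexclude-file:", Validate(onlyFile))
	fmt.Println("no options:", Validate(ExcludeOptions{}))
}
```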
> Everything went very well and all checks are ok (including read-data).
Thanks for testing! Good to hear that the PR indeed worked as expected (it's been in good shape for some time).
Any idea when this will be in the prebuilt binaries?
It will be in the next release, which is released when it's done.
@12nick12 You might want to have a look at https://beta.restic.net/
FWIW, we're very close to a new release so stay tuned and you'll have it in your hands soon :) But indeed, nothing wrong with using the builds @aawsome pointed to (thanks for pointing that out)! I use them all the time.
@12nick12 It's out now! See: https://restic.net/blog/2023-01-12/restic-0.15.0-released/