snapbtrex
Attempt to better handle incomplete transfers
I regularly have the issue that an interrupted transfer leaves a broken snapshot on the remote, which prevents all future transfers and also breaks incremental backups when the last "complete" snapshot gets cleaned up on the remote.
This attempts to help, by trying to detect the incomplete snapshots, and ignoring/deleting/working-around them, as well as preventing clean-up on a repository with a broken snapshot.
As the commit message says, it's not perfect, but I think it does help. I'm more than happy to hear better ways to handle it!
P.S. I'll submit a new PR with a README change describing how it works/doesn't work if the approach looks reasonable.
I did some digging in my data (about 5 years of snapshots) and found one occasion where this happened! I seem to be lucky with network stability.
This was no real problem - it recovered - but you're right, there were actually 3 unusable snapshots lying around.
You can actually trace the following snapshot through the Snapshot(s): property. If one snapshot lists more than one snapshot there, something is odd; likewise, if a snapshot doesn't list another one (and it's not the last), there's also an issue.
That's interesting - I'm not sure how yours is able to recover.
For me it goes like this:
- send/receive is interrupted, leaving a snapshot on the remote side which has no received_uuid
- The next time snapbtrex tries to sync, it sees the partial snapshot, and determines that it's the best parent (it's the most recent), and so tries to do a new send/receive against that snapshot
- The receiver side fails, because the parent doesn't have a received_uuid
- snapbtrex aborts (though since the new error handling it doesn't exit with an error - that caught me out 😆 )
- Next time snapbtrex tries - the same thing happens. The "broken" snapshot is still the most recent, gets picked as the parent, and fails.
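To make the failure mode above concrete, here is a minimal sketch of the check I have in mind. It is not the code from this PR. It assumes the text comes from `btrfs subvolume show <path>`, which prints a `Received UUID:` line with `-` when none is set; the helper names `parse_received_uuid` and `pick_parent` are made up for illustration.

```python
import re
from typing import List, Optional, Tuple


def parse_received_uuid(show_output: str) -> Optional[str]:
    """Return the Received UUID found in `btrfs subvolume show` text.

    Returns None when the line is missing or the value is "-",
    which is what an interrupted receive leaves behind.
    """
    match = re.search(r"^\s*Received UUID:\s*(\S+)\s*$", show_output, re.MULTILINE)
    if match is None or match.group(1) == "-":
        return None
    return match.group(1)


def pick_parent(snapshots: List[Tuple[str, str]]) -> Optional[str]:
    """Pick the newest snapshot that has a received UUID.

    `snapshots` is a list of (name, show_output) pairs ordered oldest
    to newest (hypothetical input shape). Incomplete snapshots are
    skipped instead of being chosen just because they are the newest.
    """
    for name, show_output in reversed(snapshots):
        if parse_received_uuid(show_output) is not None:
            return name
    return None
```

With this, a broken newest snapshot is passed over and the previous complete one is used as the parent; if nothing complete exists, `pick_parent` returns `None` and a full transfer would be needed.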
This leads to a further problem - I run a periodic job which prunes snapshots from my "remote" repository, using snapbtrex running locally there. This treats the "broken" snapshot as the most valuable - because it's the newest - and normally deletes the second-newest (which is complete).
On the sender side, I only keep the snapshot which was most recently sent to the remote successfully, for use as the parent of the next incremental update. The problem is that the "prune" on the receiver side doesn't distinguish "successful" from "non-successful" snapshots, and it often leaves behind the broken snapshot while deleting the most recent successful one. That then breaks incremental backups entirely, because the sender no longer has a parent that also exists on the receiver, and I have to start again with a new full transfer.
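The pruning side could be guarded along these lines. Again, this is only a sketch under assumptions: `plan_prune` is a hypothetical helper, the `is_complete` flag would come from a received_uuid check, and `candidates` stands for whatever the normal retention logic already wants to delete.

```python
from typing import List, Tuple


def plan_prune(snapshots: List[Tuple[str, bool]], candidates: List[str]) -> List[str]:
    """Filter the deletion candidates so pruning cannot break incrementals.

    `snapshots` is a list of (name, is_complete) pairs ordered oldest
    to newest. Returns the subset of `candidates` that is safe to delete.
    """
    complete = [name for name, is_complete in snapshots if is_complete]

    # A broken snapshot is present: skip clean-up entirely rather than
    # risk deleting the last usable parent.
    if len(complete) != len(snapshots):
        return []

    # Otherwise, never delete the newest complete snapshot, since the
    # sender relies on it as the parent for the next incremental send.
    newest_complete = complete[-1] if complete else None
    return [name for name in candidates if name != newest_complete]
```

Skipping clean-up is the conservative choice; an alternative would be to delete the broken snapshot first and then prune as usual.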
P.S. I'm not sure it's about network stability so much as e.g. closing my laptop mid transfer and then taking it somewhere else, or shutting down my PC.