btrbk extend retention policy to take into account the number of snapshots and/or time of last successful run

If I understand correctly, setting, for example, 24h as part of a retention policy will preserve at most one hourly snapshot as far back as 24 wall-clock hours (not the 24 most recent hourly snapshots). As far as I'm concerned, that's the right thing to do in principle.

Now consider a source that is turned off over the weekend. Wouldn't btrbk wipe all the hourly snapshots on its first run after the break? So if I did something stupid on Thursday and notice on Friday, I get fine-grained rollback; if I botch it on Friday and notice on Monday, all I get is a daily, even though subjectively it's the following workday in both cases. (Same for dailies and a longer holiday, etc.) I'm not sure I feel comfortable with that, i.e. btrbk treating time passed with regular successful runs the same as time passed without any. I'd prefer to keep the most recent 24h of actual work done, for some subvolumes at least, i.e. to have the option to "stop time" during downtime, maybe by taking the timestamp of the most recent snapshot into consideration, or simply something like 24h+: "keep hourly snapshots 24 hours back, but at least the 24 most recent hourlies", and so on.
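To make the weekend scenario concrete, here is a rough Python sketch of a wall-clock hourly rule. It is not btrbk's actual code: the function name `keep_hourly_wallclock` and the "first snapshot of each clock hour" rule are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

def keep_hourly_wallclock(snapshots, now, hours=24):
    """Keep the first snapshot of each clock hour within the last `hours` hours.

    Hypothetical model of a wall-clock "24h" rule, not btrbk's implementation.
    """
    cutoff = now - timedelta(hours=hours)
    first_in_slot = {}
    for ts in sorted(snapshots):
        if cutoff < ts <= now:
            slot = ts.replace(minute=0, second=0, microsecond=0)
            # setdefault keeps the earliest snapshot seen for this hour slot
            first_in_slot.setdefault(slot, ts)
    return sorted(first_in_slot.values())

# Hourly snapshots on Friday 2019-11-15 from 09:00 to 17:00, then the machine is off.
friday = [datetime(2019, 11, 15, h) for h in range(9, 18)]

# Run on Friday evening: every snapshot is still inside the 24 h window.
kept_friday = keep_hourly_wallclock(friday, now=datetime(2019, 11, 15, 18))

# First run on Monday morning: the window starts Sunday 09:00, so no hourly survives.
kept_monday = keep_hourly_wallclock(friday, now=datetime(2019, 11, 18, 9))
```

Under these assumptions `kept_friday` holds all nine snapshots while `kept_monday` is empty, which is exactly the loss of fine-grained rollback described above.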

Nov 19 '19 11:11 fallenguru

Because btrbk is stateless, it does not know if a snapshot is an "hourly" or "daily" snapshot, so setting 24h+ is not possible. I would go for something like snapshot_preserve_min <number>, stating "keep the last <number> snapshots".

This could then be combined, e.g. snapshot_preserve_min 50 24h, stating "keep the last 50 snapshots, or the last 24h".
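A minimal Python sketch of how such a combined rule could behave, reading "or" as the union of both protected sets. The function `preserve_min` and its parameters are hypothetical and only illustrate the proposal; they do not describe existing btrbk behaviour.

```python
from datetime import datetime, timedelta

def preserve_min(snapshots, now, min_count, min_age):
    """Return the snapshots protected from deletion.

    Protected = the newest `min_count` snapshots, plus every snapshot
    newer than `now - min_age` (union of both sets). Hypothetical rule.
    """
    ordered = sorted(snapshots)
    # Guard against min_count == 0: ordered[-0:] would return everything.
    newest = set(ordered[-min_count:]) if min_count > 0 else set()
    recent = {ts for ts in ordered if ts > now - min_age}
    return sorted(newest | recent)

# Nine hourly snapshots on a Friday, next run on Monday morning.
friday = [datetime(2019, 11, 15, h) for h in range(9, 18)]
protected = preserve_min(
    friday,
    now=datetime(2019, 11, 18, 9),
    min_count=5,
    min_age=timedelta(hours=24),
)
```

Here the 24h part protects nothing after the weekend, but the count part still protects the five newest Friday snapshots (13:00 to 17:00).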

Not sure if this is easily doable, will investigate later.

Nov 22 '19 14:11 digint

I too think this would be useful.

Generally, I even think keep the last <number> snapshots is more useful than keep the last <timespan> snapshots.

But a combined solution like digint proposed (snapshot_preserve_min 50 24h) seems most elegant to me. The two values should probably be ANDed together...

May 01 '22 20:05 camoz

I also stumbled over this… I'm just not sure whether preserve the last n snapshots as a functionality would really solve the issue (at least not in all cases):

Consider the same scenario:
The target host was down for a longer period of time, so all the previous backups that were specified by the retention policy (which kinda gives the granularity of the wayback machine) have already expired and are to be deleted.

Even if there's now a "keep the most recent n backups" rule, it could still be "sabotaged" if the user quickly creates enough backups manually, or simply by having many scheduled ones created in a short amount of time. E.g. with hourly backups you'd already have 24 backups after 1 day, so if n was 24 you'd already lose everything older than a day (and older than the longest retention policy).
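A small Python sketch of that failure mode, assuming a pure count-based rule is the only thing still protecting the older backups (the helper `keep_newest` and the dates are made up for illustration):

```python
from datetime import datetime

def keep_newest(snapshots, n):
    """Hypothetical count-only rule: keep the n most recent backups."""
    return sorted(snapshots)[-n:] if n > 0 else []

# Five older weekly backups that the time-based policy has already expired.
old = [datetime(2022, 10, d) for d in (1, 8, 15, 22, 29)]

# One day of hourly backups after the host comes back up.
burst = [datetime(2022, 11, 20, h) for h in range(24)]

kept = keep_newest(old + burst, n=24)
survivors_older_than_burst = [ts for ts in kept if ts < burst[0]]
```

With n = 24, the 24 backups from a single day fill the whole quota and `survivors_older_than_burst` ends up empty.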

Not really sure how one can solve the problem at all:
What about having a n per month, week, day, hour? E.g.:

[<daily>d:<Ndaily>] [<weekly>w:<Nweekly>] [<monthly>m:<Nmonthly>] [<yearly>y:<Nyearly>]

Here, <Ndaily> could mean that there must be at least <Ndaily> backups in the (current) daily range (each having about 1 day in-between), if there are fewer, no older daily/weekly/monthly/yearly backups are deleted (as long as there are as many).
For <Nweekly> it would be a bit different, namely (as long as <Ndaily> is fulfilled): if there are fewer than <Nweekly> weekly backups, no older weekly/monthly/yearly backups are deleted, but daily ones may be deleted.
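One possible reading of the <Ndaily> part, as a rough Python sketch. The function `older_backups_protected`, the use of calendar days as "about 1 day in-between", and the example numbers are all assumptions; the weekly/monthly/yearly tiers are left out.

```python
from datetime import datetime, timedelta

def older_backups_protected(snapshots, now, daily, n_daily):
    """Return True while older backups must not be deleted.

    Hypothetical rule: count the distinct calendar days within the last
    `daily` days that have at least one backup; if that count is below
    `n_daily`, nothing older may be expired yet.
    """
    cutoff = now - timedelta(days=daily)
    days_with_backup = {ts.date() for ts in snapshots if cutoff < ts <= now}
    return len(days_with_backup) < n_daily

now = datetime(2022, 11, 20, 22)

# Host was down for weeks: only two days in the 7-day window have a backup.
after_downtime = [
    datetime(2022, 10, 1),
    datetime(2022, 11, 19, 12),
    datetime(2022, 11, 20, 12),
]
hold = older_backups_protected(after_downtime, now, daily=7, n_daily=5)

# Regular operation: five consecutive days with a backup each.
regular = [datetime(2022, 11, d, 12) for d in range(16, 21)]
release = older_backups_protected(regular, now, daily=7, n_daily=5)
```

In this sketch `hold` is True (the October backup stays) and `release` is False (normal expiry may proceed). Because it counts days rather than backups, a burst of backups on one day would not satisfy the minimum.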

Does that make any sense?

Nov 20 '22 22:11 calestyo
