backintime icon indicating copy to clipboard operation
backintime copied to clipboard

Revise Auto- and Smart-Remove feature

Open buhtz opened this issue 1 year ago • 13 comments

Summary

This is a meta issue covering a complex topic related to several issues that can't be solved isolated. The behavior of auto- and smart-remove feature is "unclear". It is not documented. Users have different expectations, based on incorrect wording in the GUI and several bugs. Code related to auto- and smart-remove is not covered enough by tests and not able to test because of its structure.

Beside the details my approach would be to create a mock-up but clear documentation and GUI the way how the auto-/smart-remove feature should work. That docu & GUI then might be a good base for further discussions not focused to much on code details.

What need to be done

  • [ ] Refactoring the related code base to make it testable.
  • [ ] Write unit tests for that code.
  • [X] Document the current behavior (PR #1944).
  • [ ] Provide an overall solution:
    • Fix and modify the behavior where it might be appropriate.
    • Clear wording in the GUI & the documentation about the behavior.

image

Notes and ideas

  • https://github.com/bit-team/backintime/issues/1976#issuecomment-2563715818

Related issues

  • #1041
  • #1094 - Expectations about keep weekly snapshots. Bug description lacks details and might not be a bug in code but in the GUI-wording.
  • #1107 - Unclear docu.
  • #1679 - Priorities or ordering of the related auto-/smart-remove rules, their documentation and GUI-wording.
  • #350 - Relative size values.
  • #711 - Rule priorities and ordering.
  • #1917 - Documentation about free-space-feature.
  • PR #1819 - Minor bug in keep weekly.
  • PR #1944 - Several new unit tests and experiments according to keep weekly feature.
  • #1303 - A feature request about a new count-rule.
  • #1874
  • (Missing one?)

buhtz avatar Nov 27 '24 08:11 buhtz

Mockup

This is my current mockup about how the new auto-remove tab/dialog could look like. The wording is improved reflecting the slightly modified (and fixed) behavior. Suggestions, corrections, improvements are welcome.

  • The yellow boxes are tooltips.
  • Term "smart" removed.
  • The "smart remove" checkbox to disable/enable all "smart" rules at once are removed.
  • Replaced term "snapshot" with "backup". (#1929)
  • Use term "preceding" to make clear that the previous completed day/week/month/year is used for counting. The current (still running and incomplete) day/week/months/year is ignored.
  • In all rules the "most recent" (latest, youngest) backup is kept. If present for all units it would be 23:59 (days), Sunday (weeks), 31th (months), 31th December (years). Currently this is not consistent in BIT, e.g. for months the oldest (1th day in month) is used.
  • The group boxes might not be a good idea.
  • Keep the goal in mind, making the behavior consistent and more clear. It is not the goal to totally modify it, e.g. changing the order of the rules or something like this.

image

Beside this GUI and its tooltip the user manual will be improved with this topic explaining more details and giving examples.

EDIT: I also think about if it is good choice count the "preceding" days/weeks/months/years only? It might be more "natural" to think that the current still running day/week/month/year is also relevant. And it might be easier to explain and express in the GUI.

buhtz avatar Dec 18 '24 08:12 buhtz

I think the sentences in the tooltips are a bit confusing, or at least not super easy to understand:

Complete weeks are counted, starting with the previous week (date from-to). Complete months are counted, starting with the previous month (monthname).

  1. Shouldn't it say completed (past-tense) instead of complete? I understand what a completed week is, a week that's already over. A complete week on the other hand refers to a period of time, but doesn't necessarily imply something in the past.
  2. The terms between the parentheses, I am not entirely sure what their meaning or significance is.

Apologies if this is not the greatest feedback, I'm not a UI/UX expert by any stretch of the imagination, nor a native English speaker for that matter. Just trying to be helpful.

virtuallyunknown avatar Dec 18 '24 10:12 virtuallyunknown

"Preceding" is a bit unclear. Preceding to ... what? Now, probably, but it's not stated. Better be very explicit (KISS principle), use something like "keep all backups... ...from the last NN XX" ...not older than NN XX" (or "younger than") ...created since NN XX ago" (or ...created NN XX ago and later") These ideas are not good enough, but you get the idea.

In the mockup, space and inodes need to go on top. They are technical limitations that set the constraints for everything else.

You can omit the "older than NN years", because all snapshot ages are max'd by their respective number of periods (at least the longest -- years -- must have a default value). Also, make sure that always the highest allowed backup age is obeyed. Age precedence instead of rule application precedence removes the need for the user to think of the latter, except for space and inodes, and is easier to understand -- more KISS. (Dunno if that's not the case already.) If someone wants to keep 1000 weekly backups (I can't think of a reason why, but let's accept it for the sake of the argument), the first applied 15 years would delete the earliest 220 backups they wanted to keep.

Let the user set the max. number of yearly backups, for consistency, and because the "older than NN years" is gone. Also, here a consistent wording would be better: not "per year" but "yearly" (all the other periods are "-ly"), so that no inconsistent wording might confuse the user to be wary of a different treatment of the number.

Also, add an option that is time-independent. IIRC, grsync has something like "No matter what, keep the NN last backups". That has its use: Imagine that the user goes on holidays or sick leave for several weeks, then comes back and finds there was a big mistake made late on the last working day (they were already thinking of the holiday, say, or too ill to concentrate). After an absence of several weeks and the number of retained dailys set to too low, they have to go back up to a week for the status before the mistake and lose the good work until right before the mistake. If there were still the last, say, 20 hourly backups around, they'd lose 59 minutes maximum.

In case users want it less fine grained, this could be done with absolute numbers for every period -- "Keep one x-ly backup for NN , but at least the last MM ." Example: "Keep one monthly backup for 12 months" plus "Always keep the last 10 daily backups" would allow to pick up work at a point within the last 10 days of work even after an extended absence from christmas to easter.

I hope that's not too confused... hard day.

noyannus avatar Dec 18 '24 19:12 noyannus

Thank you for your detailed answer. I appreciate that you took your time to dive into it. I have to ask back because I did not understand all details.

In the mockup, space and inodes need to go on top. They are technical limitations that set the constraints for everything else.

But this rules are executed at last. That is what the GUI tries to reflect. I don't plan to modify the execution order for the upcoming release. I don't want to break to much in the beginning. And I need to refactor the existing behavior to even be able to create unit tests for it. I don't want to modify to much behavior without having test coverage of the current behavior. But we can think about working on this in a later step.

Do I get you right, that the space/inode rules should get executed as first and before every other rule?

You can omit the "older than NN years", because

I do agree here. But that is also something I wouldn’t change at the moment but would save for a later version.

Also, make sure that always the highest allowed backup age is obeyed.

Yes, also a good point.

Let the user set the max. number of yearly backups, for consistency, and because the "older than NN years" is gone.

Agreed.

Also, add an option that is time-independent. IIRC, grsync has something like "No matter what, keep the NN last backups".

I like that.

In case users want it less fine grained, this could be done with absolute numbers for every period -- "Keep one x-ly backup for NN , but at least the last MM ." Example: "Keep one monthly backup for 12 months" plus "Always keep the last 10 daily backups" would allow to pick up work at a point within the last 10 days of work even after an extended absence from christmas to easter.

Got it.

I hope that's not too confused... hard day.

Thank you very much for your input. I will collect your ideas for "new" behavior in separate Issues and schedule them for later releases. And for the current issue I will also consider your input.

buhtz avatar Dec 18 '24 20:12 buhtz

Thank you for the kind words.

In the mockup, space and inodes need to go on top. They are technical limitations that set the constraints for everything else.

But this rules are executed at last. That is what the GUI tries to reflect. ... Do I get you right, that the space/inode rules should get executed as first and before every other rule?

If they are applied last, this could cause removal of backup files (fresh or oldest doesn't matter here). First the backup would be written, then (if the total backups size were over the limits) the backups would be cut back to the allowed limits. It makes me feal queasy that user-set limits are overwritten automatically, and that files are removed after they seemingly have been successfully written. Possibly even without user notification... Imagine a machine crash after the backup had been completely written, exceeding the allowed space/inodes, but before the total backup size could be decreased again as needed and the according error cold be raised/notified/logged. The excess space/inode use would not become known, nor would the reduction fully take place. If, OTOH, BiT determined the available space/inodes first, it cold stop during the backup writing with an "out of allowed space or inodes" error when it reached that limit. It has to monitor its space/inode use for that, of course. So the prodecure would be: -Apply the space/inode rule first, that is, BiT determines how much space / how many inodes are available within the set limits, -then write the backup while continuously checking that the next file to write will not cause the total backups size to exceed the limit. -If it will, stop writing, raise the above error and notify/log, mark the backup as incomplete, and give some feedback to the user as to what caused the error, and that backups need to be pruned or moved to a bigger volume before operations can be taken up again.


Another wording point: "Applied in descending order" uses a word that in IT circles applies to values ("sort descending"). I would be more clear that their sequence is meant with "Back In Time goes through the rules below from top to bottom. Should rules conflict, the longer retention time will be applied" or so.

noyannus avatar Dec 18 '24 22:12 noyannus

I like the idea of handling the space and inode limits differently.

As it is, it's unclear what gets deleted if you are out of space or inodes. I have always kept those unchecked, because in such a circumstance, I would prefer that it just fails, and then I will look at what's going on, maybe reducing the retention if appropriate.

For example, I've had it happen that I added a mount to (huge) external storage without remembering to exclude it before the backup ran. If Back In Time were just deleting backups as needed to back that up, it could wipe out all of my backups just to create some monster backup that I don't want.

DerekVeit avatar Dec 19 '24 00:12 DerekVeit

As it is, it's unclear what gets deleted if you are out of space or inodes.

U-oh... And that is reknowned, recommended, and widely deployed... And nobody noticed and acted!?! <shakes head sadly, then starts banging head against wall>

You guys are doing a d*n needed job here!

To find out what actuall happens:

  1. Use shell to create a tree of folders, each holding one or a few small, equal-size test files. Some nested for ... in ... loops to assemble path and filename, with mkdir -p <path> and dd if=/dev/urandom of=<path/and/filename> bs=1M or so inside --you get the idea. Would be helpful if they were named in a way that their position can be deduced from the name alone, like 3.4.6.7= "folder '7' in folder '6' in folder '4' in folder '3'; that would make it easier to evaluate long file/folder lists later. Also good for a vgrep[*]: use fixed number of digits for all name-numbers; or limit the files/folders to one digit number.

  2. Create a small partition somewhere, larger than the space needed for the above tree, but not too much larger. Smaller than 2x the total backup size would be right, if you want to see BiT behaviour when it hits a space limitation without space/inode limit set, and don't want to write several backups first.

  3. (Un)set limits according to what you want to test. The space/inode settings should not allow a full backup, and of course be lower than partition capacity.

  4. Back up the file tree to the new partition.

  5. Use shell to overwrite all the files in the tree. (Maybe touch is enough to trigger a need for fresh backup of the files? Then it's find <partition> -xdev -exec touch '{}' \; <= no search criteria! ).

  6. Back up the tree again.

=> Observe how BiT behaves when it reaches the limit. You might want to use a tool that logs and/or continuously monitors used inodes and total backups size. Logging would be great to see if things were written and later removed. => Check what files were added to the backup, i.e. what is not a hardlink AND younger than first backup.

[*]"visual grep", i.e. scan with your eyes through long lists.

noyannus avatar Dec 19 '24 09:12 noyannus

I am pretty sure, after checking the code, that the oldest backups are removed until enough free space/inodes are available again. And yes, this happens without respecting the other rules.

Not cool, I know. But a rare case. I won't change it now, for the next release.

But I definitely will change that behavior in the future. I like the idea to just remove those rules and transform into a clear warning/error message. Just give a message if the user defined threshold is reached.

We have some other issues related to out-of-space situations. Beside using a user definied threshold BIT could also estimate the expected size (and maybe inodes) that might be used by a new backup. And it would warn before starting the backup that there might be not enough space/inodes anymore. But the estimate would be very blurry and might cause more confusing then benefit.

buhtz avatar Dec 19 '24 10:12 buhtz

For the current out-of-space removal, data loss may occur. If it's not there already (have not checked), add a line to warn users to the Readme, and to known issues, just to save the one or two rare cases from the looming doom and despair. :-) A small GUI label "This overrides all other settings here" would be a good precaution, too.

Pre-backup guesstimates would be too inaccurate, IMO, even when derived from recent backups. E.g, someone switches from mail and text work to work with virtual machines (with their large disk files, esp. if these are not snapshotted themselves), or vice versa, and any estimate will be misleading at best.

In BiT, you could implement an early warning, sort of "Your backups are approaching the [ maximal available space of the backup volume | [ space|inode ] limit you have set ]." Any default of sufficient size will be useful. (Just guessing for space: 10% or 50GB left; I have no idea for a good inodes value here.)

This could also be customized by the user in an additional child setting below the limit setting lines: ".. and alert me when <space|inode warning setting> is reached."

The warning could also be independent from limits acutally being set and positioned in the GUI on the same level: "Warn me when only XXX free space-units/inodes are left" (absolute or as percentage).

The customizable in-app warnings are definitely a nice to have, to be postponed now, IMO (unless they are rather quick to implement; no idea if they really are).

noyannus avatar Dec 19 '24 14:12 noyannus

Deleting oldest backups first in that case is what I would have guessed. And those are not the ones I would consider most disposable if I were choosing manually.

But someone with a very different retention strategy might find it useful. For example, someone who only keeps daily backups for the last N days might be happy with that N being reduced rather than missing a backup.

Looking ahead to check for space or inode availability becomes a lot of work trying to predict something that can't really be made reliable anyway. And if you are out of space, failing before trying is only marginally better than failing after trying.

DerekVeit avatar Dec 19 '24 16:12 DerekVeit

Sorry to be late to this party!

I like the redesign in the mockup, especially matching the order of actions in the GUI with execution order and separating the "Keep" rules. This makes it easier to understand how the options work together, ie. sequentially.

IMO there is no reason to remove the first option (remove if older than N years) or the last (remove using free space or inodes criteria). Changing the order in the GUI has already made things clearer, and with suitable documentation these options can be useful. Users can choose to use them or not. I think the concerns about these options (eg. which backups get removed to free up space or inodes) mostly disappear if their operation is properly documented. The most obvious method is to remove the oldest and I think this is what most users would expect. Ultimately if users don't like how it works, then they just don't use it. They can remove backups manually.

As "one snapshot per year for all years" is not an option I don't see why it needs to be here, esp. in the "Keep" box. Either move it to the bottom as a comment, or even remove it. This feature of BiT should be documented anyway.

Need to define "week" - ie. Sunday to Saturday, or Monday to Sunday and not any 7 consecutive days. Also should state that "month" means a calendar month.

Also, add an option that is time-independent. IIRC, grsync has something like "No matter what, keep the NN last backups".

I'm not sure this is necessary - just don't use auto-remove. The remove/keep options are already complex without adding more.

The documentation should include a few example configurations with clear explanations of how backups are retained and deleted as time progresses.

I'm not sure that all the "hand-holding" that's been suggested is needed. Good documentation (I keep repeating!) solves most issues. As a native English speaker I will offer to help if wanted. IMO the most useful extra would be a trail run option (like the rsync -n option) - show what would happen without actually doing anything.

ceperman avatar Dec 23 '24 03:12 ceperman

Thank you for all your input.

  • Issue #1976 does describe the problems and modifications planed for the upcoming release. The primary goal is to keep the current auto/smart remove behavior.
  • Issue #1977 does describe much more modifications to the auto remove behavior for a later release.

buhtz avatar Dec 23 '24 21:12 buhtz

Hello,

again I am interested in your opinions about my approach about the auto-remove feature for the upcoming release. For a new mockup and details please see #1976.

Regards, Christian

buhtz avatar Dec 26 '24 15:12 buhtz

That is the latest version of the dialog:

Image

buhtz avatar Jul 07 '25 08:07 buhtz