backintime
backintime copied to clipboard
Weekly snapshots not being retained
I'm using the default values for Smart Remove. Hourly and daily snapshots are kept or deleted as expected, but no weekly snapshots are kept.
This seems to be the relevant code in /usr/share/backintime/common/snapshots.py:
#keep one per week for the last keep_one_per_week weeks
if keep_one_per_week > 0:
d = now - datetime.timedelta( days = now.weekday() + 1 )
for i in range( 0, keep_one_per_week ):
keep_snapshots = self._smart_remove_keep_first_( snapshots, keep_snapshots, d, d + datetime.timedelta(days=8) )
d = d - datetime.timedelta(days=7)
I'm trying to figure it out, but I thought I'd ask first if there's a known issue.
I found a potential fix for this problem.
I added some "logger.debug" calls, and changed the cron table entry to create log files so that I could see what's happening.
The function "smart_remove_keep_first" keeps the newest snapshot in the specified date/time range, since the snapshots are sorted in descending order of date/time. I added a function "smart_remove_keep_last" that reverses the order of the snapshots and thus keeps the oldest snapshot.
In the code in the previous post, I replaced the call to "smart_remove_keep_first" with a call to "smart_remove_keep_last". So far it seems to be working. But it needs further testing and analysis.
If this is actually a bug, I don't understand why it hasn't been reported before. Can someone else review the code/algorithm and provide an opinion?
@daveTheOldCoder There is a new forming maintaining team and we do review all issues. Is this problem still relevant for you or did you find a solution?
Tag: Feedback, Bug
I'm still using the fix I described above, and it works properly. Please review my posts above. Do you agree with my analysis of the problem?
I can provide the exact changes I made, if that's helpful.
And thanks for taking on the job of maintaining this product. :)
Also: I'm using BackInTime version 1.2.1 (the latest version available with the package manager on Pop!_OS 20.04), with my changes added. I'll look at the latest files in the github repository and see if there are relevant changes. Since this issue had no replies prior to yours, I assumed that the issue had not been fixed.
I just looked at BackInTime version 1.3.2, and the relevant code in common/snapshots.py is unchanged from version 1.2.1.
This can be fixed with a one-line change in common/snapshots.py:
In def smartRemoveKeepFirst
, change:
for sid in snapshots:
to:
for sid in reversed(snapshots):
But it would probably be "cleaner" to rename smartRemoveKeepFirst
to smartRemoveKeepLast
or simply smartRemove
, and then change the calls to it.
Dear Dave, sorry for let you wait on this and thanks for your effort and your suggested fix.
The problem is not only resources but the test coverage. The smart remove behavior is not covered by unit tests. Most of the tests in BIT are system or integration tests. And I assume the related code is not well isolated to create a real unit test for it. Because I am not the founder of BIT and not deep enough into all details I am worried to modify this part of the code without test coverage. Introducing bugs into this behavior can be a huge problem and effect a lot of people.
Would you like to jump in? We take you by hand and mentoring you if you need it. So the first job would be to analyze the testability of the smart-remove code. Second would be to carefully refactor and modify that code, hopefully without modifying its behavior, to make it testable. Third job would be to create real and isolated unit tests to fully cover the current behavior. Forth job then is to introduce your fix, running tests on it etc pp.
EDIT:
This is how the relevant code section (in common/snapshots.py::Snapshots.smartRemoveList
) looks today:
https://github.com/bit-team/backintime/blob/e818aeb2b009e2be052685c70dae202c88a00aee/common/snapshots.py#L1674-L1684
I couldn't resist and took a closer look about testabillity. As I thought it is nearly untestable without heavy mocking and hacking. Beside the following points I assume there will come up even more when looking close and longer. Based on my short observation the following points need refactoring to make the behavior testable:
- I see no reason for existence of an instance of the class
Snapshots
. All its methods should live as module functions. That class holds the logic of the application layer related to snapshot handling. - The method
smartRemoveKeepFirst()
need to be decoupled from theclass SID
. btw: An instance ofSID
represent a specific snapshot. The same is forsmartRemoveKeepAll()
. -
smartRemoveList()
andsmartRemove()
are definitely to long. Separate into several functions or methods. Maybe here is an opportunity to create a class dedicated to this behavior. But all this remove logic could also live in its own modul. -
SID
andSnapshots
shouldn't live in the same module because they serve different layers. - And of course, one of BITs main problems in the code structure, decouple
Config
and its instance from the whole module. The final golden goal is to remove theimport config
from there.
It is much and I just added this points for documentation. I don't think that all aspects need to be full filled to make the target behavior testable. I would say that "decouple from SID
" would be a good start. Implementing this will take multiple PRs.
I still don't understand why no one else has reported this problem. For me, it's a critical problem. I wouldn't be able to use BackInTime without my fix.
Good question. Can you send us a screenshot of your smart remove tab please.
Your screenshot looks OK for me.
I checked my own setup. But my machine do not run every day. So I can not be sure.
This can be fixed with a one-line change in common/snapshots.py:
In
def smartRemoveKeepFirst
, change:for sid in snapshots:
to:for sid in reversed(snapshots):
But it would probably be "cleaner" to rename
smartRemoveKeepFirst
tosmartRemoveKeepLast
or simplysmartRemove
, and then change the calls to it.
I don't get how this solve your problem. Why does it matter if the first or the last snapshot in a week-range is kept? To my understanding this does not explain why you lose the weekly snapshots.
Maybe you can explain it better with same example dates?
It's been almost four years since I submitted the issue. I'll try to find my notes.
A way to efficiently test this issue might be to manipulate the date/time of a VM to simulate the passing of weeks and months quickly.
I have created a testbed with the following shell code in a VM:
for N in {1..60}; do sudo date -s '+7 days'; echo date > backupfile; backintime backup; done
In extreme cases, it appears to work as expected: Keeping "one snapshot per week for the last 52 weeks" really does leave 1 years' worth of weekly snapshots. However, I'm not so sure about shorter timespans. Calculating dates is hard!
My current guess is that we may be dealing with an off-by-1 problem here, where a setting of "one snapshot per week for the last n weeks" only actually keeps one for the last n–1 weeks. This would be especially significant for the default configuration of:
- "one snapshot per day for the last 7 days"
- "one snapshot per week for the last 4 weeks"
- "one snapshot per month for the last 24 months"
I once had a tool called faketime
I used for things like this. Worked well.
I think I'm very close to cracking this case. In this function:
https://github.com/bit-team/backintime/blob/22f468c32732be6c821686e0c504b4dfb920487a/common/snapshots.py#L1686-L1696
… smartRemoveKeepFirst
is called with a max_date
of d + datetime.timedelta(days=8)
— not days=7
!
Therefore, in a routine that says "keep only the youngest snapshot of these 8 days to consider", it will also delete the youngest snapshot of the previous week — but only if it's caught by the last in a series of "keep one per week" calls.
In effect, of "one snapshot per week for the last n weeks", the n-th week has its youngest snapshot thrown away, even though it shouldn't.
I'll need to investigate if changing the call of smartRemoveKeepFirst
to a max_date
of d + datetime.timedelta(days=7)
fixes the problem. There must have been some reason to set an 8
there.
P.S.: The official soundtrack for this bug is Only For The Week.
There are some unit tests in test_snapshot.py
about that smart remove thing.
I touched some once long ago but then gave up. A full refactoring of that tests would be the best. But a first step would be to see if this use case is covered by that tests and add such a test if not.
It occurred to me that what makes the "smart remove" behavior confusing is that the snapshots are named using only the timestamp. The names don't contain "hourly", "daily", "weekly", etc.
I wonder if adding an index file that identifies daily, weekly, monthly and yearly snapshots would help.
Or maybe I'm wrong. :)
There are some unit tests in
test_snapshot.py
about that smart remove thing. I touched some once long ago but then gave up. A full refactoring of that tests would be the best. But a first step would be to see if this use case is covered by that tests and add such a test if not.
I'm confident that I can fix this bug, and to make sure I don't introduce any new ones, I'm doing some manual testing in a VM.
I'm totally unfamiliar with automatic testing, though. I'll take a look, but I'd rather not put this bugfix on the "waiting list" until I've cracked the automatic testing of Smart Remove, because that would probably take a very long time ;)
It occurred to me that what makes the "smart remove" behavior confusing is that the snapshots are named using only the timestamp. The names don't contain "hourly", "daily", "weekly", etc.
I wonder if adding an index file that identifies daily, weekly, monthly and yearly snapshots would help.
Or maybe I'm wrong. :)
I see what you mean, but BiT doesn't work like that. Snapshots are not created as "weekly" or "monthly" snapshots. In fact, they are only characterized by their date+time of saving (plus a random three-digit number, which is called the "tag"). Together, this makes a "Snapshot ID" (SID for short): 20241117-115358-562
The only time when a snapshot might be characterized as "weekly" or "monthly" is when the Smart Remove routine determines: "this snapshot should be kept, because it's the only one from that week/month, and it should be kept according to the removal rules".
Therefore, any single snapshot can start out as an "hourly" snapshot, then become a "weekly" one, and maybe even a "monthly" one later.
I see what you mean, but BiT doesn't work like that.
I know that. To figure out the workaround I posted above, I added logging code in the smart-remove processing to log the "kept" and "removed" snapshots, so I could see what was happening.
I know that. To figure out the workaround I posted above, I added logging code in the smart-remove processing to log the "kept" and "removed" snapshots, so I could see what was happening.
I see, and I've also understood why your workaround is successful. It's hard to put into words ;)
What happens is that the code considers snapshots from 8 days (instead of 7) when determining which to keep for a specific week. But in a one-snapshot-per-week environment, 8 days will likely contain two snapshots. Since smartRemoveKeepFirst
keeps only the newer one, the older one is lost.
This has no consequences as long as the previous week is also considered before deletion. But it fails for the last of the weeks to be considered (resulting in the n–1 situation I described above).
Your workaround reverses the list of snapshots, so in the "8-day-week" that is the last to be considered, the older of the two snapshots is kept, and your problem disappears.
However, it is my understanding that the problem disappears completely when days=8
is changed into days=7
:)