vorta icon indicating copy to clipboard operation
vorta copied to clipboard

Run missed backups after startup - does not work

Open goebbe opened this issue 3 years ago • 33 comments
trafficstars

Describe the bug Testing Vorta 0.8.2 it seems that the option "Run missed backups right after startup or wakeup" does not work. I installed Vorta for testing purpose, using flatpak, on LinuxMint 19.2

To Reproduce

  1. Go to Schedule
  2. Choose a daily backup time in the near future.
  3. Select "Run missed backups right after startup or wakeup"
  4. Wait till the (scheduled) backup time has passed
  5. Start Vorta: The missed backup does not run.

Additional observations:

  • I tested both a) quitting Vorta, wait and restarting Vorta after the missed backup time and
    b) shutdown the computer, wait and start the computer after the missed backup time in both cases, the missed backup did not run after restart.
  • There is only one profile. I renamed the default profile to "Vorta Test" <- in the log file one can read about "site default" so perhaps this is causing the issue?

Environment (please complete the following information):

  • OS: Linux Mint 19.2 Cinnamon (based on Ubuntu 20.04)
  • Vorta version: 0.8.2 using borg 1.1.15
  • Installed from: flatpak

Additional context Protocol: 7:30 change backup time to 8:00, Quit Vorta 8:17 start Vorta to check if the missed backup is run

** Log: 2021-12-08 07:30:32,800 - vorta.scheduler - INFO - Setting timer for profile 1 2021-12-08 07:30:32,805 - vorta.scheduler - DEBUG - Scheduling next run for 2021-12-08 08:00:00 2021-12-08 07:30:55,141 - vorta.borg.borg_job - DEBUG - Cancel job on site default 2021-12-08 07:30:55,145 - vorta.borg.jobs_manager - INFO - Finished cancelling all jobs 2021-12-08 08:17:26,203 - vorta.i18n - DEBUG - Loading translation succeeded for ['en', 'en-US', 'en-Latn-US']. 2021-12-08 08:17:26,489 - root - INFO - Using NetworkManagerMonitor NetworkStatusMonitor implementation. 2021-12-08 08:17:26,592 - vorta.borg.jobs_manager - DEBUG - Add job for site default 2021-12-08 08:17:26,592 - vorta.borg.jobs_manager - DEBUG - Start job on site: default 2021-12-08 08:17:26,601 - vorta.borg.borg_job - INFO - Running command /app/bin/borg --version 2021-12-08 08:17:27,054 - vorta.borg.jobs_manager - DEBUG - Finish job for site: default 2021-12-08 08:17:27,055 - vorta.borg.jobs_manager - DEBUG - No more jobs for site: default 2021-12-08 08:17:27,055 - vorta.scheduler - DEBUG - Refreshing all scheduler timers 2021-12-08 08:17:27,061 - vorta.scheduler - INFO - Setting timer for profile 1 2021-12-08 08:17:27,064 - vorta.scheduler - DEBUG - Scheduling next run for 2021-12-09 08:00:00

goebbe avatar Dec 08 '21 07:12 goebbe

daily backup time

If the previous backup is too far in the past, it will log "Catching up with missed backups" or something. I don't see that here.

Have to admit that I only tested extensively with the hourly option. Relevant code from scheduler.py

Since you don't get the "Catching up..." log, last_run_log would need to be less than a day.

        # Desired interval between scheduled backups. Uses datetime.timedelta() units.
        if profile.schedule_mode == 'interval':
            interval = {profile.schedule_interval_unit: profile.schedule_interval_count}
        elif profile.schedule_mode == 'fixed':
            interval = {'days': 1}

        # If last run was too long ago and catch-up is enabled, run now
        if profile.schedule_make_up_missed \
                and last_run_log is not None \
                and last_run_log.end_time + timedelta(**interval) < dt.now():
            logger.debug('Catching up by running job for %s', profile.name)
            self.lock.release()
            self.create_backup(profile.id)
            return

m3nu avatar Dec 08 '21 08:12 m3nu

Correct, my last backup was yesterday, 2021-12-07 16:29:59 i.e. 14 hours before I tried to test the "Run missed backups" option as described above.

So the "Run missed backups"-option rely on the last backup time? In this case, the option would rely on the last run - even if I change the timing for daily backups? From a user perspective, this could lead to unexpected results. :-)

Case: User sets up Vorta and daily backup at 5 pm on his laptop. However, the laptop is off at 5 pm the very first day. Result: The missed backup is not discovered.

Could we store a "fake" last_run_log.end_time whenever we change the schedule?

I (wrongly) assumed that Vorta compares the last "Next backup" time with the starting time. Something like: If the (last "Next backup" time) < starting time:
do a backup

This, of course, would require the computation and storing of the "Next backup" time, whenever the schedule is changed or a backup runs.

goebbe avatar Dec 08 '21 09:12 goebbe

Yes, it relies on the event log of the last scheduled backup. User-initiated backups are ignored. Since you are asking for daily backups, no catch-up will happen unless 24 hours have passed.

If you change the timing, it will still look at the last backup run, but apply the new desired timing. E.g. every X hours

Next backup shows what is scheduled for the future.

When removing APScheduler, I had to add our own logic and tried to keep it simple for now. You can find the details in scheduler.py. You think anything needs changing in this logic?

m3nu avatar Dec 08 '21 10:12 m3nu

Thank you - now I understand the logic behind it and see why the issue appears. A problem could be that this misses some corner cases - as reported above.

Perhaps I am wrong, but it seems to me that all the necessary computations should be done for next backup anyway.

Would the following work?

Write next_backup_time to the event log.

When Vorta starts (i.e. before updating next_backup_time) : 
last_scheduled_backup_time = next_backup_time 

 if ( last_scheduled_backup_time < dt.now):
        do a backup 

Hm, I see now that the code sniped in your post uses last_run_log.end_time. Wouldn't last_run_log.end_time + timedelta(**interval) always be after the next scheduled time, the next day? Assume daily backups at 1 pm and backups take 3 hours:
If the computer is down, at 1 pm next day and then started at 2 pm, the backup would be missed by the "Run missed backups" option since:
last_run_log.end_time+timedelta(**interval) would result in 4 pm + 1 day which would be two hours after the missed backup - and the missed backup would not be caught.

goebbe avatar Dec 08 '21 12:12 goebbe

Assume daily backups at 1 pm and backups take 3 hours: If the computer is down, at 1 pm next day and then started at 2 pm, the backup would be missed by the "Run missed backups" option since:

Yes, then it will run at 4pm the next day or the day right away after it's turned on again.

I added and used end_time for this purpose to avoid adding new backup runs when the previous one isn't finished or takes long time. Think of an hourly schedule where the backup takes 3 hours. Using start_time for scheduling missed jobs is equally valid with other pros and cons.

m3nu avatar Dec 08 '21 12:12 m3nu

I added and used end_time for this purpose to avoid adding new backup runs when the previous one isn't finished or takes long time.

But in scheduler.py you have the following lines :


# Skip if a job for this profile (repo) is already in progress
        if self.app.jobs_manager.is_worker_running(site=profile.repo.id):
            logger.debug('A job for repo %s is already active.', profile.repo.id)
            return

So even if you add 10 backup jobs on a profile queue, only one will be run. So why not take the start time for scheduler ?

New issue title : "Run missed backups option fails if the backup has never been launched."

bastiencyr avatar Dec 08 '21 15:12 bastiencyr

So we have at least two corner cases where the option does not just work as expected, i.e. the backup will not "Run missed backups after startup". This should not happen unnoticed.

With respect to the used end_time: What happens if somebody putts his laptop off during a backup? Will Vorta automatically catch this case and restart borg to finish the backup task when the computer is on, again?

goebbe avatar Dec 08 '21 15:12 goebbe

What happens if somebody putts his laptop off during a backup? Will Vorta automatically catch this case and restart borg to finish the backup task when the computer is on, again?

I dont know. I think that Vorta doesnt catch this case :

last_run_log = EventLogModel.select().where(
            EventLogModel.subcommand == 'create',
            EventLogModel.category == 'scheduled',
            EventLogModel.profile == profile.id,
        ).order_by(EventLogModel.end_time.desc()).first()

Ive just tested and if you off the app, there is a line in the log with a create subcommand. Since the above command doesnt check the returncode of the last backup, I dont think that Vorta automatically catch this case. Can you confirm @m3nu ?

bastiencyr avatar Dec 08 '21 15:12 bastiencyr

New issue title : "Run missed backups option fails if the backup has never been launched."

Thanks, this describes one issue, the other issue would be: "Run missed backups option can fail after long backups"

goebbe avatar Dec 08 '21 15:12 goebbe

What happens if somebody putts his laptop off during a backup? Will Vorta automatically catch this case and restart borg to finish the backup task when the computer is on, again?

If the backup fails, then by default end_time=start_time. So it would assume the task succeeded at the start time.

Good to have a discussion. So to summarized the proposed changes:

1 Since we detect duplicate jobs by now (we didn't when I first implemented the scheduler), do catch-up based on starting time.

2.1 When the job starts, an event log is created with start_time = end_time = now and returncode=1. Instead returncode should be set to 2 (=abnormal termination) and only updated to 0 or 1 (=no errors or minor permission errors) after the backup task is complete.

2.2 Further, the scheduler should filter by returncode of the event and only consider events with 0 or 1 return codes.

m3nu avatar Dec 09 '21 01:12 m3nu

In my experience, borgbackup does a decent job after abortion/ network failures. Normally, borg can create a new backup (and finish the backup) just fine. So maybe the "run missed backups"-option should/ could also just create a new backup whenever a backup was missed (also in the case of abnormal termination)? And only notfy the user when this attempt fails?

goebbe avatar Dec 09 '21 06:12 goebbe

With respect to the original issue, I just tried to understand scheduler.py (keep in mind, I never programmed anything like this).

Whenever the schedule is changed by the user, then I suggest to set: last_run_log = None (Where in the code would that happen??) Reason: With the new time/ schedule, there is nothing like a "last run" that fits to the new schedule anyway. With this in place, one could adapt the existing code:

        if last_run_log is None:       # set virtual last_run
            last_run = dt.now().replace(hour=0, minute=0)
            if profile.schedule_mode == 'fixed':        # new: set a virual last_run for fixed time
                last_run -= timedelta(**interval)
        else:
            last_run = last_run_log.end_time
        
        ...

        # Add interval to last run time to arrive at next run.
        next_run = last_run
        now = dt.now()
        missed_backups = -1      # new variable
        while next_run < now:
            next_run += timedelta(**interval)
            missed_backups += 1
        if profile.schedule_make_up_missed and missed_backups > 0:    # new: catch missed next_run
            next_run = now + timedelta(seconds=5)

The idea is that next_run can take care of the backup in the usual way. This should make the old code block (that takes care of the missed backups) obsolete.

goebbe avatar Dec 09 '21 07:12 goebbe

I wouldn't add an artificial last_run. It will just take the actual last run from the profile. The user wanting a new schedule doesn't change the last run. E.g. schedule changes from daily to hourly. If the last_run was the day before, the user would want to catch up quickly upon changing to hourly. This is what happens currently.

Currently the scheduler is built to be robust and produce the same result when it's reloaded. And reloads are happening very often: When a user changes the setting, every 15min to make sure all jobs are scheduled or after a backup run. This happens by calling set_timer_for_profile(). Only inputs are the log of the last run and the current scheduler setting. If we add some special virtual last run, this will make it more complicated.

m3nu avatar Dec 09 '21 11:12 m3nu

In my experience, borgbackup does a decent job after abortion/ network failures. Normally, borg can create a new backup (and finish the backup) just fine.

Yes, I don't doubt that. There are also checkpoints to resume interrupted backups. That's all great and important. For the case of the scheduler, we mostly care if a backup was successful. If it wasn't we want to retry soon (and rely on Borg to properly finish it this time)

m3nu avatar Dec 09 '21 11:12 m3nu

I wouldn't add an artificial last_run. It will just take the actual last run from the profile. The user wanting a new schedule doesn't change the last run. E.g. schedule changes from daily to hourly. If the last_run was the day before, the user would want to catch up quickly upon changing to hourly. This is what happens currently.

Actually, if I am not mistaken, you added the artificial last_run! ;-)
Note that the last_run is modified at different lines in the existing code base. last_run provides the basis for the computation of next_run - perhaps the name could be changed to something more meaningful? init_run or basis_run or base_time?

I just proposed to add two lines to set a better(?) initial value in the case of fixed (daily) intervals. This is necessary (in the proposal) to get a correct starting point for the computation missed_backups and consequently of the next_run.

Agreed that log_last_run should not be set to none, that could be very confusing. Is there a better way to capture a change of the schedule?

The idea was to avoid the current duplication of the backup logic - and to fix the reported issue. But as I said, I am not an experienced programmer. I just hoped this could be helpful, somehow. :-)

goebbe avatar Dec 09 '21 13:12 goebbe

Correct. It will use midnight if the backup never ran ever.

Regarding the proposed patch, I fail to see the improvement and I'm not sure this will work or even give the desired result for you. Have you actually tested the code? E.g. if nothing was missed it won't set next_run to a future time.

It also doesn't address the three items discussed earlier. So a more complete improvement that is still easy to read and understand would be good.

m3nu avatar Dec 09 '21 14:12 m3nu

Sorry, I don't mean to offend anybody. If you outright reject the ideas behind the implementation, there is no point in testing this. But I see that there is an egg and hen kind of problem. Why should you look at code if it is not tested?
This is new to me, I have never worked with git and my only python experience comes from using numpy and sage math - this is why I am reluctant.

Yes, the patch should solve the three issues - but would rely on discovering a change in the schedule by setting log_last_run to be none. If nothing was missed, the setting of next_run works like before. Nothing has changed in this case.

If you could give me a hint, how I could put a variable to check for a change of the schedule, I would like to test this. This would require some learning on my side for the set up - and could take some days.

goebbe avatar Dec 09 '21 15:12 goebbe

In that case you may want to start with a simpler issue to get to know the project and tools. Like our Good first issues

After that, jump on the hard problems. 💪

Since the scheduler is an important part, we can't put half-baked changes out. There should also be tests to try out different behaviors and make sure code changes don't change the result. I didn't get to those yet, but they would be good to have soon.

m3nu avatar Dec 09 '21 15:12 m3nu

For future reference, what are we looking to change in a nutshell, @goebbe ? Without regard of who or how it will be implemented.

m3nu avatar Dec 14 '21 07:12 m3nu

New issue title : "Run missed backups option fails if the backup has never been launched after setting or changing the schedule."

Goal: Make sure that missed backups are always discovered.
Discovering missed backups should work even when the first backup is missed OR when the schedule is changed in the schedule_tab (and the first backup is missed)

This could possibly be implemented by:

  • Whenever the schedule is changed in schedule_tab: set an appropriate "starting value" (i.e. last_run) - this is required for the computation of next_run
  • discover missed backups when updating next_run
  • when missed backups are discovered, set next_run to dt.now()

Note: The existing code already takes care of old backup jobs that are not finished.

goebbe avatar Dec 14 '21 07:12 goebbe

@m3nu Right now, this is too difficult for me. :-( There are too many concepts in the programme-code that are new to me, and I currently don't have the time necessary to learn. I am sorry, I hope the bug report and the discussion is helpful anyway.

goebbe avatar Dec 16 '21 07:12 goebbe

I have just run into this bug as well on macOS 12.1, Vorta 0.8.2. Because the scheduling is broken I didn't get any notification from Vorta that backups were failing. Thankfully borgbase.com has alerts for this type of situation.

I appreciate all the work that has been done to diagnose this issue, but it doesn't seem like a fix is imminent. Is there some workaround I could take in the UI to get the scheduler working again? I think my situation is that the first scheduled backup never ran (it's hard to tell because there is another bug with the next scheduled time #1116) and so subsequent backups will never be scheduled... ever? If I change my schedule to backup every minute, let one backup complete, then change the schedule to what I actually want (once every 7 days) will it work after that?

mplorentz avatar Jan 20 '22 19:01 mplorentz

@mplorentz The easiest workaround is: After setting/ changing the schedule, run the backup manually, once.

At the moment, "run missed backups" relies on the timing of the "last_run" backup.

Edit: Run a manual backup will not provide a workaround - the "last_run" backup has to be of category 'scheduled'. It is necessary to trigger a first backup via the scheduler.

goebbe avatar Jan 20 '22 20:01 goebbe

@goebbe ok thanks! I will give that a try.

mplorentz avatar Jan 20 '22 22:01 mplorentz

I appreciate all the work that has been done to diagnose this issue, but it doesn't seem like a fix is imminent. Is there some workaround I could take in the UI to get the scheduler working again?

This issue here talks about "Running missed schedules", which means backup runs that the normal scheduler missed. It is NOT about jobs that are scheduled normally.

If you have any details on problems with the (new) scheduler implementation, please be so kind and share them including how to reproduce and logs. 🙏

I think my situation is that the first scheduled backup never ran (it's hard to tell because there is another bug with the next scheduled time #1116) and so subsequent backups will never be scheduled... ever?

Again, this issue is about the preview text of the scheduler not being updated in some cases not being updated. This is unrelated to the actual scheduler.

If the first schedule never ran, it would be a serious bug and it would be great if you can share steps to reproduce and logs for it.

And last, the "Run missed Backups" relies on the scheduled backup task to run normally at least once. Otherwise it's not really "missed". The point of the feature is to cover situations where the machine was switched off or hibernated for a while.

m3nu avatar Jan 21 '22 04:01 m3nu

If I change my schedule to backup every minute, let one backup complete, then change the schedule to what I actually want (once every 7 days) will it work after that?

@mplorentz I think this will work while goebbe's suggestion should not as far as I understand the code.

real-yfprojects avatar Jan 26 '22 16:01 real-yfprojects

Changing the schedule (to every 5 minutes) seems to have fixed my issue of the backup never running. However I'm not able to reproduce the issue so I'm not sure how I got into that state.

mplorentz avatar Feb 03 '22 00:02 mplorentz

Every 5min is quite often. But good it's resolved. Our own scheduler implementation that replaced the APScheduler package is still quite new and currently focuses on simplicity rather than covering all edge cases. So I expect some adjustments in the future, as more feedback comes in.

m3nu avatar Feb 03 '22 05:02 m3nu

So I just recently setup Vorta and have it set to do a daily backup at 3:42am. For some reason, my laptop goes to sleep before then, so the backup is never occurring(different problem). But because the backup is never occurring and I have the Run missed backups right after startup or wakeup checked, I would assume that it would do it immediately upon me actually waking up the computer, but it doesn't.

Instead this is what I see in my Vorta.log file:

2022-02-07 01:41:03,254 - vorta.scheduler - INFO - Setting timer for profile 1 2022-02-07 01:41:03,255 - vorta.scheduler - DEBUG - Scheduling next run for 2022-02-07 03:42:00 2022-02-07 01:56:03,240 - vorta.scheduler - DEBUG - Refreshing all scheduler timers 2022-02-07 01:56:03,244 - vorta.scheduler - INFO - Setting timer for profile 1 2022-02-07 01:56:03,245 - vorta.scheduler - DEBUG - Scheduling next run for 2022-02-07 03:42:00 2022-02-07 02:11:03,230 - vorta.scheduler - DEBUG - Refreshing all scheduler timers 2022-02-07 02:11:03,234 - vorta.scheduler - INFO - Setting timer for profile 1 2022-02-07 02:11:03,235 - vorta.scheduler - DEBUG - Scheduling next run for 2022-02-07 03:42:00 2022-02-07 02:26:03,223 - vorta.scheduler - DEBUG - Refreshing all scheduler timers 2022-02-07 02:26:03,227 - vorta.scheduler - INFO - Setting timer for profile 1 2022-02-07 02:26:03,228 - vorta.scheduler - DEBUG - Scheduling next run for 2022-02-07 03:42:00 2022-02-07 04:41:01,100 - vorta.scheduler - DEBUG - Refreshing all scheduler timers 2022-02-07 04:41:01,111 - vorta.scheduler - INFO - Setting timer for profile 1 2022-02-07 04:41:01,113 - vorta.scheduler - DEBUG - Scheduling next run for 2022-02-08 03:42:00 2022-02-07 06:55:59,849 - vorta.scheduler - DEBUG - Refreshing all scheduler timers 2022-02-07 06:55:59,853 - vorta.scheduler - INFO - Setting timer for profile 1 2022-02-07 06:55:59,854 - vorta.scheduler - DEBUG - Scheduling next run for 2022-02-08 03:42:00 2022-02-07 08:01:26,387 - vorta.scheduler - DEBUG - Refreshing all scheduler timers 2022-02-07 08:01:26,399 - vorta.scheduler - INFO - Setting timer for profile 1 2022-02-07 08:01:26,401 - vorta.scheduler - DEBUG - Scheduling next run for 2022-02-08 03:42:00 2022-02-07 08:16:26,375 - vorta.scheduler - DEBUG - Refreshing all scheduler timers 2022-02-07 08:16:26,381 - vorta.scheduler - INFO - Setting timer for profile 1 2022-02-07 08:16:26,382 - vorta.scheduler - DEBUG - Scheduling next run for 2022-02-08 03:42:00 2022-02-07 12:30:45,393 - vorta.scheduler - DEBUG - Refreshing all scheduler timers 2022-02-07 12:30:45,403 - vorta.scheduler - INFO - Setting timer for profile 1 2022-02-07 12:30:45,404 - vorta.scheduler - DEBUG - Scheduling next run for 2022-02-08 03:42:00

Repeated over and over, every day.

So definitely something that isn't quite behaving the way I would think it would based on settings and behavior.

abodnar avatar Feb 09 '22 02:02 abodnar

Not immediately, but within a few minutes. The backup timer runs ever 10 to 15 min to check for missed backups.

You also need to enable the option to catch up with missed backups. It's not on by default.

m3nu avatar Feb 09 '22 10:02 m3nu