Option to run pre/post-backup scripts before *every* command, rather than *create* only

Open frederikmoellers opened this issue 4 years ago • 28 comments

Somewhere between 0.6.10 and 0.6.22 vorta introduced a change that made sure that pre- and post-backup commands are run before Repo checks. I think this was PR #264. This enabled me to use pre- and post-backup scripts for mounting and unmounting a samba share where the actual backup is stored.

However, these commands seem to only run when an actual backup task is being executed (borg create). Most importantly, it seems that the post-backup command is run after the backup finished but before the pruning is done. This results in the prune command failing (repo not found) and thus no pruning happening at all.

In vorta's GUI log I see that the return codes are nonzero whenever a command tries to access the repo, but not when performing a backup via borg create: [image]

In the log file I see that the post-backup command seems to run directly after create, but before prune. Furthermore, I see that all commands except borg create seem to fail:

2019-10-31 12:00:00,793 - apscheduler.executors.default - INFO - Running job "VortaScheduler.create_backup (trigger: cron[hour='12', minute='0'], next run at: 2019-10-31 12:00:00 CET)" (scheduled at 2019-10-31 12:00:00+01:00)
2019-10-31 12:00:00,797 - vorta.scheduler - INFO - Starting background backup for Default
2019-10-31 12:00:00,799 - vorta.notifications - DEBUG - success notifications suppressed
2019-10-31 12:00:00,809 - vorta.borg.borg_thread - DEBUG - Using VortaDBKeyring keyring to store passwords.
2019-10-31 12:00:02,688 - vorta.scheduler - INFO - Preparation for backup successful.
2019-10-31 12:00:02,718 - vorta.borg.borg_thread - INFO - Running command /usr/bin/borg create --list --info --log-json --json --filter=AM -C lz4 --exclude-if-present .nobackup /backup/Default/::computername-default-2019-10-31T12:00:02 /home/username/
2019-10-31 12:00:04,233 - vorta.borg.borg_thread - INFO - Creating archive at "/backup/Default/::computername-default-2019-10-31T12:00:02"
2019-10-31 12:01:22,933 - vorta.notifications - DEBUG - success notifications suppressed
2019-10-31 12:01:22,933 - vorta.scheduler - INFO - Backup creation successful.
2019-10-31 12:01:22,934 - vorta.scheduler - INFO - Doing post-backup jobs for Default
2019-10-31 12:01:22,936 - vorta.borg.borg_thread - DEBUG - Using VortaDBKeyring keyring to store passwords.
2019-10-31 12:01:22,961 - vorta.borg.borg_thread - INFO - Running command /usr/bin/borg prune --list --info --log-json --keep-hourly 0 --keep-daily 7 --keep-weekly 4 --keep-monthly 6 --keep-yearly 2 --prefix computername-default- --keep-within 10H /backup/Default/
2019-10-31 12:01:23,483 - vorta.borg.borg_thread - ERROR - Repository /backup/Default does not exist.
2019-10-31 12:01:23,646 - vorta.borg.borg_thread - DEBUG - Using VortaDBKeyring keyring to store passwords.
2019-10-31 12:01:23,676 - vorta.borg.borg_thread - INFO - Running command /usr/bin/borg list --info --log-json --json /backup/Default/
2019-10-31 12:01:24,064 - vorta.borg.borg_thread - ERROR - Repository /backup/Default does not exist.
2019-10-31 12:01:24,228 - vorta.borg.borg_thread - DEBUG - Using VortaDBKeyring keyring to store passwords.
2019-10-31 12:01:24,258 - vorta.borg.borg_thread - INFO - Running command /usr/bin/borg check --info --log-json /backup/Default/
2019-10-31 12:01:24,639 - vorta.borg.borg_thread - ERROR - Repository /backup/Default does not exist.
2019-10-31 12:01:24,791 - vorta.scheduler - INFO - Finished background task for profile Default
2019-10-31 12:01:24,793 - apscheduler.executors.default - INFO - Job "VortaScheduler.create_backup (trigger: cron[hour='12', minute='0'], next run at: 2019-11-01 12:00:00 CET)" executed successfully

I'm currently using vorta 0.6.22, but the changelog of 0.6.23 doesn't mention anything related to this.

frederikmoellers avatar Oct 31 '19 12:10 frederikmoellers

According to the logs, there is an issue with your repo path. Maybe a drive not mounted? Does it work with the CLI directly? (see the logs for the precise arguments)

ERROR - Repository /backup/Default does not exist.

m3nu avatar Nov 20 '19 08:11 m3nu

Yes, the issue path is an empty directory. I use the pre- and post-backup scripts to mount a samba share into the directory. The backup is then on the share. The pre-backup command checks if the server is available and mounts the share. The post-backup command unmounts the share.

Unfortunately, the execution of the pre- and post-backup scripts does not seem to appear in the log, so this is not visible. However, judging from the log messages, I suppose the execution order is like this:

  1. Run pre-backup script (mount samba share)
  2. borg create (succeeds b/c share is mounted and repo is available)
  3. Run post-backup script (unmount samba share)
  4. borg prune (fails b/c share is no longer mounted and repo cannot be found)
  5. borg list (fails for same reason)

Imho the post-backup script should only be run after all backup-related commands have been executed, so after the execution of borg list.
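The ordering problem can be reproduced without Vorta or borg. The following is only a simulation: a shell variable stands in for the mounted samba share, `borg_step` is a hypothetical stand-in for the real borg calls, and the "observed" order is the one inferred from the log above, not taken from Vorta's code.

```sh
#!/bin/sh
# Simulation only: a variable stands in for the mounted samba share,
# and echo stands in for the real borg calls.
mounted=no

pre_backup()  { mounted=yes; echo "pre-backup: share mounted"; }
post_backup() { mounted=no;  echo "post-backup: share unmounted"; }

# Stand-in for a borg subcommand: succeeds only while the share is mounted.
borg_step() {
    if [ "$mounted" = yes ]; then
        echo "borg $1: ok"
    else
        echo "borg $1: repository does not exist"
        return 1
    fi
}

# Order inferred from the log: the post-backup script runs right after create.
observed_order() {
    pre_backup
    borg_step create
    post_backup
    borg_step prune
    borg_step list
}

# Requested order: the post-backup script runs after the last borg command.
requested_order() {
    pre_backup
    borg_step create
    borg_step prune
    borg_step list
    post_backup
}

echo "--- observed ---";  observed_order  || true
echo "--- requested ---"; requested_order
```

In the observed order, prune and list fail because the share is already gone; in the requested order all three steps see the repository.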

frederikmoellers avatar Nov 20 '19 10:11 frederikmoellers

I just confirmed this in the code:

In src/vorta/borg/create.py#L99, the pre-backup command is executed. When the thread ends (read: after borg create finishes), #L46 executes the post-backup command.

However, this only happens for create, not for any of prune or list. In prune.py, the pre- and post-backup commands are not executed. I think it should be possible to also have them execute for these subcommands.

If anyone can confirm that this would be useful, I can make a PR. Maybe it makes sense to execute the pre-backup script in BorgThread.prepare()? That would make sure that it's executed before any repository-related task.

frederikmoellers avatar Nov 20 '19 10:11 frederikmoellers

Having the same issue here.

Janhouse avatar Feb 28 '20 13:02 Janhouse

If anyone can confirm that this would be useful, I can make a PR. Maybe it makes sense to execute the pre-backup script in BorgThread.prepare()? That would make sure that it's executed before any repository-related task.

Originally those pre/post-backup commands were meant to prepare the data for backup. Like e.g. Borgmatic does with their hooks.

If you need a command to even connect to the repo, I would regard it as a different feature?

What are you guys using it for roughly? Probably not to prepare some data for backup (e.g. dump a database).

m3nu avatar Feb 28 '20 13:02 m3nu

I use pre-backup/post-backup to mount and umount network storage. It is available only on certain networks so it can't be mounted all the time. And if it is not mounted before backup or prune jobs, it won't work.

It should run before any task that involves using the repository, and run the post-backup task afterwards.

Janhouse avatar Feb 28 '20 14:02 Janhouse

This is quite different from the current use case, which is like a before_create/after_create hook. You are suggesting before/after_everything. How would this look in the UI?

m3nu avatar Feb 28 '20 16:02 m3nu

It could be either a different set of pre/post scripts or maybe a checkbox next to existing ones with text "run before/after everything" and an argument passed to the script to know what event just happened.

Janhouse avatar Feb 29 '20 17:02 Janhouse

I guess an argument to the scripts would be easiest and least intrusive. It doesn't need any GUI changes and lets the script decide what to do depending on the context: DB dump scripts will only act on "backup" arguments, mount scripts for network shares might act on any action.

frederikmoellers avatar Mar 01 '20 03:03 frederikmoellers

a checkbox next to existing ones with text "run before/after everything" and an argument passed to the script to know what event just happened.

I can also live with a checkbox.

m3nu avatar Mar 01 '20 03:03 m3nu

yep, I have just come across this problem - I thought it was me for ages (pruning not running because post-backup script to dismount drive had been executed beforehand).

Is there timing for this?

ghost avatar Jul 11 '20 08:07 ghost

Is there timing for this?

Unfortunately, no. I will try to dedicate some time to this, but I can't give any estimates at the moment. If anyone is interested in working on this, feel free.

frederikmoellers avatar Jul 15 '20 10:07 frederikmoellers

I have a PR almost ready with the functionality. One question arose, though:

Would it be useful to execute the commands on every action? This would even include running the post-backup-command after "mount", the pre-backup-command before "umount" and both commands around "version". For cases where the commands are used to e.g. mount filesystems, this doesn't make sense. borg --version can run without access to the repos and borg mount should not finish with unmounting the underlying filesystem (in the post-backup-command) where the repo is located (this would break the borg-mount).

The only use case I can imagine is if the commands are used to make the borg executable available in the first place. But I don't know if this is really realistic. Then again, it would be most consistent to run the commands on every action and let the scripts decide when to execute what (based on the $subcommand environment variable that tells them what is being done).

So I can't decide between being consistent and more friendly for weird side-cases (execute always) or being more friendly to most actual use cases (execute only where it makes sense, e.g. not before/after version). I could use some opinions on this.

frederikmoellers avatar Sep 26 '20 02:09 frederikmoellers

Would it be reasonable to consider this an "advanced user feature", with richer support for hooks, and not clutter the gui with toggle buttons that may confuse users who don't need the feature? What I mean is:

  1. Vorta calls the pre and post hook scripts for every operation
  2. However, it calls them with one of the following arguments: check, create, list, prune
  3. For user-friendliness, maybe provide an example script in the repository that just emits debugging echoes
  4. Given this design, it's also possible to further simplify the UI, because
  5. Vorta could call the hook script with two arguments
    • eg: hook_script.sh pre create and hook_script.sh post create
    • meaning there only needs to be one "hook script" path box.
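A minimal sketch of what such a single hook script could look like, assuming the two-argument convention proposed in the list above (`hook_script.sh pre create`). This convention is a proposal from this thread, not an existing Vorta interface, and the `hook` function name and the echoed messages are purely illustrative.

```sh
#!/bin/sh
# hook_script.sh (hypothetical): called as "hook_script.sh PHASE SUBCOMMAND",
# e.g. "hook_script.sh pre create". This calling convention is the proposal
# from this thread, not an existing Vorta interface.

hook() {
    phase=${1:-}
    subcommand=${2:-}
    case "$phase $subcommand" in
        "pre create")
            echo "pre create: e.g. take a snapshot" ;;
        "post create")
            echo "post create: e.g. remove the snapshot" ;;
        "pre check"|"pre list"|"pre prune")
            echo "pre $subcommand: nothing to snapshot" ;;
        "post check"|"post list"|"post prune")
            echo "post $subcommand: nothing to clean up" ;;
        *)
            # Unknown or missing arguments: do nothing and report success.
            : ;;
    esac
}

hook "$@"
```

Because unhandled combinations fall through to the `*` case, the script stays a no-op for subcommands it does not know about.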

The reason I think multiple cases should be supported and passed to the script as an argument is because create hooks are useful for things like snapshotting, but snapshot creation and destruction is not necessary or desired for check, list, nor prune. I'm also not sure if Apple has completely walled-off APFS snapshot support and if tmutil (Time Machine infrastructure) is the only way to interact with them. At any rate, that's a tangential issue to this one; although, it introduces the question: what if the source paths don't have stable names? Time Machine snapshots have names like com.apple.TimeMachine.2019-02-23-102421, where 102421 appears to be some kind of transaction ID. Of course, if Apple users don't have the ability to snapshot a consistent database state (eg: all those Photos and Music databases) then we can "Har! Har! Linux is better", but that doesn't seem very nice to me ;-) Anyways, hypothetically, it seems like hook_script.sh pre create could return a path to com.apple.TimeMachine.2019-02-23-102421...but I suspect @m3nu might say that supporting this case is outside the scope of Vorta.

@frederikmoellers

Would it be useful to execute the commands on every action? This would even include running the post-backup-command after "mount", the pre-backup-command before "umount" and both commands around "version". For cases where the commands are used to e.g. mount filesystems, this doesn't make sense. borg --version can run without access to the repos and borg mount should not finish with unmounting the underlying filesystem (in the post-backup-command) where the repo is located (this would break the borg-mount).

The only use case I can imagine is if the commands are used to make the borg executable available in the first place. But I don't know if this is really realistic. Then again, it would be most consistent to run the commands on every action and let the scripts decide when to execute what (based on the $subcommand environment variable that tells them what is being done).

I'm also not sure how often this ability would be used, but it seems like users who installed Vorta from PyPI might use it for things like activating the virtualenvironment.

So I can't decide between being consistent and more friendly for weird side-cases (execute always) or being more friendly to most actual use cases (execute only where it makes sense, e.g. not before/after version). I could use some opinions on this.

The included example script could handle the borg --version pre and post cases using the * case, which should be a noop. Users who want to do something exotic can write a function or copy their code into the * case, or define a version case. To be extra user-friendly I guess the example script could provide a case for borg --version, or maybe just a comment. There's also the question of support-burden for more cases. For future compatibility reference during upgrades it might also be nice to see a quick-reference list of supported Vorta operations/cases near the top of the file. In a way it's the eternal question of "how much can we expect others to infer meaning" vs "explicit comments and documentation".

Other than that, I wonder about error handling, and what Vorta (and the example script) should do if a single command in the called case fails. Oh, and finally, logging considerations! It seems to me that each command in the example script should be logged by Vorta. @m3nu, what do you have to say about design considerations? The only other things I can think of are that the example script should be in POSIX SH and not BASH, and that it should be written knowing that many users will be running it suid root or with setcap (to get a subset of CAP_ADMIN). Oh and there's also the question of what should happen if a user's hook script doesn't have a * case and doesn't handle a case (for whatever reasons). Should it error, warn, or silently noop?

And @frederikmoellers, thank you for working on this! P.S. I was thinking about working on this post-Debian 11 release (probably September'ish), but I'm happy to hear someone else has prioritised it :-)

sten0 avatar Mar 03 '21 22:03 sten0

Wow, that's a lot of (unexpected but very welcome) feedback :) Thanks a lot for giving this so much thought!

I really like the idea of having a simple UI with only one input box and leaving the configuration in the script to be done by advanced users. This gives us all the flexibility we might ever need without (as you say) cluttering the UI for regular users.

Anyways, hypothetically, it seems like hook_script.sh pre create could return a path to com.apple.TimeMachine.2019-02-23-102421

That's an interesting use case. This approach certainly opens the possibility of e.g. pre-create hooks returning information which vorta/borg then uses for the backup. However, implementing this feedback channel seems like an awful lot of work and I'm not sure if there's a larger target audience for this feature. I recommend gathering some feedback before proceeding with this.

I'm also not sure how often this ability would be used, but it seems like users who installed Vorta from PyPI might use it for things like activating the virtualenvironment.

Not sure I understand what you mean. Wouldn't they have to activate the virtualenv before starting vorta in the first place? Or do you mean users who installed borg from PyPI? Anyway, I do agree power users will eventually come up with a use case for this and given your approach (script is always called with 2 parameters; default behaviour is to pass and quit), it really makes no sense to not call it on every operation.

On the question of supported cases/subcommands: I suggest we just pass the subcommand to the script the same way it appears in vorta's (GUI) log (check/list/prune/create/--version). That way things should stay intact even if more subcommands become available. In the script I'd give examples for subcommands at the top and try to provide an extensive list, but always refer to borg and/or the vorta log for absolute certainty.

On error handling: I suggest we handle errors the same way we do now. If any script call returns with a code ≠0, we abort there and cancel the operation. Anything else is too complicated and error-prone imho. Whether the script itself aborts on every error is then left to the user. They can do elaborate error-handling or just abort on the first error as well.
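The abort-on-nonzero rule could look roughly like this from the caller's side. This is a sketch in shell rather than Vorta's actual Python code; `run_with_hooks` and the demo hooks are hypothetical names, and running the post hook even after a failed command is an assumption made here, since the thread leaves that case open.

```sh
#!/bin/sh
# run_with_hooks HOOK SUBCOMMAND COMMAND [ARGS...]
# HOOK is called as "HOOK pre SUBCOMMAND" and "HOOK post SUBCOMMAND";
# this calling convention and the function name are assumptions.
run_with_hooks() {
    hook=$1
    subcommand=$2
    shift 2

    # A failing pre hook cancels the operation before the command runs.
    "$hook" pre "$subcommand" || {
        echo "pre hook failed, skipping $subcommand" >&2
        return 1
    }

    "$@"
    status=$?

    # Assumption: the post hook still runs if the command failed. A failing
    # post hook turns an otherwise successful run into a failure.
    "$hook" post "$subcommand" || {
        echo "post hook failed after $subcommand" >&2
        [ "$status" -ne 0 ] || status=1
    }
    return "$status"
}

# Demo with stand-in hooks; "true" takes the place of a real borg call.
ok_hook()  { echo "hook: $1 $2"; }
bad_hook() { [ "$1" != pre ]; }

run_with_hooks ok_hook prune true
run_with_hooks bad_hook prune true || echo "operation cancelled"
```

Whether the script itself stops at its first failing command remains the user's choice, for example by adding `set -e` inside the hook script.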

I don't have a strong opinion on logging. Logging everything (stdout and stderr of the script and, consequently, of all programs called within) could clutter the log, but at the same time it would be the most user friendly option imho. So unless anyone disagrees, I'm going to prepare the PR to log everything. After all, users can always redirect to /dev/null in their script :)

I'll try to write a good example using POSIX SH and to make sure that users understand how it's going to be executed and what happens if they remove cases.

And @frederikmoellers, thank you for working on this! P.S. I was thinking about working on this post-Debian 11 release (probably September'ish), but I'm happy to hear someone else has prioritised it :-)

Well what can I say… being directly affected and needing a feature is always the best motivation ;) At the moment the need is gone for me (which is why I haven't put any pressure on finishing the PR) but with your feedback I'll try to get this done now.

frederikmoellers avatar Mar 04 '21 00:03 frederikmoellers

Yeah, good idea to move it into scripts. Should those scripts live in the settings folder or a user-defined path?

One may want different scripts based on the profile? Or a way to temporarily disable them from the UI.

In addition to passing the subcommand as argument, it could pass more details as env vars. E.g. VORTA_PROFILE_NAME, VORTA_REPO_URL, ... Then the docs would have a sample script on how to deal with different subcommands?

case "$1" in
   pre_create)
      # do stuff before borg create
      ;;
   post_create)
      # after borg create
      ;;
   pre_prune)
      # before borg prune
      ;;
   *)
      # default: runs for any other hook
      ;;
esac

m3nu avatar Mar 04 '21 01:03 m3nu

Hi there,

is the thread up to date regarding this feature? I tried to check the documentation but couldn't find any mention of it.

Erwyn avatar Apr 27 '22 08:04 Erwyn

Well the thread is up to date, but unfortunately there's no PR yet. I have to admit I haven't looked into this for quite a while since I stopped doing backups on network mounts, but I will get back to this now. You can expect a PR in a few days and there we can have a final discussion on where to put the scripts and the other open questions.

frederikmoellers avatar Apr 27 '22 13:04 frederikmoellers

The hook names could look like the ones from borgmatic.

Yeah, good idea to move it into scripts. Should those scripts live in the settings folder or a user-defined path?

The user could specify the file path.

One may want different scripts based on the profile? Or a way to temporarily disable them from the UI.

A per profile entry would allow for multiple scripts while not forcing multiple. Some settings regarding scripts could be useful.

In addition to passing the subcommand as argument, it could pass more details as env vars. E.g. VORTA_PROFILE_NAME, VORTA_REPO_URL, ... Then the docs would have a sample script on how to deal with different subcommands?

Isn't that already done for the current scripts? The hook name could also be passed as an env var so that the user can decide whether to pass it as an argument to the script.

real-yfprojects avatar Apr 29 '22 16:04 real-yfprojects

Frederik Möllers @.***> writes:

Wow, that's a lot of (unexpected but very welcome) feedback :) Thanks a lot for giving this so much thought!

Thanks! :-) I'm happy you like this approach. Sorry I missed your reply until now!

Anyways, hypothetically, it seems like hook_script.sh pre create could return a path to com.apple.TimeMachine.2019-02-23-102421

That's an interesting use case. This approach certainly opens the possibility of e.g. pre-create hooks returning information which vorta/borg then uses for the backup. However, implementing this feedback channel seems like an awful lot of work and I'm not sure if there's a larger target audience for this feature. I recommend gathering some feedback before proceeding with this.

I'm assuming Apple's snapshot program has an interface like:

$ make_TimeMachine_snapshot SOURCE
# ... Time Machine makes snapshot, and echoes

snapshot created in com.apple.TimeMachine.2019-02-23-102421

LVM, ZFS, and btrfs can do this. Fedora, RHEL, and SUSE all default to either LVM or btrfs.

I've lost mail due to IMAP IDLE PUSH notifying an email daemon (in this case it was Akonadi) that new mail was available. Akonadi started downloading the mail, and as that was happening my scheduled backup occurred. After experiencing hardware failure the next day I tried to restore my mail from backup and learned that some was missing (database consistency issue). IIRC this is how Apple Mail works too.

The two morals of this story for me were: 1) Don't trust email storage in anything but a maildir, and/or 2) Always quiesce databases, then snapshot the filesystem before making a backup; this ensures consistency.

I'm assuming that Time Machine is sane and quiesces databases controlled by Apple software, and then makes an APFS snapshot before making the Time Machine backup.

Vorta should not be inferior to Time Machine in ensuring a consistent backup state.

I'm also not sure how often this ability would be used, but it seems like users who installed Vorta from PyPI might use it for things like activating the virtualenvironment.

Not sure I understand what you mean. Wouldn't they have to activate the virtualenv before starting vorta in the first place? Or do you mean users who installed borg from PyPI?

Yes, thank you, that's what I meant :-)

Anyway, I do agree power users will eventually come up with a use case for this and given your approach (script is always called with 2 parameters; default behaviour is to pass and quit), it really makes no sense to not call it on every operation.

Agreed.

On the question of supported cases/subcommands: I suggest we just pass the subcommand to the script the same way it appears in vorta's (GUI) log (check/list/prune/create/--version). That way things should stay intact even if more subcommands become available. In the script I'd give examples for subcommands at the top and try to provide an extensive list, but always refer to borg and/or the vorta log for absolute certainty.

Agreed. Also, I wonder if the example script should ideally contain a sort of "hook API" check, because backwards and forwards compatibility are not guaranteed, and users could then diff the example script between Vorta versions to see how they need to modify their script.

On error handling: I suggest we handle errors the same way we do now. If any script call returns with a code ≠0, we abort there and cancel the operation. Anything else is too complicated and error-prone imho. Whether the script itself aborts on every error is then left to the user. They can do elaborate error-handling or just abort on the first error as well.

This makes sense for errors within the script. For Borg or Vorta errors, the script needs to be able to clean up after itself. For this the hook API needs to support "cleanup" as an argument to the script.
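A sketch of what a "cleanup" case might look like inside the hook script. The `cleanup` argument is only the suggestion made here, not something Vorta passes today; `STATE_FILE` is a hypothetical marker file standing in for real state such as a mounted share or a snapshot.

```sh
#!/bin/sh
# Hypothetical hook with a "cleanup" case. STATE_FILE is a marker file that
# stands in for real state such as a mount or a snapshot.
STATE_FILE=${TMPDIR:-/tmp}/vorta-hook-demo.$$

hook() {
    case "${1:-}" in
        pre)
            : > "$STATE_FILE" && echo "prepared $STATE_FILE" ;;
        post|cleanup)
            # Safe to call more than once and after a failed "pre".
            rm -f "$STATE_FILE" && echo "cleaned up" ;;
        *)
            : ;;
    esac
}

hook "$@"
```

Keeping the cleanup path idempotent means the caller does not need to know whether the pre step completed before the error occurred.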

I don't have a strong opinion on logging. Logging everything (stdout and stderr of the script and, consequently, of all programs called within) could clutter the log, but at the same time it would be the most user friendly option imho. So unless anyone disagrees, I'm going to prepare the PR to log everything. After all, users can always redirect to /dev/null in their script :)

:) Sounds good to me!

I'll try to write a good example using POSIX SH and to make sure that users understand how it's going to be executed and what happens if they remove cases.

Thank you!

sten0 avatar Apr 30 '22 02:04 sten0

In addition to passing the subcommand as argument, it could pass more details as env vars. E.g. VORTA_PROFILE_NAME, VORTA_REPO_URL, ... Then the docs would have a sample script on how to deal with different subcommands?

I recommend avoiding env vars because of the potential issues they could cause with virtualenvs, Flatpaks, and Snaps. I expect that with increasingly secure namespace barriers this will one day definitely break. This method also will need to be dropped when secure sudo/doas support is (hopefully) one day added, because the normal user's env vars must not be exported into the secure superuser (or more limited CAP_ADMIN) environment.

Isn't that already done for the current scripts? The hook name could also be passed as an env var so that the user can decide whether to pass it as an argument to the script.

What do you mean?

sten0 avatar Apr 30 '22 02:04 sten0

What do you mean?

Below the pre- and post-backup command entries vorta states that the following env vars are available: $repo_url, $profile_name, $profile_slug, $returncode.
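For illustration, a post-backup command could read those variables as follows. The variable names are the ones listed above; the `report` function, the message wording and the fallback defaults are assumptions added so the sketch also runs outside Vorta.

```sh
#!/bin/sh
# Post-backup command sketch. The variable names come from the list above;
# the defaults only exist so the sketch also runs outside Vorta.
report() {
    rc=${returncode:-0}
    name=${profile_name:-unknown}
    url=${repo_url:-unknown}
    if [ "$rc" -eq 0 ]; then
        echo "backup of profile '$name' to $url succeeded"
    else
        echo "backup of profile '$name' to $url failed with code $rc"
    fi
}

report
```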

I recommend avoiding env vars because of the potential issues they could cause with virtualenvs, Flatpaks, and Snaps. I expect that with increasingly secure namespace barriers this will one day definitely break. This method also will need to be dropped when secure sudo/doas support is (hopefully) one day added, because the normal user's env vars must not be exported into the secure superuser (or more limited CAP_ADMIN) environment.

Then we could allow the user to use placeholders in the entry where one defines the script/command.

real-yfprojects avatar Apr 30 '22 06:04 real-yfprojects

Well the thread is up to date, but unfortunately there's no PR yet. I have to admit I haven't looked into this for quite a while since I stopped doing backups on network mounts, but I will get back to this now. You can expect a PR in a few days and there we can have a final discussion on where to put the scripts and the other open questions.

Thanks for your answer, and sorry for the delay, was on vacation. I just wanted to be sure I wasn’t missing anything on my own setup.

Thank you very much!

Erwyn avatar May 21 '22 11:05 Erwyn

Summary

supported hooks

  • run before a sequence of borg commands
  • run after a sequence of borg commands
  • before every borg command
  • after every borg command
  • before a specific command
  • after a specific command
  • on error

information for configured scripts

  • archive name
  • repo
  • time?
  • whether backup is scheduled
  • hook
  • sources
  • borg arguments
  • profile
  • borg output?
  • borg returncode

supported actions

  • mount/unmount a drive
  • alter arguments / ENV variables for the following borg command
    • alter sources
    • alter archive name
  • alter PATH / borg executable
  • alter ssh command

proposed GUI elements

  • checkbox to toggle when to run configured scripts
  • another entry for other hooks

superior proposal

  • run script for every hook
  • One entry, one script
  • script decides what to do for the given hook
  • script template or maybe generator?
    • in POSIX SH
    • security considerations with elevated rights
  • option to disable script / all hooks

open questions

  • How should the failure of an "after-borg" hook be handled?
  • information for scripts passed as env variables or arguments?
  • log script contents?
  • restrict script location? - e.g. to Vorta's settings folder
  • Allow specifying cmd arguments for the script?
  • hook API check?
  • what happens on borg error?
    • cleanup for pre script - also when pre script fails
  • use env vars? - security concern in a root context
  • How is information passed to scripts?
  • Impose security restrictions on hooks like strict file permissions?
  • options for how information is passed to script?
  • logging inside the script: stderr, stdout
    • Error messages
    • warnings
    • info
    • debug

real-yfprojects avatar Jan 09 '23 19:01 real-yfprojects

Here is what I thought the GUI should look like: [Screenshot from 2024-03-24 00-54-06] Does it look good?

AdwaitSalankar avatar Mar 27 '24 17:03 AdwaitSalankar

What's the difference between "Run script before and after all borg commands" and "Run script before and after every borg command"?

real-yfprojects avatar Apr 03 '24 08:04 real-yfprojects

Noticed the same. The UI is clearly not final and has logical issues.

I would look at this thread and see what use case most people want to solve. Then solve that and try to be flexible without making it too complex on the UI or the code. Maybe just 2 textboxes to run before create or all borg commands?

m3nu avatar Apr 07 '24 07:04 m3nu

"Before and after all borg commands" would mean before and after the complete workflow, and "every borg command" would mean before and after each individual borg command in the workflow. As mentioned in the summary, people want to mount and unmount the drive and alter arguments passed to Borg.
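The difference between the two options can be shown with a small simulation. Nothing here is Vorta code: `script` and `fake_borg` are hypothetical stand-ins, and the create, prune, list, check sequence is the one visible in the log at the top of this thread.

```sh
#!/bin/sh
# Simulation only: echo stands in for the configured script and for borg.
script()    { echo "  script: $*"; }
fake_borg() { echo "borg $1"; }

# "Before and after all borg commands": once around the whole workflow.
around_workflow() {
    script before workflow
    for cmd in create prune list check; do fake_borg "$cmd"; done
    script after workflow
}

# "Before and after every borg command": once around each command.
around_each() {
    for cmd in create prune list check; do
        script before "$cmd"
        fake_borg "$cmd"
        script after "$cmd"
    done
}

around_workflow
around_each
```

With four borg commands, the first variant calls the script twice and the second calls it eight times. The mount/unmount use case only needs the first.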

AdwaitSalankar avatar Apr 07 '24 18:04 AdwaitSalankar