Restic hides a performance degradation when run as a system service by systemd
Output of restic version
restic 0.16.1-dev (compiled manually) compiled with go1.19.13 on linux/mips64
What backend/service did you use to store the repository?
rclone
Problem description / Steps to reproduce
Restic experiences a significant performance degradation when run as a system service by systemd.
To reproduce, run restic as a system service such as this:
/lib/systemd/system/nightly-backup.service
[Unit]
Description=Nightly Restic Backup Service
After=network-online.target
Wants=network-online.target
[Service]
LimitNOFILE=infinity
Type=simple
ExecStart=/usr/local/bin/nightly-backup
/usr/local/bin/nightly-backup
#!/bin/bash
export RESTIC_PASSWORD="POORLY_MANAGED_PASSWORD"
/usr/local/bin/restic --quiet -o rclone.program='ssh [email protected] null' -r rclone: backup --exclude="/sys" --exclude="/proc" --exclude="/dev" --exclude="/root" /
Expected behavior
Script execution time should be the same when run as a system service or directly from the command line.
Actual behavior
Script execution time is significantly longer when run as a system service by systemd.
Do you have any idea what may have caused this?
Systemd does not populate the ${HOME} variable for system services that do not set User=. As a result, restic is unable to determine where to place its cache and runs without one.
This is confirmed by the debug log when the script is run as a system service:
restic/global.go:288 main.Warnf 1 unable to open cache: unable to locate cache directory: neither $XDG_CACHE_HOME nor $HOME are defined
Example script shell execution environment when run as a service:
LANGUAGE=en_GB:en
PWD=/
SYSTEMD_EXEC_PID=27738
LANG=en_GB.UTF-8
INVOCATION_ID=5bf3dbfdc9c34cdba0fb0d7deb5837d4
SHLVL=1
JOURNAL_STREAM=8:113618
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
_=/usr/bin/env
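The environment above can be approximated from an interactive shell, which makes the problem easier to reproduce without going through systemd. The following is a minimal sketch: the function names run_like_service and cache_vars_status are made up for illustration, and the variable list is a reduced copy of the listing above rather than an exact replica of what systemd provides.

```shell
#!/bin/bash
# Run a command with a stripped-down environment similar to the listing above:
# no HOME and no XDG_CACHE_HOME, only PATH and the locale.
# (run_like_service is a hypothetical helper name.)
run_like_service() {
    env -i \
        PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
        LANG=en_GB.UTF-8 \
        "$@"
}

# Print the two variables that restic's warning mentions, or a note
# if neither of them is present in the stripped-down environment.
cache_vars_status() {
    run_like_service env | grep -E '^(HOME|XDG_CACHE_HOME)=' \
        || echo "neither HOME nor XDG_CACHE_HOME is defined"
}
```

Running `run_like_service /usr/local/bin/nightly-backup` should then exercise the script without ${HOME}, assuming the script itself does not set it.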
Contrary to what the documentation states, restic does not actually exit with an error message in this situation. Exiting with a useful error message would be an ideal result that avoids the hidden performance problem.
The obvious resolution is to set ${HOME} or use the --cache-dir option. Because restic does not exit with a useful error message, it is hard to know that this is needed. Users with small or rarely changing systems may not even be aware that this problem exists.
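A minimal sketch of that workaround for the wrapper script follows. It assumes /var/cache/restic is an acceptable cache location on the system in question; the helper name ensure_cache_dir is hypothetical. Exporting HOME (for example HOME=/root) before calling restic would be the alternative mentioned above.

```shell
#!/bin/bash
# Sketch of a workaround: choose the cache location explicitly so that
# restic does not depend on HOME or XDG_CACHE_HOME being set.

# Create the cache directory with owner-only permissions and print its path.
# (ensure_cache_dir is a hypothetical helper name.)
ensure_cache_dir() {
    local dir="$1"
    mkdir -p -- "$dir" && chmod 700 -- "$dir" && printf '%s\n' "$dir"
}

# Possible use in /usr/local/bin/nightly-backup (cache path is an assumption):
#   cache="$(ensure_cache_dir /var/cache/restic)" || exit 1
#   /usr/local/bin/restic --quiet --cache-dir "$cache" <remaining options as before>
```

Passing the directory with --cache-dir keeps the choice visible in the script itself, which seems preferable to relying on an inherited environment.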
Restic indeed only warns that the cache cannot be used instead of exiting; the documentation is wrong.
That said, it is generally safer to continue backing up than to fail the backup when the only cost is degraded performance (as opposed to backups not running at all). From that perspective, the current behavior should stay and the documentation should be corrected.
One could argue that someone setting up restic in an environment without $HOME would notice the backups not working if restic were to exit instead of just warning about the problem. Changing the behavior to exit could, however, be a very unexpected change and lead to backups no longer running for people who already run restic in environments without $HOME, which wouldn't be great.
I concur that it is better for a backup to run with poor performance than to not run at all. I also agree that changing current behavior could negatively affect existing backups that are currently succeeding.
Since the CLI --help describes the --cache-dir default as "use system default cache directory", and since caching is expected unless the --no-cache argument is provided, would it make sense to make additional attempts to find a suitable cache directory beyond the $XDG_CACHE_HOME and $HOME lookup?
The Filesystem Hierarchy Standard sets aside /var/cache for this purpose, so perhaps /var/cache/restic would be an option for Linux-based systems when the root user is the process owner. A further option is to use the home directory associated with the process owner if one happens to be available.
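To make the proposed lookup order concrete, here is a rough shell sketch. It is only an illustration of the idea, not restic's actual Go code: the function name pick_cache_dir is made up, the first two branches are assumed to mirror the existing lookup hinted at by the warning message, and the last two are the suggested additions.

```shell
#!/bin/bash
# Illustration of the proposed fallback order for locating a cache directory.
# (pick_cache_dir is a hypothetical name; this is not restic's implementation.)
pick_cache_dir() {
    # $1: numeric uid of the process owner (defaults to the current uid)
    local uid="${1:-$(id -u)}"
    local pw_home

    if [ -n "${XDG_CACHE_HOME:-}" ]; then
        # Assumed existing behavior: honor XDG_CACHE_HOME first.
        printf '%s\n' "$XDG_CACHE_HOME/restic"
    elif [ -n "${HOME:-}" ]; then
        # Assumed existing behavior: fall back to the user's home directory.
        printf '%s\n' "$HOME/.cache/restic"
    elif [ "$uid" -eq 0 ]; then
        # Proposed: root without HOME uses the FHS cache location.
        printf '%s\n' "/var/cache/restic"
    elif pw_home="$(getent passwd "$uid" | cut -d: -f6)" && [ -n "$pw_home" ]; then
        # Proposed: use the home directory recorded for the process owner.
        printf '%s\n' "$pw_home/.cache/restic"
    else
        # Nothing suitable found: the caller would warn and run without a cache.
        return 1
    fi
}
```

Under this order, the service from the report (root, no ${HOME}) would end up with /var/cache/restic, while environments that already define one of the two variables would see no change.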