Restic hides a performance degradation when run as a system service by systemd

Open chuckwolber opened this issue 2 years ago • 3 comments

Output of restic version

restic 0.16.1-dev (compiled manually) compiled with go1.19.13 on linux/mips64

What backend/service did you use to store the repository?

rclone

Problem description / Steps to reproduce

Restic experiences a significant performance degradation when run as a system service by systemd.

To reproduce, run restic as a system service such as this:

/lib/systemd/system/nightly-backup.service

[Unit]
Description=Nightly Restic Backup Service
After=network-online.target
Wants=network-online.target

[Service]
LimitNOFILE=infinity
Type=simple
ExecStart=/usr/local/bin/nightly-backup

/usr/local/bin/nightly-backup

#!/bin/bash

export RESTIC_PASSWORD="POORLY_MANAGED_PASSWORD"
/usr/local/bin/restic --quiet -o rclone.program='ssh [email protected] null' -r rclone: backup --exclude="/sys" --exclude="/proc" --exclude="/dev" --exclude="/root" /

Expected behavior

Script execution time should be the same when run as a system service or directly from the command line.

Actual behavior

Script execution time is significantly longer when run as a system service by systemd.

Do you have any idea what may have caused this?

Systemd does not populate the ${HOME} variable for system services. As a result, restic is unable to determine where to place its cache.

This is confirmed in the debug log when the script is run as a system service:

restic/global.go:288        main.Warnf      1       unable to open cache: unable to locate cache directory: neither $XDG_CACHE_HOME nor $HOME are defined

Example script shell execution environment when run as a service:

LANGUAGE=en_GB:en
PWD=/
SYSTEMD_EXEC_PID=27738
LANG=en_GB.UTF-8
INVOCATION_ID=5bf3dbfdc9c34cdba0fb0d7deb5837d4
SHLVL=1
JOURNAL_STREAM=8:113618
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
_=/usr/bin/env

Contrary to what the documentation states, restic does not actually exit with an error message in this case; it logs the warning above and continues without a cache. Exiting with a useful error message would be an ideal result that avoids the hidden performance problem.

The obvious resolution is to set ${HOME} or pass the --cache-dir option. Because restic does not exit with a useful error message, it is hard to discover that this is needed. Users with small or rarely changing systems may never notice that the problem exists.
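As a rough sketch of the --cache-dir workaround, the backup script could prepare an explicit cache directory and hand it to restic. The path /var/cache/restic and the helper name prepare_cache_dir are assumptions chosen for illustration; they are not restic defaults.

```shell
#!/bin/bash
# Sketch only: /var/cache/restic is an assumed location, not a restic default.
CACHE_DIR="${CACHE_DIR:-/var/cache/restic}"

# Hypothetical helper: create the directory, restrict it to its owner,
# and print its path so it can be passed to restic.
prepare_cache_dir() {
    mkdir -p "$1" && chmod 700 "$1" && printf '%s\n' "$1"
}

# In the backup script the result would be passed along, for example:
#   restic --quiet --cache-dir "$(prepare_cache_dir "$CACHE_DIR")" -r rclone: backup /
```

Passing the directory explicitly avoids relying on whatever environment systemd provides to the service.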

chuckwolber Dec 16 '23 20:12

Restic indeed only warns that the cache cannot be used instead of exiting; the documentation is wrong.

That said, it is generally safer to continue the backup than to fail it when the only cost is degraded performance (as opposed to backups not running at all). From that perspective, the current behavior should stay and the documentation should be corrected.

One could argue that someone setting up restic in an environment without $HOME would notice the backups not working if restic exited instead of just warning about the problem. However, changing the behavior that way could be a very unexpected change for people who already run restic in environments without $HOME: their backups would stop running, which wouldn't be great.

rawtaz Dec 17 '23 00:12

I concur that it is better for a backup to run with poor performance than to not run at all. I also agree that changing current behavior could negatively affect existing backups that are currently succeeding.

The CLI --help text describes the cache directory option with (default: use system default cache directory), and caching is expected unless the --no-cache argument is provided. Would it make sense to make additional attempts to find a suitable cache directory beyond the current lookup?

The Filesystem Hierarchy Standard sets aside /var/cache for this purpose, so perhaps /var/cache/restic would be an option for Linux-based systems when the root user is the process owner. A further option is to use the home directory associated with the process owner if one happens to be available.
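To make the proposed lookup order concrete, here is a minimal shell sketch. It is not restic's current behavior. The function name pick_cache_dir and its argument layout are hypothetical, and placing the fallback under .cache/restic in the owner's home directory is an assumption.

```shell
#!/bin/bash
# Hypothetical helper illustrating the proposed fallback order.
# This is NOT how restic resolves its cache directory today.
# Arguments: $1 = XDG_CACHE_HOME, $2 = HOME, $3 = numeric uid,
#            $4 = home directory of the process owner from the passwd database
pick_cache_dir() {
    local xdg="$1" home="$2" uid="$3" pw_home="$4"
    if [ -n "$xdg" ]; then
        printf '%s\n' "$xdg/restic"             # existing lookup
    elif [ -n "$home" ]; then
        printf '%s\n' "$home/.cache/restic"     # existing lookup
    elif [ "$uid" = "0" ]; then
        printf '%s\n' "/var/cache/restic"       # proposed: FHS cache for root
    elif [ -n "$pw_home" ]; then
        printf '%s\n' "$pw_home/.cache/restic"  # proposed: owner's home directory
    else
        return 1                                # nothing usable found
    fi
}
```

A caller could feed it the real values with printenv XDG_CACHE_HOME, printenv HOME, id -u, and the sixth field of getent passwd for the current uid. The values are passed as arguments rather than read inside the function so that the order can be checked without changing the environment.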

chuckwolber Dec 17 '23 07:12