Make it easier to figure out why borg terminated with warning status
User story
I have used the script from this section of the documentation to create a backup.
At the end, I got an information that
terminating with warning status, rc 1
Unfortunately borg included no information about what the warnings were next to this line of output.
So I opened the log file and searched for "warning". then "Warning". Both only turned up files that happened to have that in the filename, because the logs include every file that is getting inserted into the backups.
So I searched the documentation for "warning" and got a bunch of irrelevant results. Somehow, I ended up finding the "Return codes" section of the documentation, and found this description for return code 1:
generic warning (operation reached its normal end, but there were warnings -- you should check the log, logged as WARNING)
So I grepped my logs for WARNING, and once again found only mentions of files that have WARNING in their filename.
In a fit of desperation, I typed
$ rg -v '^[AM] ' ~/backup.log
and this made it possible for me to find out that
/home/mei/backup.log: file changed while we backed it up
Suggested resolution
There are many things which would have made this a much less infuriating experience. The following come to mind:
-
together with the summary about the size of the archive, time taken and so on, include aggregate statistics on what warnings occurred, e.g.
terminating with warning status, rc 1: 2 files changed while we backed them up
-
make sure to include the severity of the log message in each line of output
-
some kind of facility to put only the more important information in the log files, while having the "current status" type of information go to the terminal only (though this could probably also be built on top of borg within a shell script given a well-defined log output format)
The default log output only includes the messages, but not:
- the timestamp
- the level (like DEBUG, INFO, WARNING, ERROR)
But, as borg uses the logging module, this is configurable, see the docs:
- https://borgbackup.readthedocs.io/en/stable/usage/general.html#logging
- https://github.com/borgbackup/borg/blob/1.4-maint/docs/misc/logging.conf
- https://docs.python.org/3/library/logging.config.html#configuration-file-format
Also, you can of course run borg at the default WARNING log level and keep all other output to the minimum (e.g. not using --list, --progress and not changing the log level by using -v, --info, --debug) - then you'll get ONLY the warnings and errors and it is easier to see what went wrong.
Borg's log output usually goes to stderr, because borg sometimes needs stdout for data streams, so we don't want to have log output on stdout. Using a custom logging configuration you could change that as you like, you could even produce different log files.
Borg 1.4.x got some improvements regarding "modern" return codes, so it uses different ones depending on the specific error or warning that occurred.
Besides tweaking the logging, you could also:
- snapshot the filesystem, so files don't change while borg reads them, or
- exclude unimportant files that are expected to change or make issues
To be clear, what I'm reporting here is an UX issue – once I've gone through all the steps I've described, it's already solved for me: I figured out why my run of borg outputted a warning, and learned that I might want to tweak the log output format or something.
What I'm trying to argue, is that a well-designed program wouldn't lead to me doing something like rg -v '^[AM] ' ~/backup.log in the first place when using it in the default configuration. It's like a missing stair – after you trip over it once, you'll know it's there, and it's not that much of an issue. Maybe someone was even there to warn you about it, so you didn't have to trip over it. But we should still endeavor to actually fix the stair!
you could also [...] snapshot the filesystem, so files don't change while borg reads them
That sounds like a good idea. Is this possible on a storage setup that wasn't designed for this from the outset, such as bare ext4 on a LUKS volume?
Using --filter=CE will (from what I understand from the docs) output separate lines about changed and errored files that occurred during a backup, which afaict is related to some of the specific warning range extended error codes (when the BORG_EXIT_CODES env var is set to modern).
Paired with --log-json one can more easily parse general borg output as JSON objects to filter for relevant info.
I second this! Cost me quiet some time to figure out why my backup didn't terminated properly.