Opt-out of "file changed" warnings specifically
Have you checked borgbackup docs, FAQ, and open Github issues?
Yes
Is this a BUG / ISSUE report or a QUESTION?
QUESTION I SUPPOSE
System information. For client/server mode post info for both machines.
Your borg version (borg -V).
borg 1.2.0
Operating system (distribution) and version.
Arch Linux
Hardware / network configuration, and filesystems used.
How much data is handled by borg?
Full borg commandline that lead to the problem (leave away excludes and passwords)
borg create --exclude-caches --one-file-system --exclude /var/tmp --exclude=/n --exclude=/net --exclude=/var/cache/pacman/pkg ssh://borg@[...]/Backup/Servers/star::star.20220226
Describe the problem you're observing.
Borg 1.2.0 in issue #1750 added a "file changed" warning, which causes borg to return status 1 ("warnings occurred"):
/var/log/syslog: file changed while we backed it up
This is good to have by default, but I don't particularly care about it in my case -- during nightly backups it's going to happen a lot and it's always going to be something like /var/log/syslog -- and would like to ignore it. But there's no option nor environment variable to suppress just this specific warning.
But just as the manual cautions me in the "Logging" section, if I change my backup script to ignore borg's exit status 1 and only fail on exit status 2 ("errors occurred"), then I will miss more important warnings, such as this one that I had a few months earlier informing me of sudden filesystem corruption:
/var/lib/[...]org/[...].list: stat: [Errno 117] Structure needs cleaning: '/var/lib/[...]org'
(I think "failed to stat() for other reasons than a disappeared file" should definitely be reported as an error...)
So I'm looking for a way to allow suppressing specifically just the "file changed during backup" messages (i.e. I'd like another of those BORG_[...]_IS_OK environment variables to be added), or otherwise detecting when borg has logged other types of messages.
Should I just redirect borg's output to a file, then grep -v the unwanted messages away?
Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.
Include any warning/errors/backtraces from the system logs
Should I just redirect borg's output to a file, then grep -v the unwanted messages away?
I decided I'll just do this for now... (I'll also think about calling fsfreeze before the backup.)
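A minimal sketch of that workaround in Python, under stated assumptions: borg's log output has already been captured into a string, borg was run without options that add informational lines, and the message wording matches the example quoted above. The names `remaining_warnings` and `should_alert` are made up for illustration and are not part of borg.

```python
import re

# Assumption: the wording is copied from the example message in this thread.
IGNORABLE = re.compile(r": file changed while we backed it up\s*$")


def remaining_warnings(log_text):
    """Return the non-empty log lines that are not 'file changed' warnings."""
    return [
        line
        for line in log_text.splitlines()
        if line.strip() and not IGNORABLE.search(line)
    ]


def should_alert(returncode, log_text):
    """Decide whether a backup run deserves a notification.

    rc 0 is success, rc 2 or higher is always reported, and rc 1 is
    reported only if something other than 'file changed' was logged.
    """
    if returncode == 0:
        return False
    if returncode >= 2:
        return True
    return bool(remaining_warnings(log_text))
```

The weakness discussed in this thread still applies: the filter is global, so a changed database file is silenced just like a changed log file.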
I think what might make more sense than a global setting would be to have a new include/exclude style option which matches the files where we do not care about this warning, e.g. log files or files where we know changes are probably not going to be an issue for the integrity of the backup.
A more general solution might be to add msgids to the messages that should be filtered and add a general option to filter message ids. (This one is orthogonal to @taladar's suggestion, though.)
Looking at the list of messages it seems most of them are hard errors where simply ignoring the error would still have to lead to an aborted run. Ignoring fatal messages doesn't seem very useful, that seems to be mostly something you might want to do with warning or lower level messages.
seems most of them are hard errors
True, with the current set of msgids this would barely make sense.
Hi, I would also like to have some kind of flag for ignoring just these (file changed while we backed it up) warnings, for the same reason @grawity stated in the initial post. I was quite lucky with the previous behavior of just ignoring this case and it would be great to get this back by some kind of command line option or environment variables.
Same here. I use borg with rear, and every "file changed" warning leads to rc 1, which makes rear think that the backup job has aborted.
@mitch-geht-ab this sounds like a bug in rear. a warning is a warning and not an error.
@ThomasWaldmann
An exit code of 1 is an error.
@ThomasWaldmann That is indeed the problem, a file changing while being backed up by borg results in borg exiting with a return code of 1. Any script driving borg is then going to report this as an error, even if it's not one.
If borg could exit with a different return code for files changing while being backed up, scripts could be adapted accordingly.
A file changing and being backed up in an inconsistent state should be an error too though, at least for certain files (e.g. database files). For other types (e.g. appending to a log file) it is probably fine.
The fact that it can go either way is why I think it makes sense for this behavior to be configurable - for users to be able to determine for themselves whether or not a file being changed while being backed up results in a nonzero exit code.
I really don't think that is enough. It should be configurable for which files it does not matter if they change (e.g. log files) and for which files it does, simply ignoring file changes during backup will just result in backups that don't allow you to restore anything useful.
I agree this would be a more robust solution than a global enable/disable.
the borg docs define rc == 1 as warning and rc == 2 as error. scripts and tools which invoke borg have to work like that.
and warning means that someone has to look at it, how severe it is, borg can not know that in many cases.
Yeah, but this thread is about teaching borg how severe the issue is.
Most other warnings that occur at rc==1 level are either rare (file disappeared, permission error) or actually kinda critical (I/O error that should be looked at ASAP), whereas this one is a daily occurrence.
My point is, if I have my backup jobs alert me for rc==1, then I will get emails about "file changed" every night, and after a while I will end up just deleting those and eventually overlooking the serious warnings. But if I don't have my backup jobs alert me for rc==1, then I will definitely miss the serious warnings.
So if borg doesn't know how severe the issue is, why shouldn't I be able to tell it that?
Hi,
I am the maintainer of the borg handler in backupninja. Backupninja is configured to send an email only if there is something useful to report. So no email when everything is OK.
I am using borg daily on 500 servers. Every day, changes happen in some log files during the borg backup, which makes email notifications completely useless because nobody will be able to follow them.
I can grep -v to filter out the unwanted lines (No such file or directory and file changed while we backed it up), but then, what should I do with the rest of the output? How will I know if there is another important warning to report? Borg may have returned 1 for another reason.
That's why it would be very useful to allow the user to tell borg to ignore some warnings (file changed while we backed it up, but also No such file or directory and maybe the warning at first backup).
I see another way, which could be easier to implement in borg. Could you prepend the output with the type of log, INFO, WARNING, ERROR (at least for non-INFO lines)…? Then, in backupninja (and other scripts), we would be able to filter out unwanted warning lines, and still know if we have something to report.
Thank you for all the work, borg is awesome.
borg's logging format is configurable, see the docs. So you can have the log level and other stuff.
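To illustrate how a wrapper script could use level-prefixed output, here is a rough sketch. It assumes the logging format has been configured so that each line starts with its level name followed by a space, for example `WARNING /var/log/syslog: file changed while we backed it up`. That line layout and the names `IGNORED_SUFFIXES` and `lines_to_report` are assumptions for this example, not borg defaults.

```python
LEVELS = ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")

# Assumption: wording copied from the message quoted in this thread.
IGNORED_SUFFIXES = (
    ": file changed while we backed it up",
)


def lines_to_report(log_text):
    """Return the log lines a wrapper script should still forward."""
    report = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        level, _, message = line.partition(" ")
        if level not in LEVELS:
            # Unknown layout: report it rather than silently drop it.
            report.append(line)
            continue
        if level in ("DEBUG", "INFO"):
            continue
        if level == "WARNING" and message.rstrip().endswith(IGNORED_SUFFIXES):
            continue
        report.append(line)
    return report
```

Anything at ERROR or above is always kept, and only warnings with an explicitly listed suffix are dropped, so an unexpected warning still reaches the report.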
i don't see a good solution for the fundamental problem yet.
borg just can't know what's a problematic file change and what not. nor can it know whether "no such file..." is problematic. that's why there is rc == 1 meaning that someone/something else has to decide that.
the best solution is of course when your fs is stable, so nothing changes (snapshotted or inactive fs).
a workaround is to exclude problematic, but unimportant files.
completely switching off a warning of a specific type would be dangerous, because that would silence also such issues in important files.
About "no such file..." errors, I believe it is not up to the backup tool to report missing files. If the file was important, then something is broken and you should see it by other means: tests, monitoring… But, ok, if you make a typo… maybe borg should warn you that /homes does not exist… Maybe.
About "file changed while we backed it up", I agree this is debatable.
- I believe the users should know that they must not depend on borg to back up /var/lib/mysql. And if they do, they may end up with a corrupted backup even if the file has not changed during the backup. The job of borg is not to teach that to the users.
- But with log files or highly active file servers, this is just so annoying. Excluding problematic files is absolutely not a solution. Even if the file has changed during backup, the file has still been backed up. I don't care that it has been modified. That's fine, it will be backed up again tomorrow.
I would prefer an exclude list for files/directories that I want included in the backup, but that may change during the backup run without the need to trigger a warning.
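Until borg offers something like this, the idea can be approximated by post-processing the captured warnings. The sketch below is not a borg feature. `unexpected_changes` and the pattern list are hypothetical, and the message wording is assumed to match the example in this thread.

```python
from fnmatch import fnmatchcase

# Assumption: wording copied from the message quoted in this thread.
CHANGED_SUFFIX = ": file changed while we backed it up"


def unexpected_changes(warning_lines, tolerated_patterns):
    """Return paths that changed during backup and match no tolerated pattern.

    Patterns use shell-style wildcards; note that '*' also matches '/'.
    """
    unexpected = []
    for line in warning_lines:
        line = line.rstrip()
        if not line.endswith(CHANGED_SUFFIX):
            continue  # some other message; handle it elsewhere
        path = line[: -len(CHANGED_SUFFIX)]
        if not any(fnmatchcase(path, pattern) for pattern in tolerated_patterns):
            unexpected.append(path)
    return unexpected
```

With `tolerated_patterns = ["/var/log/*"]`, a changed `/var/log/syslog` is accepted silently while a changed database file is still reported.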
I think the best solution would be to make the "file changed" warnings happen less frequently.
- Can we get borg to read ahead into a file, so that in more cases it will finish reading a file before any changes happen?
- Does borg check whether the file has changed as soon as it hits EOF, or does it only do that after further processing (compression, dedup, writing to repo) has happened? The former would be much better.
- Can anything else about the speed of reading be improved?
- Add retry support, so when a file has changed, it will retry N times. If one attempt succeeds, then a message is logged, but no warning is raised.
The above overlaps with what is in #6457, but with more detail.
Even if the file has changed during backup, the file has still been backed up. I don't care that it has been modified…
Hm. Just to be clear, what is the meaning and implication of "File changed while we backed it up" (Status C)? Does it mean borg could have backed up half of the old file and half of the new file?
@jdchristensen in same order:
- difficult, all data goes through chunker's buffer management (buzhash chunker is Cython/C code). it does some read-ahead already, but only as big as its buffer size (few MBs, IIRC). otoh, we do not want to load huge amounts of file data into memory anyway.
- as processing is per chunk, it is after EOF and after processing content data.
- the fixedsize chunker chunks faster, but the non-chunking parts of processing are the same, likely not a big difference in practice.
- retrying is an option, but would not always help. if a retry is successful, i do not think there should be any message.
@maethor borg 1.2:
- opens a file to get a fd (file descriptor, open file handle)
- does a fstat on that fd to get stats like ctime, mtime, etc.
- then it reads blocks from that file. at some point, it reaches end of the file.
- does another fstat on that fd to get the stats again.
- compares stats to check whether the file has changed while we read it.
different stats can mean that the file data we read are not consistent. borg does not know HOW the file was changed, guess if it was changed in a strict "append only" way, it's no problem at all. in other cases, the backup can be inconsistent (having file contents that never existed like that on disk).
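The steps above can be sketched in Python as follows. This is an illustration of the technique only, not borg's actual code: which stat fields borg compares is not stated in this thread, so the fields in `stat_snapshot` are an assumption, and both function names are hypothetical.

```python
import os


def stat_snapshot(fd):
    """Collect the stat fields used to detect a change (assumed selection)."""
    st = os.fstat(fd)
    return (st.st_ino, st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def read_and_check(path, block_size=1024 * 1024):
    """Read a whole file and report whether its stats changed meanwhile."""
    fd = os.open(path, os.O_RDONLY)
    try:
        before = stat_snapshot(fd)   # fstat right after opening
        blocks = []
        while True:
            block = os.read(fd, block_size)
            if not block:            # empty read means end of file
                break
            blocks.append(block)
        after = stat_snapshot(fd)    # fstat again after the last read
    finally:
        os.close(fd)
    return b"".join(blocks), before != after
```

A `True` in the second return value only says that something changed between the two `fstat` calls. As noted above, it cannot tell a harmless append from a rewrite in the middle of the file.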
@ThomasWaldmann
- A readahead of 8MB would probably be enough to avoid the problem for many frequently changing log files. Even 32MB would be a minor price to pay. Couldn't a python function feed the data to the chunker? (This could also help with future plans to parallelize some of the code.)
- Can it be changed to check the stat as soon as EOF is reached? This seems like it would help avoid the problem. I'm always surprised by how often borg says that a file is changing, when it should only take a few ms to read the file, and the file doesn't change that often.
- No further comment.
- Even if retrying helps only 90% of the time, it would really cut down on the logs one has to read.
Have we given up on the --dont-warn-on-changed-or-deleted-files-even-though-this-might-lead-to-inconsistent-backups flag solution?
@jdchristensen
- no, because the default buzhash chunker already uses an 8MiB (2^23B) buffer now.
- needs checking in the code, but as soon as a file has multiple chunks, it would not make a big difference.
- N/A
- yes, but i guess we first need a failure rollback - because after we started processing a file, we have put chunks into the repo. if we then do not write the item to the archive, we potentially have orphaned chunks.
that failure / giving up rollback is also needed for e.g. IO error in the middle of a file, we have a ticket about that IIRC.
for the retry mode, it would make sense to:
- not decref / delete chunks while retrying (because the probability of getting the same chunks is rather high, and it also works faster in the 2nd+ retry, with better chances of not having yet another change in the file while processing it)
- after N retries, give up and decref / delete chunks as needed to not create orphans.
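The bookkeeping for such a retry mode can be modelled with a toy reference-counted chunk store. This is only a sketch of the two rules above, not borg's cache or repository code. `ToyChunkStore`, `store_with_retries`, and the `read_once` callback (which returns the chunk ids of one read attempt plus a "changed" flag) are hypothetical names.

```python
from collections import Counter


class ToyChunkStore:
    """Minimal stand-in for a deduplicating store with reference counts."""

    def __init__(self):
        self.refs = Counter()

    def incref(self, chunk_id):
        self.refs[chunk_id] += 1

    def decref(self, chunk_id):
        self.refs[chunk_id] -= 1
        if self.refs[chunk_id] <= 0:
            del self.refs[chunk_id]  # chunk no longer referenced: delete it


def store_with_retries(read_once, store, max_retries=3):
    """Return the chunk ids of a consistent read, or None after giving up."""
    held = []      # references taken by failed attempts, kept while retrying
    result = None
    for _ in range(max_retries):
        chunk_ids, changed = read_once()
        for chunk_id in chunk_ids:
            store.incref(chunk_id)
        if not changed:
            result = chunk_ids
            break
        held.extend(chunk_ids)
    # Release the references of failed attempts only now, so that chunks
    # shared between attempts were never deleted in between. When all
    # attempts failed, this removes everything and leaves no orphans.
    for chunk_id in held:
        store.decref(chunk_id)
    return result
```

Deferring the `decref` calls until the end is what keeps chunks shared between attempts alive during retries. In this model, giving up leaves the store exactly as it was before the file was touched.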
@ThomasWaldmann It's really the combination of the first two items that I think would help, and if the chunker already reads ahead 8MB, then one could consider the first item done (except that one might ask for a slightly larger buffer). But the first item is only a benefit if the second item is done, namely if you check (or at least save) the stat info right when you reach EOF.
I agree that the fourth item is likely tricky to implement and get right, so I understand your hesitancy.
Thanks for your answer @ThomasWaldmann
Now I agree that disabling the warning globally would be a bad idea, and borg doesn't need to give that option.
But I am also starting to realize that this change makes borg 1.2 kind of unusable for us, and more generally for backing up / on servers. Even after ignoring warnings on log files, in 3 days I received 200 emails… prometheus wals, fail2ban sqlite, atop files, rrd files, mailboxes, user uploads, etc. There are a lot of active files on a server.
Can you confirm that Borg 1.1 does not have this problem? And will Borg 2 have the same behavior as 1.2?
It's not like the problem would not be there if you used borg 1.1.x, just it would not check and therefore would not warn you about this issue. So you might find out at restore time that some files are not internally consistent.
So guess it boils down to: "are these files important?"
- y: you need to get a consistent state of them (e.g. fs snapshot)
- n: do not back them up (or ignore the warnings)
borg2 / master branch currently behaves like 1.2.x.