
start process to send email notifications upon btrfs problems?

testbird opened this issue 5 years ago • 6 comments

I am wondering how btrfs users can make sure they are automatically notified (and only then) about errors and warnings, which usually happen in the background and are very likely to go unnoticed.

Btrfsmaintenance seemed like a good find for this, since it covers the necessary background tasks.

But couldn't it also run something like a watchdog task (e.g. a bash pipe)? It could filter the log for any btrfs warnings and errors that occur during daily usage and send out emails as soon as they appear.
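
Something along the lines of this rough sketch, for example (assuming journald and a working local mail/MTA setup; the address and the grep pattern are only placeholders):

```bash
#!/bin/sh
# Sketch of a journal watchdog: follow kernel messages and mail every
# btrfs warning/error line as it appears. ADMIN and the filter pattern
# are placeholders, and a real implementation would batch or rate-limit.
ADMIN=root
journalctl -k -f -o cat |
  grep --line-buffered -Ei 'btrfs.*(error|warning|corrupt|csum)' |
  while IFS= read -r line; do
    printf '%s\n' "$line" | mail -s "btrfs alert on $(hostname)" "$ADMIN"
  done
```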

testbird avatar Oct 25 '20 18:10 testbird

Adding some sort of notifications (for example, via email or by running a custom script) is a must. Basically, scrubbing is useless if users don't find out in time that data corruption has happened, while they most likely still have fresh backups and can avoid permanent data loss.

If there are no notifications and users don't monitor logs on a regular basis, they will find out about a filesystem failure much later, maybe in a few years, when it could already be too late to restore from a backup.

Ultranium avatar Oct 16 '24 15:10 Ultranium

Are the messages not in the journal? Why should btrfsmaintenance take over the sending of mail itself? Simply monitor the journal.
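
For example, a periodic check along these lines (the pattern and time window are only illustrative), run from cron or a timer, produces output, and therefore a mail, only when something matched:

```bash
#!/bin/sh
# Sketch of a periodic journal check (e.g. hourly via cron with MAILTO set):
# print btrfs kernel messages from the last hour and stay silent otherwise,
# so mail is only generated when there is something to report.
journalctl -k --since "1 hour ago" -o short-iso --no-pager |
  grep -Ei 'btrfs.*(error|warning|corrupt|csum)' || true
# grep exits non-zero when nothing matched; treat that as success.
```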

eku avatar Oct 16 '24 15:10 eku

I doubt a lot of average users monitor logs regularly, or monitor them at all. That's why ZFS has ZED, which can send email notifications if something isn't right (or if everything is alright and you just want to be reminded when a scrub or a pool resilver has finished). Having something similar for BTRFS would be great.

Ultranium avatar Oct 16 '24 16:10 Ultranium

mdadm has a similar capability too.

Not everyone has a journal.

Reading the journal (or dmesg directly) is complicated by various issues:

  • how to identify the filesystem from the kernel messages? It's not impossible, but there are a lot of corner cases, particularly with device-mapper aliases, replaced devices, and dropped ratelimited messages. With btrfs dev stats you already know which filesystem the counters belong to, because you had to pass its mount point to the command.
  • kernel messages are ratelimited, so some errors might not be reported at all, but they are still counted in btrfs dev stats.
  • if the system goes down because of the failure, it might come back up without the messages being accessible. Journal files are data blocks, while dev stats are metadata items, and with the default noflushoncommit mount option btrfs writes metadata blocks before data blocks. A temporary drive issue that causes a crash might therefore lose the journal update while the dev stats update is retained.
  • it only works if there's one journal. If an error is detected while booting from a live USB stick, it is written to a journal, but not to the journal the machine uses when booted normally. A similar issue arises with removable media that moves from one host to another. btrfs dev stats stores the error counts inside the filesystem, so the counts travel with the filesystem and are always available from it.
  • failures during readahead have possible causes that are unrelated to the device. Readahead operations are generally considered expendable: some block layers (particularly LVM) will simply fail readahead reads if there's not enough memory available or the reads are inconvenient to handle correctly. If a process actually needs the data, it will later issue non-readahead read requests that won't be dropped.

The last one is a bit complicated.

btrfs dev stats does not count errors that occur during readahead operations. These errors are silently corrected but not recorded in btrfs dev stats (see the discussion of that change; note that the specific bug discussed in the thread was fixed, but the general non-counting of readahead errors remains). For readahead reads that fail for non-device reasons, this is harmless behavior.

If a real device failure happens during a readahead operation (e.g. a corrupted block is detected on a failing cheap SSD), then dev stats will show no trace of the problem because readahead errors are not counted. With raid1 or dup profiles, btrfs may even self-repair the corrupted block, so the corruption will not be detected by future reads or scrubs either. Data corruption is an important early indicator of SSD failure, so losing some of the detected csum events is a significant problem for a monitoring system. Journal/dmesg monitoring can catch that kind of problem, at the expense of some false positives.
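
For comparison, a dev-stats-based check can stay very small. A sketch, assuming a btrfs-progs version where btrfs device stats --check exits non-zero when any error counter is non-zero; the mount point is a placeholder, and as described above this will miss csum errors that only happened during readahead:

```bash
#!/bin/sh
# Sketch of a dev-stats check: stay silent while all error counters are zero,
# print them (and let cron's MAILTO deliver the output) when any is non-zero.
MNT=/mnt/data   # placeholder mount point
out=$(btrfs device stats --check "$MNT" 2>&1)
if [ $? -ne 0 ]; then
    echo "btrfs device stats reports problems on $MNT:"
    printf '%s\n' "$out"
fi
```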

Zygo avatar Oct 16 '24 18:10 Zygo

If there are no notifications

AFAIK the btrfsmaintenance scripts run either via cron or a systemd timer. Both send the output via mail to the system administrator. Don't you use these?

eku avatar Oct 17 '24 05:10 eku

If there are no notifications

AFAIK the btrfsmaintenance scripts run either via cron or a systemd timer. Both send the output via mail to the system administrator. Don't you use these?

If you are talking about the MAILTO= cron directive, it sends the command output regardless of its exit code, thus cluttering the admin's mailbox. In ZFS's ZED it's possible to send notifications only if a pool is degraded, which is convenient.
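
As an illustration, a small "mail only on failure" wrapper (the script name is made up; chronic from moreutils does essentially the same) keeps cron quiet unless the wrapped command exits non-zero:

```bash
#!/bin/sh
# Hypothetical quiet-unless-failed wrapper for cron jobs: capture the
# command's output and re-emit it (triggering MAILTO) only on failure.
out=$("$@" 2>&1)
status=$?
if [ "$status" -ne 0 ]; then
    printf '%s\n' "$out"
fi
exit "$status"
```

It could be used in a crontab with MAILTO set, e.g. `0 3 * * 0 quiet-unless-failed btrfs scrub start -Bd /mnt/data`, assuming the wrapped command reports problems through its exit status.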

BTW, I'm not saying that btrfsmaintenance must copy ZFS, I'm just pointing out how it could be improved.

Ultranium avatar Oct 18 '24 16:10 Ultranium