SMART Disk Monitoring
Monitor the SMART status of the disks and send a notification when a disk reports a problem.
Since I've just had to go through these steps myself with a new main computer (don't ask), I can break this down into multiple steps:
- Install a mail agent for notifications, usually postfix, usually set up to relay mail through Gmail
- Add the mailutils package because we need mail (the program)
- Install smartd through the smartmontools package
- Configure smartd, which has a format that is actively hostile to humans
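For reference, the last three steps could be sketched as an Ansible play along these lines. This is only a rough sketch, not what any branch actually does: the package names assume Debian/Ubuntu, the `smartd_mail_to` variable and the address are placeholders I made up, and the service may be called `smartmontools` rather than `smartd` depending on the distribution. The postfix-to-Gmail relay setup is not covered here.

```yaml
# Hypothetical play; names and values are assumptions for illustration only.
- name: Set up smartd with mail notifications (sketch)
  hosts: all
  become: true
  vars:
    smartd_mail_to: "admin@example.com"  # placeholder address
  tasks:
    - name: Install mail tooling and smartmontools
      ansible.builtin.apt:
        name:
          - postfix
          - mailutils
          - smartmontools
        state: present
        update_cache: true

    - name: Write a minimal smartd.conf
      ansible.builtin.copy:
        dest: /etc/smartd.conf
        mode: "0644"
        content: |
          # -a: monitor all SMART attributes
          # -o on / -S on: enable automatic offline testing and attribute autosave
          # -s: short self-test daily at 02:00, long self-test on Saturdays at 03:00
          # -m: address that receives the alert mails
          DEVICESCAN -a -o on -S on -s (S/../.././02|L/../../6/03) -m {{ smartd_mail_to }}
      notify: Restart smartd

  handlers:
    - name: Restart smartd
      ansible.builtin.service:
        name: smartd  # may be "smartmontools" on some distributions
        state: restarted
```

The single `DEVICESCAN` line is a decent illustration of why the format is unpleasant to work with: everything, including the self-test schedule regex, lives in one directive.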
Steps 1 and 2 are pretty much required for any serious system anyway (for ZFS, we need mail so that zed can send scrub notifications). I'm guessing this is going to be a series of documentation texts, because the Google 2FA step is probably going to be hard to automate.
There exists a nice, seemingly-maintained container for Scrutiny. It takes care of collecting the SMART data, is easy to configure, and uses shoutrrr for notifications, so alerts can be sent to a wide range of services. It also has a nice web UI.
I've created a prototype integration in a branch: https://github.com/allthestairs/ansible-nas/tree/scrutiny
This adds a Scrutiny container to the stats role, which is also added in that branch. The container needs the SYS_RAWIO capability and read-only access to /run/udev so that it can query the disks with smartctl.
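To make those container requirements concrete, here is a minimal sketch of what such a task might look like with the `community.docker.docker_container` module. It is not copied from the branch: the image name, the device paths, the published port, and the config path are all assumptions for illustration.

```yaml
# Hypothetical task; image, devices, port, and paths are assumptions.
- name: Run Scrutiny container (sketch)
  community.docker.docker_container:
    name: scrutiny
    image: "ghcr.io/analogj/scrutiny:master-omnibus"  # assumed image, adjust as needed
    capabilities:
      - SYS_RAWIO                  # raw I/O access needed to read SMART data
    devices:
      - "/dev/sda:/dev/sda"        # example disks exposed to the container
      - "/dev/sdb:/dev/sdb"
    volumes:
      - "/run/udev:/run/udev:ro"   # read-only udev data for device metadata
      - "/opt/scrutiny/config:/opt/scrutiny/config"  # assumed persistent config/data path
    published_ports:
      - "8080:8080"                # assumed web UI port
    restart_policy: unless-stopped
```

Only disks listed under `devices` are visible inside the container, which is why the device list is worth making configurable.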
There are group_vars set up to allow for:
- Configuration of which devices get exposed to Scrutiny (and therefore which ones show up in its webui and notifications).
- Addition of shoutrrr notification URLs without needing to manually edit the included config file templates.
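As an illustration of the idea, the group_vars might look something like the sketch below. The variable names are hypothetical and may not match the ones in the branch, and the credentials and hosts are placeholders. The URL schemes follow shoutrrr's service URL format.

```yaml
# Hypothetical group_vars; variable names and all values are placeholders.
scrutiny_devices:
  - /dev/sda
  - /dev/sdb

scrutiny_notification_urls:
  # Mail via an SMTP server
  - "smtp://username:password@smtp.example.com:587/?fromAddress=nas@example.com&toAddresses=admin@example.com"
  # Discord webhook
  - "discord://token@webhookid"
```

Keeping both lists in group_vars means a user only has to touch one file to decide which disks are monitored and where alerts go.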
If anyone has any thoughts on that, let me know. Things to think about:
- Should this be in stats or in its own role?
- Should there be more configurable parts?
- I set up a persistent data volume so it can store an SQLite database of drive temps. Should this be the default?
- Should we configure a default notification that would work with any ansible-nas install?
- Other things?