ansible-nas

SMART Disk Monitoring

Open davestephens opened this issue 7 years ago • 2 comments

Check disks for SMART alert statuses and alert accordingly.

davestephens avatar Apr 08 '18 22:04 davestephens

Since I've just had to go through these steps myself with a new main computer (don't ask), I can break this down into multiple steps:

  1. Install a mail agent for notifications, usually postfix, typically set up to send through Gmail
  2. Add the mailutils package, because we need mail (the program)
  3. Install smartd through the smartmontools package
  4. Configure smartd, which has a format that is actively hostile to humans

Steps 1 and 2 are pretty much required for any serious system anyway (for ZFS, we need mail so that zed can send scrub notifications). I'm guessing this is going to be a series of documentation texts, because the step that involves Google's 2FA is probably going to be hard to automate?
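
For what it's worth, steps 2 to 4 look automatable. A minimal sketch as an Ansible play, assuming a Debian/Ubuntu host, that the service is reachable under the name `smartd`, and that mail to `root` is already delivered somewhere useful (step 1). The self-test schedule and the `root` recipient are placeholder choices, not anything ansible-nas ships:

```yaml
- hosts: all
  become: true
  tasks:
    - name: Install the mail client and smartmontools (steps 2 and 3)
      ansible.builtin.apt:
        name:
          - mailutils
          - smartmontools
        state: present
        update_cache: true

    - name: Write a minimal smartd.conf (step 4)
      ansible.builtin.copy:
        dest: /etc/smartd.conf
        owner: root
        group: root
        mode: "0644"
        content: |
          # Monitor all detected disks (-a) and mail root on problems (-m).
          # -s: short self-test daily at 02:00, long self-test Saturdays at 03:00.
          DEVICESCAN -a -m root -s (S/../.././02|L/../../6/03)
      notify: Restart smartd

  handlers:
    - name: Restart smartd
      ansible.builtin.service:
        name: smartd
        state: restarted
```

The `-s` argument is a regular expression matched against `T/MM/DD/d/HH`, which is most of what makes the format hard to read; keeping it in one templated line at least means nobody has to edit it by hand.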

scotws avatar May 19 '19 21:05 scotws

There is a nice, seemingly maintained container for Scrutiny. It handles running smartd, allows for easy configuration, has a nice web UI, and uses shoutrrr for notifications, which makes them easy to set up.

I've created a prototype integration in a branch: https://github.com/allthestairs/ansible-nas/tree/scrutiny

The branch adds a new stats role and puts a Scrutiny container in it. The container needs the SYS_RAWIO capability and read-only access to /run/udev to run smartd.
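
As a rough illustration of those container requirements (not the code in the branch), a task along these lines would grant the capability and the read-only udev mount. The variable names `scrutiny_data_directory` and `scrutiny_devices` are hypothetical, and the image name, port, and in-container config path are assumptions based on the upstream Scrutiny documentation that may not match the branch:

```yaml
- hosts: all
  become: true
  vars:
    # Hypothetical variable names, for illustration only.
    scrutiny_data_directory: /opt/scrutiny
    scrutiny_devices:
      - /dev/sda:/dev/sda
      - /dev/sdb:/dev/sdb
  tasks:
    - name: Run the Scrutiny container
      community.docker.docker_container:
        name: scrutiny
        # Assumed image name and tag; check upstream before relying on it.
        image: ghcr.io/analogj/scrutiny:master-omnibus
        state: started
        restart_policy: unless-stopped
        # Needed so SMART queries can be issued to the disks from inside the container.
        capabilities:
          - SYS_RAWIO
        # Only the devices listed here are visible to Scrutiny.
        devices: "{{ scrutiny_devices }}"
        volumes:
          - /run/udev:/run/udev:ro
          # Persistent config and database directory (assumed in-container path).
          - "{{ scrutiny_data_directory }}/config:/opt/scrutiny/config"
        published_ports:
          - "8080:8080"
```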

There are group_vars set up to allow for:

  • Configuration of which devices get exposed to Scrutiny (and therefore which ones show up in its webui and notifications).
  • Addition of shoutrrr notification URLs without needing to manually edit the included config file templates.
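
To make that concrete, such group_vars might look roughly like the following. The variable names are hypothetical rather than the ones used in the branch, and the tokens and hosts are placeholders; the URL shapes follow shoutrrr's documented `discord://` and `gotify://` schemes:

```yaml
# group_vars/all.yml (hypothetical variable names, placeholder values)

# Host devices to pass through to the Scrutiny container.
scrutiny_devices:
  - /dev/sda:/dev/sda
  - /dev/sdb:/dev/sdb

# shoutrrr URLs; a config template would render these into the
# notify.urls list of Scrutiny's scrutiny.yaml.
scrutiny_notification_urls:
  - "discord://TOKEN@WEBHOOK_ID"
  - "gotify://gotify.example.com/TOKEN"
```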

If anyone has any thoughts on that, let me know. Things to think about:

  • Should this be in stats or in its own role?
  • Should there be more configurable parts?
  • I set up a persistent data volume so it can store a SQLite database of drive temps. Should this be the default?
  • Should we configure a default notification that would work with any ansible-nas install?
  • Other things?

allthestairs avatar Jun 13 '21 19:06 allthestairs