vision
Set up disk space monitoring for infrastructure
We need to be alerted when our servers are running out of disk space, so that we can do something about it in good time.
- [ ] SRE is alerted when disk usage on a server's root partition exceeds a configured threshold.
- [ ] SRE is alerted when disk usage on any important data partition exceeds a configured threshold.
- [ ] SRE is alerted when inode utilisation on a server's root partition exceeds a configured threshold.
- [ ] SRE is alerted when inode utilisation on any important data partition exceeds a configured threshold.
- [ ] These alerts have been tested by (temporarily) filling disks.
Via @tilgovi: http://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/mon-scripts.html
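
For discussion, here is a minimal sketch of the per-partition check the list above describes, covering both disk space and inode utilisation. It is not tied to CloudWatch or any particular alerting tool. The 80% thresholds, the `PARTITIONS` list, and the names `Usage`, `measure` and `breaches` are all assumptions made for illustration; the real thresholds and data partitions still need to be agreed.

```python
import os
from dataclasses import dataclass

# Assumption: placeholder thresholds, not agreed values.
SPACE_THRESHOLD_PCT = 80.0
INODE_THRESHOLD_PCT = 80.0

# Assumption: only the root partition is listed; add data partitions here.
PARTITIONS = ["/"]


@dataclass
class Usage:
    path: str
    space_pct: float
    inode_pct: float


def measure(path):
    """Return space and inode utilisation for the filesystem holding `path`."""
    st = os.statvfs(path)  # POSIX only
    # Count usage the way `df` does: used blocks relative to the blocks an
    # unprivileged user could ever use (used + available).
    used_blocks = st.f_blocks - st.f_bfree
    usable_blocks = used_blocks + st.f_bavail
    space_pct = 100.0 * used_blocks / usable_blocks if usable_blocks else 0.0
    # Some filesystems report zero total inodes; treat that as 0% used.
    used_inodes = st.f_files - st.f_ffree
    inode_pct = 100.0 * used_inodes / st.f_files if st.f_files else 0.0
    return Usage(path, space_pct, inode_pct)


def breaches(usage, space_threshold=SPACE_THRESHOLD_PCT,
             inode_threshold=INODE_THRESHOLD_PCT):
    """Return one human-readable alert message per threshold breached."""
    alerts = []
    if usage.space_pct >= space_threshold:
        alerts.append(
            f"{usage.path}: disk usage {usage.space_pct:.1f}% "
            f">= {space_threshold:.1f}%"
        )
    if usage.inode_pct >= inode_threshold:
        alerts.append(
            f"{usage.path}: inode usage {usage.inode_pct:.1f}% "
            f">= {inode_threshold:.1f}%"
        )
    return alerts


if __name__ == "__main__":
    for partition in PARTITIONS:
        for message in breaches(measure(partition)):
            print(message)
```

Keeping the threshold comparison in `breaches` separate from the measurement in `measure` means the alert logic can be exercised with made-up numbers, which complements (but does not replace) the fill-the-disk test in the last checklist item.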