Nagios [Epic Task 2]: Tidy Up Existing Nagios Configuration.
As part of issue #2445, we should evaluate the current Nagios checks, remove or fix any that are non-functional or no longer required, and add any checks needed to detect common failures.
Once this is complete, the tidied configuration can be used as the basis for automating the addition of new Nagios hosts.
Current set of checks:
1. Network System Time
2. Check Jenkins Agent Connected
3. Current Load
4. Disk Space Root
5. PING
6. RAM
7. Pending O/S Updates
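Since the tidied-up checks are meant to be the basis for automating new hosts, here is a minimal Python sketch of that idea: it renders one Nagios `define service` block per check for a given host. The `check_command` names, the `generic-service` template and the example hostname are assumptions for illustration, not the commands defined in the existing configuration.

```python
# Minimal sketch: render Nagios "define service" blocks for a new host.
# ASSUMPTION: the check_command names below are HYPOTHETICAL placeholders;
# substitute the commands actually defined in the existing Nagios config.
CHECKS = {
    "Network System Time": "check_ntp_time",
    "Check Jenkins Agent Connected": "check_jenkins_agent",
    "Current Load": "check_load",
    "Disk Space Root": "check_disk_root",
    "PING": "check_ping",
    "RAM": "check_ram",
    "Pending O/S Updates": "check_os_updates",
}


def _directive(key: str, value: str) -> str:
    """Format one directive line, left-aligning the key to a fixed width."""
    return f"    {key:<21}{value}\n"


def service_definitions(host_name: str, template: str = "generic-service") -> str:
    """Return one Nagios service definition per check for host_name.

    ASSUMPTION: 'generic-service' is the service template in use.
    """
    blocks = []
    for description, command in CHECKS.items():
        blocks.append(
            "define service {\n"
            + _directive("use", template)
            + _directive("host_name", host_name)
            + _directive("service_description", description)
            + _directive("check_command", command)
            + "}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical hostname, for illustration only.
    print(service_definitions("test-example-host-1"))
```

Keeping the checks in a single table means a new host only needs its name supplied; adding or retiring a check is a one-line change.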
I've now created this Miro board with a much more detailed breakdown of the existing Nagios configuration and setup: https://miro.com/app/board/uXjVOoWeavg=/#tpicker-content
Currently working through the issues on existing hosts. Status as of today:
[Screenshot: Nagios Host Status Totals and Service Status Totals]
All hosts are complete. The 12 outstanding warnings are expected: some hosts are offline in Jenkins whilst being migrated (which also reduces capacity for some platforms), and the Nagios/Ansible host discrepancy will be fixed as part of another issue (see Epic #2445 for details).
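The Nagios/Ansible host discrepancy amounts to comparing two host lists. A minimal Python sketch, assuming hostnames can be compared case-insensitively and that the Ansible host list has already been extracted from the inventory; the helper names and sample data are hypothetical:

```python
import re

# Matches the body of each "define host { ... }" block (not hostgroup etc.).
HOST_BLOCK_RE = re.compile(r"define\s+host\s*\{(.*?)\}", re.DOTALL)
# Matches the host_name directive inside a block.
HOST_NAME_RE = re.compile(r"^\s*host_name\s+(\S+)", re.MULTILINE)


def nagios_host_names(config_text: str) -> set[str]:
    """Collect host_name values from 'define host' blocks (lower-cased)."""
    names = set()
    for block in HOST_BLOCK_RE.finditer(config_text):
        match = HOST_NAME_RE.search(block.group(1))
        if match:
            names.add(match.group(1).lower())
    return names


def host_discrepancies(nagios_hosts, ansible_hosts):
    """Return (only_in_nagios, only_in_ansible) as sorted lists.

    ASSUMPTION: hostnames are compared case-insensitively.
    """
    nagios = {h.strip().lower() for h in nagios_hosts if h.strip()}
    ansible = {h.strip().lower() for h in ansible_hosts if h.strip()}
    return sorted(nagios - ansible), sorted(ansible - nagios)


if __name__ == "__main__":
    # Hypothetical config snippet and inventory hosts, for illustration only.
    cfg = "define host {\n    use linux-server\n    host_name Test-A\n}\n"
    print(host_discrepancies(nagios_host_names(cfg), ["test-a", "test-b"]))
```

Only `define host` blocks are scanned, because service definitions also carry a `host_name` directive and would otherwise be counted as hosts.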