
Nagios [Epic Task 2]: Tidy Up Existing Nagios Configuration.

Open steelhead31 opened this issue 3 years ago • 2 comments

As part of issue #2445, we should evaluate the current Nagios checks, remove or fix any that are non-functional or no longer required, and add any additional checks needed to address common failures.

Once this is completed, it can be used as the basis of automating the addition of new nagios hosts.

steelhead31 avatar Jun 28 '22 10:06 steelhead31

Current set of checks:
       1. Network System Time
       2. Check Jenkins Agent Connected
       3. Current Load
       4. Disk Space Root
       5. PING
       6. RAM
       7. Pending O/S Updates
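
Since this set of checks is meant to become the basis for automating new Nagios hosts, here is a minimal sketch of how the list could be turned into service definitions. The check names come from the list above. Everything else is an assumption for illustration: the `generic-service` template (as shipped in the Nagios sample configuration), the `check_...` command names produced by `placeholder_command`, and the host name in the usage note are hypothetical and would need to be mapped to the commands actually defined on the server.

```python
import re

# Check names taken from the list above.
CHECKS = [
    "Network System Time",
    "Check Jenkins Agent Connected",
    "Current Load",
    "Disk Space Root",
    "PING",
    "RAM",
    "Pending O/S Updates",
]


def placeholder_command(description: str) -> str:
    """Derive a HYPOTHETICAL check_command name from a check description.

    The real command names live in the existing Nagios configuration;
    this naming scheme is only a stand-in.
    """
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return "check_" + slug


def render_service(host_name: str, description: str, check_command: str) -> str:
    """Render one Nagios service object definition as text."""
    return (
        "define service {\n"
        "    use                  generic-service\n"  # assumed template name
        f"    host_name            {host_name}\n"
        f"    service_description  {description}\n"
        f"    check_command        {check_command}\n"
        "}\n"
    )


def render_host_services(host_name: str) -> str:
    """Render the full set of checks for a single host."""
    return "\n".join(
        render_service(host_name, check, placeholder_command(check))
        for check in CHECKS
    )
```

For example, `print(render_host_services("example-host"))` would emit seven `define service` blocks for a host called `example-host` (a made-up name), which could then be reviewed before being dropped into the Nagios configuration directory.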

steelhead31 avatar Jun 28 '22 10:06 steelhead31

I've now created a Miro board with a much more detailed breakdown of the existing Nagios configuration and setup: https://miro.com/app/board/uXjVOoWeavg=/#tpicker-content

steelhead31 avatar Jul 06 '22 12:07 steelhead31

Currently working through the issues on existing hosts. Status as of today:

Host Status Totals

Up   Down   Unreachable   Pending
36   0      0             0

All Problems   All Types
0              36

Service Status Totals

Ok    Warning   Unknown   Critical   Pending
183   36        20        1          0

All Problems   All Types
57             240
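
As a quick sanity check on these totals, the sketch below reads the service figures as 183 Ok, 36 Warning, 20 Unknown, 1 Critical and 0 Pending, and the host figures as 36 Up with nothing else. It assumes that "All Problems" counts every state other than the healthy one and Pending, which is consistent with the numbers shown but is an assumption about how the dashboard tallies them. The function name `summarise` is illustrative only.

```python
# Totals as read from the status summary above.
HOST_TOTALS = {"Up": 36, "Down": 0, "Unreachable": 0, "Pending": 0}
SERVICE_TOTALS = {"Ok": 183, "Warning": 36, "Unknown": 20, "Critical": 1, "Pending": 0}


def summarise(totals: dict, healthy_state: str) -> tuple:
    """Return (all_problems, all_types) for a set of state counts.

    Assumption: a "problem" is any state other than the healthy state
    and Pending.
    """
    all_types = sum(totals.values())
    problems = sum(
        count
        for state, count in totals.items()
        if state not in (healthy_state, "Pending")
    )
    return problems, all_types


host_problems, host_all = summarise(HOST_TOTALS, "Up")
service_problems, service_all = summarise(SERVICE_TOTALS, "Ok")

print(f"Hosts:    {host_problems} problems out of {host_all}")
print(f"Services: {service_problems} problems out of {service_all}")
```

Under that assumption the computed figures match the summary: 0 of 36 hosts and 57 of 240 services are in a problem state.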

steelhead31 avatar Sep 28 '22 10:09 steelhead31

All hosts completed. The outstanding 12 warnings are "expected", i.e. hosts are offline in Jenkins whilst being migrated (capacity is reduced for some platforms for the same reason). The Nagios/Ansible host discrepancy will be fixed as part of another issue; see Epic #2445 for details.

steelhead31 avatar Sep 29 '22 11:09 steelhead31