RFE: Additional PMIE rules
Background: https://github.com/linux-system-roles/linux-system-roles.github.io/issues/47
The default PMIE rules have fairly good coverage, but it could be helpful to review whether they cover things like:
- CPU usage - e.g., detect CPU hogs on non-dedicated systems where no process should use the CPU for a long time
- memory usage - e.g., monitor how much memory and swap is used and how much swap-in/swap-out activity there is
- disk usage - e.g., monitor that no partition is getting full
- network connectivity - e.g., monitor that the gateway, DNS, and NTP servers are pingable and that no packet loss is detected
- application issues - e.g., generic cases like a process segfaulting constantly or a service failing to start
- security violations - e.g., a high number of failed SSH login attempts, SELinux AVCs, DDoS, or sudo failures
- hardware failures - e.g., I/O errors from storage or the current hardware not matching a predefined configuration
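As a rough illustration of the evaluation logic that two of these categories (disk usage and CPU hogs) would need, here is a minimal Python sketch. It is not PMIE rule syntax and not how pmie itself is implemented. It assumes per-partition usage percentages (comparable to what PCP's `filesys.full` metric reports) and per-process CPU percentages have already been sampled. The function names, the 90% thresholds, and the sampling window are hypothetical choices for illustration only.

```python
import shutil
from typing import Dict, Iterable, List, Sequence


def partitions_over_threshold(usage_pct: Dict[str, float],
                              threshold: float = 90.0) -> List[str]:
    """Return mount points whose usage exceeds the threshold.

    Loosely mirrors a "some instance matches" check: the rule fires if
    any partition is over the (hypothetical) threshold.
    """
    return sorted(m for m, pct in usage_pct.items() if pct > threshold)


def sustained_hogs(samples: Sequence[Dict[int, float]],
                   threshold: float = 90.0) -> List[int]:
    """Return PIDs whose CPU % exceeded the threshold in every sample.

    Each sample maps PID -> CPU %. Requiring the condition across the
    whole window avoids alerting on short, legitimate bursts.
    """
    if not samples:
        return []
    # Only PIDs present in every sample can be "sustained" hogs.
    common = set(samples[0])
    for sample in samples[1:]:
        common &= set(sample)
    return sorted(pid for pid in common
                  if all(sample[pid] > threshold for sample in samples))


def local_usage_pct(mounts: Iterable[str]) -> Dict[str, float]:
    """Compute usage % for local mount points via shutil.disk_usage."""
    result: Dict[str, float] = {}
    for mount in mounts:
        du = shutil.disk_usage(mount)
        # Skip pseudo filesystems that report zero total size.
        if du.total > 0:
            result[mount] = 100.0 * du.used / du.total
    return result


if __name__ == "__main__":
    print(partitions_over_threshold({"/": 95.2, "/home": 40.0, "/var": 91.0}))
    print(sustained_hogs([{101: 99.0, 202: 50.0},
                          {101: 97.5, 202: 95.0},
                          {101: 98.0}]))
```

With the sample data above, `/` and `/var` are reported as nearly full, and only PID 101 counts as a CPU hog because PID 202 exceeded the threshold in just one sample. In a real PMIE rule, the equivalent thresholds and holdoff behaviour would be expressed in the rule language itself.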
Perhaps not all of these are equally relevant (or even possible), but reviewing and polishing the existing rules with modern environments in mind could make PMIE a more compelling choice for users.
So this is mostly a high-level "RFE" to keep in mind if time permits, rather than a request for any one specific rule to be implemented. The list above should not be considered definitive in any sense; the most relevant rules might well be discovered through discussions with, or studies of, various user and support communities.
Thanks.