
RFE: Additional PMIE rules

Open myllynen opened this issue 4 years ago • 0 comments

Background: https://github.com/linux-system-roles/linux-system-roles.github.io/issues/47

The default PMIE rules have fairly good coverage, but it could be helpful to review whether they cover things like:

  • CPU usage - e.g., detect CPU hogs on non-dedicated systems where no process should utilize CPU for a long time
  • memory usage - e.g., monitor how much memory and swap is in use and how much swap-in/swap-out activity there is
  • disk usage - e.g., monitor that no partition is getting full
  • network connectivity - e.g., monitor that the gateway, DNS, and NTP servers are pingable and that no packet loss is detected
  • application issues - e.g., generic cases like process segfaulting constantly or a service failing to start
  • security violations - e.g., a high number of failed SSH login attempts, SELinux AVCs, DDoS, or sudo failures
  • hardware failures - e.g., IO errors from storage or current hardware not matching a predefined configuration
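
To make the disk usage item concrete, a rule along the following lines might be a starting point. This is only a minimal sketch, assuming the Linux `filesys.full` metric (percentage of a filesystem in use) is available on the monitored host; the 95% threshold, the 2 minute interval, and the message text are arbitrary illustrative choices, not values taken from the existing rule set.

```
// Hypothetical example: warn via syslog when any mounted filesystem
// is more than 95% full (threshold and interval are assumptions).
delta = 2 min;

some_inst ( filesys.full > 95 )
    -> syslog "filesystem nearly full:" " %i at %v%";
```

Here `%i` and `%v` expand to the instance (filesystem) name and the value for each instance that made the condition true, so a single rule covers every partition.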

Perhaps not all of these are equally relevant (or even possible), but reviewing and polishing the existing rules with modern environments in mind could make PMIE a more compelling choice for users.

So this is mostly a high-level "RFE" to keep in mind if time permits, rather than a request for any one specific rule to be implemented. The above list should not be considered definitive in any sense; the most relevant rules might well be discovered by discussing with, or studying, the various user and support communities.

Thanks.

myllynen, Dec 18 '20 14:12