troubleshoot
Run additional collectors based on collector result
Describe the rationale for the suggested feature.
Sometimes when troubleshooting an issue, the output of one collector indicates that another collector should be run. For example, if `ceph health detail` reports problems with a number of PGs, it would be helpful to have the output of `ceph pg $pg query` for each `$pg` in that list of problem PGs.
For example, output in `health.json`:
```json
{
  "checks": {
    "PG_AVAILABILITY": {
      "severity": "HEALTH_WARN",
      "summary": {
        "message": "Reduced data availability: 213 pgs inactive"
      },
      "detail": [
        {
          "message": "pg 2.14 is stuck inactive for 1658.491904, current state unknown, last acting []"
        },
        ...
```
This tells me I'd want to collect `ceph pg 2.14 query` output for PG 2.14, and likewise for every other PG listed.
Describe the feature
It would be helpful to have a collector that is fed the output of another collector: for example, take the output saved to `ceph/health.json`, parse it, and, based on the result, run an additional collector.
This request is for a framework that lets collectors follow this pattern, rather than for the specific Ceph collector (though that would make a good first example).
Describe alternatives you've considered
- Run the query for all PGs on every run. Rejected because production environments have a large number of PGs, and since it is one command per PG, collection would take a long time to complete.
Additional context
The same applies to analyzers.
A quick example would be checking Kubernetes versions.
At Percona we test Operators on various k8s flavors and versions.
- GKE 1.20 - 1.23
- EKS 1.20 - 1.22
So if the cluster is GKE, I want to check for versions 1.20 to 1.23; if it is EKS, only 1.20 to 1.22 (no 1.23).
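A minimal Go sketch of the dependent version check: the allowed range is chosen by distribution, which would itself come from an earlier collector or analyzer result. The ranges are the ones listed above; `isSupported`, `minorOf`, and the lowercase distribution keys are hypothetical and not troubleshoot's analyzer API. The parser assumes version strings like `v1.22.6` or `v1.22.6-eks-...`, where any vendor suffix sits after the patch number.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// versionRange is an inclusive range of Kubernetes 1.x minor versions.
type versionRange struct{ minMinor, maxMinor int }

// supported holds the per-distribution ranges from this request.
var supported = map[string]versionRange{
	"gke": {20, 23},
	"eks": {20, 22},
}

// minorOf returns the minor number of a version such as "v1.22.6" or "1.23".
// Vendor suffixes after the patch number (e.g. "-gke.200") are ignored.
func minorOf(version string) (int, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) < 2 || parts[0] != "1" {
		return 0, fmt.Errorf("unexpected version %q", version)
	}
	return strconv.Atoi(parts[1])
}

// isSupported reports whether version is within the range for distribution.
func isSupported(distribution, version string) (bool, error) {
	r, ok := supported[distribution]
	if !ok {
		return false, fmt.Errorf("no version range for distribution %q", distribution)
	}
	minor, err := minorOf(version)
	if err != nil {
		return false, err
	}
	return minor >= r.minMinor && minor <= r.maxMinor, nil
}

func main() {
	// Hypothetical sample inputs.
	cases := []struct{ dist, ver string }{
		{"gke", "v1.23.0"},
		{"eks", "v1.23.0"},
		{"eks", "v1.22.0"},
	}
	for _, c := range cases {
		ok, err := isSupported(c.dist, c.ver)
		if err != nil {
			fmt.Println(c.dist, c.ver, "error:", err)
			continue
		}
		fmt.Printf("%s %s supported=%t\n", c.dist, c.ver, ok)
	}
}
```

This prints `supported=true` for GKE 1.23, `supported=false` for EKS 1.23, and `supported=true` for EKS 1.22: the same version gets a different verdict depending on the distribution detected first.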