troubleshoot Run additional collectors based on collector result

Open xavpaice opened this issue 3 years ago • 1 comments

Describe the rationale for the suggested feature.

Sometimes when troubleshooting an issue, the result of one collector indicates that another one needs to be run, based on the output of the first. As an example, if ceph health detail returns output for a number of PGs, it would be helpful to have the result of ceph pg $pg query for each $pg in the list of problem PGs.

e.g. output in health.json:

{
    "checks": {
        "PG_AVAILABILITY": {
            "severity": "HEALTH_WARN",
            "summary": {
                "message": "Reduced data availability: 213 pgs inactive"
            },
            "detail": [
                {
                    "message": "pg 2.14 is stuck inactive for 1658.491904, current state unknown, last acting []"
                },

This would indicate I'd like to collect info for pg 2.14 (etc).
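To make the example concrete, here is a rough sketch of how the problem PG IDs could be pulled out of health.json and turned into follow-up commands. It is only an illustration: the function names problem_pgs and follow_up_commands are made up, the sample document is the excerpt above closed off so that it parses, and the regular expression assumes detail messages always start with "pg <id> ".

```python
import json
import re

# Assumption: detail messages begin with "pg <pool>.<hex id> ", as in the
# excerpt from health.json. Other message shapes are simply skipped.
PG_ID = re.compile(r"^pg (\d+\.[0-9a-f]+) ")


def problem_pgs(health: dict) -> list[str]:
    """Return the PG IDs named in detail messages, in order, without duplicates."""
    found: list[str] = []
    for check in health.get("checks", {}).values():
        for item in check.get("detail", []):
            match = PG_ID.match(item.get("message", ""))
            if match and match.group(1) not in found:
                found.append(match.group(1))
    return found


def follow_up_commands(health: dict) -> list[list[str]]:
    """Build one 'ceph pg <id> query' command per problem PG."""
    return [["ceph", "pg", pg, "query"] for pg in problem_pgs(health)]


# The excerpt from the issue, truncated to one detail entry and closed off.
sample = json.loads("""
{
  "checks": {
    "PG_AVAILABILITY": {
      "severity": "HEALTH_WARN",
      "summary": {"message": "Reduced data availability: 213 pgs inactive"},
      "detail": [
        {"message": "pg 2.14 is stuck inactive for 1658.491904, current state unknown, last acting []"}
      ]
    }
  }
}
""")

print(follow_up_commands(sample))  # [['ceph', 'pg', '2.14', 'query']]
```

Only the PGs that actually show up in the health output get queried, which is the point of the request.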

Describe the feature

Adding a collector that is fed info from another collector would be helpful: for example, take the output saved to ceph/health.json, parse it, and run an additional collector based on that result.

This request is to create the framework to allow collectors that follow this pattern, rather than the specific Ceph collector (though that would be a good first example).
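As a sketch of what such a framework might look like (not how troubleshoot is implemented; CollectorSpec, FollowUp, and run_chain are hypothetical names used only for illustration), each collector could have an optional rule that maps its output to further collectors to run:

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CollectorSpec:
    """Hypothetical description of one collector: a name and a command."""
    name: str
    command: tuple[str, ...]


# A follow-up rule takes the parent collector's output and returns more specs.
FollowUp = Callable[[str], list[CollectorSpec]]


def run_chain(
    roots: list[CollectorSpec],
    follow_ups: dict[str, FollowUp],
    run: Callable[[CollectorSpec], str],
) -> dict[str, str]:
    """Run the root collectors, then any collectors their output asks for."""
    results: dict[str, str] = {}
    queue = deque(roots)
    while queue:
        spec = queue.popleft()
        if spec.name in results:  # never run the same collector twice
            continue
        output = run(spec)
        results[spec.name] = output
        rule = follow_ups.get(spec.name)
        if rule is not None:
            queue.extend(rule(output))
    return results


# Demo with a fake runner so the sketch works without a Ceph cluster.
def fake_run(spec: CollectorSpec) -> str:
    if spec.name == "ceph/health":
        return "2.14 2.15"  # stand-in for parsed problem PG IDs
    return "output of " + " ".join(spec.command)


rules = {
    "ceph/health": lambda out: [
        CollectorSpec(f"ceph/pg-{pg}", ("ceph", "pg", pg, "query"))
        for pg in out.split()
    ],
}

results = run_chain(
    [CollectorSpec("ceph/health", ("ceph", "health", "detail"))], rules, fake_run
)
print(list(results))
```

Keeping the rule separate from the collector means the Ceph case is just one rule, and other collectors could plug in their own.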

Describe alternatives you've considered

Run the collection for all PGs on every run. Discounted because a production environment has a large number of PGs, and running one command per PG would take a long time to complete.

Additional context

Aug 30 '22 03:08 xavpaice

Same applies to analyzers.

A quick example here would be checking k8s versions.

At Percona we test Operators on various k8s flavors and versions.

GKE 1.20 - 1.23
EKS 1.20 - 1.22

So if the cluster is GKE, I want to check for versions 1.20 to 1.23. If it is EKS, 1.20 to 1.22 only (no 1.23 here).
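A minimal sketch of that conditional check, assuming the distribution name and version string have already been produced by an earlier analyzer or collector. The names SUPPORTED, parse_minor, and is_supported are hypothetical; the ranges are the ones listed above.

```python
# Supported (major, minor) ranges per distribution, inclusive on both ends.
SUPPORTED = {
    "GKE": ((1, 20), (1, 23)),
    "EKS": ((1, 20), (1, 22)),
}


def parse_minor(version: str) -> tuple[int, int]:
    """Turn '1.22', 'v1.22.17' and similar strings into (1, 22)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def is_supported(distribution: str, version: str) -> bool:
    """Check the version against the range for the detected distribution."""
    bounds = SUPPORTED.get(distribution.upper())
    if bounds is None:
        return False  # unknown distribution: nothing has been tested on it
    low, high = bounds
    return low <= parse_minor(version) <= high


print(is_supported("GKE", "1.23"))  # True
print(is_supported("EKS", "1.23"))  # False
```

Which range applies depends on the result of the distribution check, which is the same "run B based on the result of A" pattern as for collectors.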

Nov 21 '22 14:11 spron-in
