troubleshoot
Kubernetes Events Analyzer
Describe the rationale for the suggested feature.
Kubernetes events often contain key indicators of problems, such as scheduling failures, insufficient resources, and readiness timeouts. We should have an Events analyzer that lets you specify one or all namespaces, match event text against a pattern, and evaluate the result with Pass/Warn/Fail outcomes, similar to textAnalyze - https://troubleshoot.sh/docs/analyze/regex/
Describe the feature
apiVersion: troubleshoot.sh/v1beta2
kind: SupportBundle
metadata:
  name: bundle
spec:
  collectors:
    - clusterResources: {}
  analyzers:
    - eventsAnalyze:
        checkName: "insufficient-cpu"
        namespace: my-app
        regex: '.*Insufficient cpu.*'
        outcomes:
          - pass:
              when: "false"
              message: "Sufficient CPU for all Pods in my-app namespace"
          - fail:
              when: "true"
              message: "Insufficient CPU for some Pods in my-app namespace"
Describe alternatives you've considered
Additional context
I really like this. I wonder if we can make the outcomes even more actionable: do you see any reasonable way to tell users which pods the event was attached to, for example? The tricky part is that events can relate to more than just pods, but given how events are stored, the analyzer may have access to the object the event relates to (a node name, a pod name, etc.).
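To make the idea concrete, here is a rough sketch (in Python for brevity, not the project's actual implementation) of how matched events could be mapped back to the objects they refer to. It relies on the `involvedObject` reference (`kind`, `namespace`, `name`) that Kubernetes Event objects carry. The function name `involved_objects` and the sample data are hypothetical.

```python
import re

# Hypothetical sample data, shaped like Kubernetes core/v1 Event objects.
SAMPLE_EVENTS = [
    {
        "type": "Warning",
        "reason": "FailedScheduling",
        "message": "0/3 nodes are available: 3 Insufficient cpu.",
        "involvedObject": {"kind": "Pod", "namespace": "my-app", "name": "web-1"},
    },
    {
        "type": "Normal",
        "reason": "Pulled",
        "message": "Container image already present on machine",
        "involvedObject": {"kind": "Pod", "namespace": "my-app", "name": "web-2"},
    },
    {
        "type": "Warning",
        "reason": "NodeNotReady",
        "message": "Node is not ready",
        # Cluster-scoped objects such as nodes have no namespace.
        "involvedObject": {"kind": "Node", "name": "node-a"},
    },
]


def involved_objects(events, pattern):
    """Return sorted labels of the objects whose event message matches pattern."""
    regex = re.compile(pattern)
    labels = set()
    for event in events:
        if not regex.search(event.get("message") or ""):
            continue
        obj = event.get("involvedObject") or {}
        # Build "Kind/namespace/name", skipping any missing part.
        parts = [obj.get("kind"), obj.get("namespace"), obj.get("name")]
        labels.add("/".join(p for p in parts if p))
    return sorted(labels)


if __name__ == "__main__":
    print(involved_objects(SAMPLE_EVENTS, r"Insufficient cpu"))
```

An outcome message could then list these labels instead of only saying that "some Pods" are affected.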
I think we need to be able to parse at least the Reason and Message fields (either separately or together?), and possibly the Type field as well (Normal vs Warning etc.), though that is less relevant.
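One possible shape for this, sketched in Python as an assumption rather than the merged design: each of Reason, Message, and Type gets its own optional regex, and an event matches only if every supplied pattern matches. The names `event_matches` and `evaluate` are hypothetical, and the "fail"/"pass" result simply mirrors the outcomes in the proposal above.

```python
import re

# Hypothetical sample data, shaped like Kubernetes core/v1 Event objects.
SAMPLE_EVENTS = [
    {
        "type": "Warning",
        "reason": "FailedScheduling",
        "message": "0/3 nodes are available: 3 Insufficient cpu.",
    },
    {
        "type": "Normal",
        "reason": "Pulled",
        "message": "Container image already present on machine",
    },
]


def event_matches(event, reason=None, message=None, event_type=None):
    """Return True if every supplied regex matches its field on the event."""
    checks = ((reason, "reason"), (message, "message"), (event_type, "type"))
    for pattern, field in checks:
        if pattern is None:
            continue  # field not constrained by the caller
        if not re.search(pattern, event.get(field) or ""):
            return False
    return True


def evaluate(events, **patterns):
    """Return "fail" if any event matches all supplied patterns, else "pass"."""
    if any(event_matches(event, **patterns) for event in events):
        return "fail"
    return "pass"


if __name__ == "__main__":
    print(evaluate(SAMPLE_EVENTS, reason=r"^FailedScheduling$",
                   message=r"Insufficient cpu"))
```

Matching the fields separately avoids false positives where the text appears in the Message of an unrelated Reason. Matching them together could be approximated by supplying only the `message` pattern.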
Minor comment: I'd call the analyser events or kubernetesEvents to follow the naming convention of most other analysers.
The PR for this feature is about to be merged. However, we still need to document the new analyzer, so I will re-open this issue.