Add instrumentation state information to pods
Hi
When working with auto-instrumentation, I found no quick way to verify on which pods auto-instrumentation was applied successfully, was ignored, or failed. I ended up describing pods and looking for init containers.
It would greatly simplify operations and troubleshooting if the operator added state information to pods when a namespace is annotated for auto-instrumentation.
For example, listing all pods with successful instrumentation:
kubectl get pods --all-namespaces -l otel.auto-instrumentation.state=successful
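To make the idea concrete, here is a minimal Go sketch of what the label would mean in practice. It does not use the Kubernetes client libraries: `Pod`, `MarkState`, and `SelectByState` are hypothetical stand-ins, and the label key is only the illustrative name from the command above, not an agreed one.

```go
package main

import (
	"fmt"
	"sort"
)

// StateLabel is the illustrative label key from the example command.
// It is an assumption, not a finalized name.
const StateLabel = "otel.auto-instrumentation.state"

// Pod is a simplified stand-in for a Kubernetes pod (hypothetical type,
// carrying only the fields this sketch needs).
type Pod struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// MarkState records the instrumentation state in the pod's labels,
// as the operator could do when it processes the pod.
func MarkState(p *Pod, state string) {
	if p.Labels == nil {
		p.Labels = map[string]string{}
	}
	p.Labels[StateLabel] = state
}

// SelectByState mimics an equality label selector (-l key=value) and
// returns the matching pods as sorted "namespace/name" strings.
func SelectByState(pods []Pod, state string) []string {
	var out []string
	for _, p := range pods {
		if p.Labels[StateLabel] == state {
			out = append(out, p.Namespace+"/"+p.Name)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	pods := []Pod{
		{Namespace: "shop", Name: "cart"},
		{Namespace: "shop", Name: "checkout"},
		{Namespace: "ops", Name: "cron"},
	}
	MarkState(&pods[0], "successful")
	MarkState(&pods[1], "failed")
	MarkState(&pods[2], "successful")
	fmt.Println(SelectByState(pods, "successful"))
}
```

Pods that never received the label simply do not match any state, which is how an unlabelled (never considered) pod would differ from one explicitly marked as ignored.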
@svrnm & @pavankrish123
I totally agree.
This is related to an older issue: https://github.com/open-telemetry/opentelemetry-operator/issues/544
@noMoreCLI is the name "otel.auto-instrumentation.state" for the label final?
@akhileshsingh85 No, I don't think so. I used it as an illustrative example. Since I anticipate that additional state information could be added in the future, I think a structured approach could make sense, but I'm open to comments.
I'm not an expert on K8s, but from an end-user perspective it would be good to have a few kinds of information:
- On which workloads did we try to apply auto-instrumentation (and maybe which one)? That way I can identify whether a workload was skipped or the wrong auto-instrumentation was applied, for whatever reason.
- Was the auto-instrumentation successful (YES, NO, PENDING, ...)?
  - if NO, some error details if possible
  - if YES, we are good
  - if PENDING, there seems to be something blocking, etc.
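The items listed above could be modelled roughly as follows. This is a sketch only: the `State` values, the `Result` type, and the `Describe` helper are hypothetical names chosen for illustration and are not part of the operator.

```go
package main

import "fmt"

// State is a hypothetical per-workload instrumentation outcome.
type State string

const (
	StateSuccessful State = "Successful" // the "YES" case
	StateFailed     State = "Failed"     // the "NO" case
	StatePending    State = "Pending"
	StateSkipped    State = "Skipped"
)

// Result captures what was attempted on a workload and how it went.
type Result struct {
	Workload        string
	Instrumentation string // which auto-instrumentation was attempted, e.g. "java"
	State           State
	Detail          string // error details, expected only for failed results
}

// Describe renders a one-line, human-readable summary of a result.
func Describe(r Result) string {
	switch r.State {
	case StateSuccessful:
		return fmt.Sprintf("%s: %s instrumentation applied", r.Workload, r.Instrumentation)
	case StateFailed:
		return fmt.Sprintf("%s: %s instrumentation failed: %s", r.Workload, r.Instrumentation, r.Detail)
	case StatePending:
		return fmt.Sprintf("%s: %s instrumentation pending", r.Workload, r.Instrumentation)
	case StateSkipped:
		return fmt.Sprintf("%s: skipped", r.Workload)
	default:
		return fmt.Sprintf("%s: unknown state %q", r.Workload, r.State)
	}
}

func main() {
	results := []Result{
		{Workload: "shop/cart", Instrumentation: "java", State: StateSuccessful},
		{Workload: "shop/checkout", Instrumentation: "python", State: StateFailed, Detail: "init container exited with an error"},
		{Workload: "ops/cron", State: StateSkipped},
	}
	for _, r := range results {
		fmt.Println(Describe(r))
	}
}
```

Keeping the attempted instrumentation next to the state is what would let a user spot a workload that received the wrong auto-instrumentation, not only one that failed.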
Would it be better for some of these to use the InstrumentationStatus field, so that the information is in one spot rather than spread over (possibly many) workloads?
Correct, the information should be in the status field.
As a starting point, then, what about just adding status conditions to the CRD? https://sdk.operatorframework.io/docs/building-operators/golang/advanced-topics/#manage-cr-status-conditions
I see that CollectorStatus implements its own fields, but I think at least the conditions @svrnm suggested could be expressed with the upstream type. I'm willing to help with this.
I opened a POC PR at https://github.com/open-telemetry/opentelemetry-operator/pull/1228. Happy to discuss other possibilities or suggestions.
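To illustrate how the suggested states might map onto status conditions, here is a standard-library-only Go sketch. The local `Condition` type mirrors a subset of the fields of the upstream `metav1.Condition`, and `SetCondition` imitates the upsert behaviour of helpers such as `meta.SetStatusCondition` in `k8s.io/apimachinery`. The condition type `Injected` and the reason strings are assumptions made up for this example, and none of this is taken from the operator's code.

```go
package main

import (
	"fmt"
	"time"
)

// Condition mirrors a subset of the fields of metav1.Condition
// (local stand-in so the sketch has no external dependencies).
type Condition struct {
	Type               string
	Status             string // "True", "False", or "Unknown"
	Reason             string
	Message            string
	LastTransitionTime time.Time
}

// SetCondition inserts c, or updates the existing condition of the same Type.
// LastTransitionTime is only moved when Status changes.
// It reports whether the list was modified.
func SetCondition(conds *[]Condition, c Condition, now time.Time) bool {
	for i := range *conds {
		existing := &(*conds)[i]
		if existing.Type != c.Type {
			continue
		}
		changed := false
		if existing.Status != c.Status {
			existing.Status = c.Status
			existing.LastTransitionTime = now
			changed = true
		}
		if existing.Reason != c.Reason || existing.Message != c.Message {
			existing.Reason = c.Reason
			existing.Message = c.Message
			changed = true
		}
		return changed
	}
	c.LastTransitionTime = now
	*conds = append(*conds, c)
	return true
}

func main() {
	var conds []Condition
	now := time.Now()

	// PENDING maps to Status "Unknown", NO to "False", YES to "True".
	SetCondition(&conds, Condition{Type: "Injected", Status: "Unknown", Reason: "Pending"}, now)
	SetCondition(&conds, Condition{
		Type:    "Injected",
		Status:  "False",
		Reason:  "InjectionFailed",
		Message: "hypothetical error detail",
	}, now.Add(time.Minute))

	for _, c := range conds {
		fmt.Printf("%s=%s (%s) %s\n", c.Type, c.Status, c.Reason, c.Message)
	}
}
```

One condition type with three status values covers YES, NO, and PENDING, while `Reason` and `Message` carry the error details requested earlier in the thread.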