actions-runner-controller
Adding support for OpenTelemetry
Is your feature request related to a problem? Please describe. We often have trouble determining whether the root cause of an issue lies in the controller or in our own self-hosted infrastructure. For example, I'd like to see how long specific methods in the reconciliation loop take to run, to rule out any issues there.
Describe the solution you'd like Similar to the existing OpenMetrics support, I'm thinking we could add support for OpenTelemetry.
Describe alternatives you've considered We've enabled the OpenMetrics endpoint, which has already given us a lot more insight, but I think tracing will take it further.
Additional context
@kwngo Hey! This doesn't sound too bad, but I'm unsure whether it is worth the effort, or how we should prioritize it over other important issues.
What are the exact use cases you have in mind? Have you ever encountered performance problems using ARC caused by GitHub API slowness, K8s slowness, or a logic bug in ARC that makes it very slow to respond? Or is this being suggested only to prepare for such events?
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Until we have a concrete set of requirements for what we want from tracing, we had better hold off until the tracing framework is finalized and added to controller-runtime, so that we don't duplicate the effort. https://github.com/kubernetes-sigs/controller-runtime/issues/305
But adding some trace "logs" to important code paths of ARC would still make sense, regardless of whether we adopt OpenTelemetry tracing. If anyone has a request to add a specific trace log to a certain code path in ARC, please raise a feature request for it with your use case. Thanks 🙏
Is there any ongoing effort? I'm interested in working on it :)