elemental
Instrument with OpenTelemetry
It may be worth adding OpenTelemetry instrumentation to the operator (and the CLI?) for a few reasons:
- Metrics for things like nodes connected, early disconnects, update issues, plan retries, etc.
- Better debugging when something goes wrong, since tracing can highlight dependency interactions in a way that is hard to get from stdout logging
- Learn the timing of how our component interacts with other components (Kubernetes API, Rancher, network latency between node and operator, etc.)
- (opinion) We should be adopting this for more projects and this component might be a lower risk proving ground for the value it can bring
Note: I would not expect us to force everyone to set up Jaeger or Prometheus. But being able to add the right flags to get the data published would be useful when appropriate.
For reference: https://opentelemetry.io/docs/instrumentation/go/getting-started/
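To make the "opt-in via flags" idea from the note above a bit more concrete, here is a minimal sketch using only the Go standard library. The `--otel-endpoint` flag, the `TelemetryConfig` type, and the `resolveTelemetry` helper are hypothetical names made up for illustration, not anything that exists in the operator today. `OTEL_EXPORTER_OTLP_ENDPOINT` is the standard OpenTelemetry environment variable, used here as a fallback. The actual SDK wiring is deliberately left out and only marked with a comment.

```go
package main

import (
	"flag"
	"fmt"
	"io"
	"os"
)

// TelemetryConfig is a hypothetical holder for the opt-in telemetry settings.
type TelemetryConfig struct {
	Enabled  bool
	Endpoint string
}

// resolveTelemetry decides whether telemetry is on.
// The (hypothetical) --otel-endpoint flag wins over the standard
// OTEL_EXPORTER_OTLP_ENDPOINT environment variable. If neither is set,
// telemetry stays disabled, so nobody is forced to run a collector.
func resolveTelemetry(args []string, getenv func(string) string) (TelemetryConfig, error) {
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep flag parse errors out of stderr; caller reports them
	endpoint := fs.String("otel-endpoint", "", "OTLP collector endpoint; empty disables telemetry")
	if err := fs.Parse(args); err != nil {
		return TelemetryConfig{}, err
	}

	ep := *endpoint
	if ep == "" {
		ep = getenv("OTEL_EXPORTER_OTLP_ENDPOINT")
	}
	return TelemetryConfig{Enabled: ep != "", Endpoint: ep}, nil
}

func main() {
	cfg, err := resolveTelemetry(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	if !cfg.Enabled {
		fmt.Println("telemetry disabled")
		return
	}
	// Assumption: this is where a tracer/meter provider from the
	// OpenTelemetry Go SDK would be created and pointed at cfg.Endpoint.
	fmt.Println("telemetry enabled, exporting to", cfg.Endpoint)
}
```

Passing `getenv` as a parameter keeps the precedence logic easy to unit test without touching the real process environment.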
Hello @agracey ,
I totally agree that this is needed. I am starting to work on implementing this in the Rancher code using the OpenTelemetry SDK. However, I see that OpenTelemetry packages are already listed in go.mod at very old versions, and I can't find where they are used.
Also, when I tried to update them so I could start implementing the SDK, a lot of other dependency packages started showing errors.
Any idea how we can make a clean implementation of this?
@davidstauffer 👆🏻
Any updates on this?
Hey @krumware 👋🏼 no updates on this. I'm adding it to the backlog so it stays on our radar, but there is no plan to work on this feature at present.
Sounds good, thanks for the update. It's starting to come up in convos with Prime customers, so I'm just trying to stay ahead of it. Thanks!