remoting-opentelemetry-plugin
Distributed Tracing feature
Dependencies
Feature Request
Distributed tracing for Jenkins Remoting.
Purpose
Monitor and troubleshoot Jenkins agents by tracing remoting behavior.
Challenges
How to instrument remoting
- Use EngineListener and ChannelListener
attempt PR: https://github.com/jenkinsci/remoting-opentelemetry-plugin/pull/49
Drawback: what we can trace is restricted to what these listeners expose.
- Modify the remoting module to instrument more
attempt PR: https://github.com/jenkinsci/remoting/pull/471
A completely different method might work well.
- Sniffing packet payload?
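To illustrate the listener-based option and why it limits what can be traced, here is a minimal, self-contained sketch. It does not use the real remoting or OpenTelemetry APIs: `LifecycleListener`, `SimpleSpan`, and `TracingListener` are hypothetical stand-ins invented for this example. The point is only that a listener can produce spans for the events it is called back on, and nothing else.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a lifecycle listener; NOT the real remoting interface. */
interface LifecycleListener {
    void onDisconnect();
    void onReconnect();
}

/** Minimal span record (hypothetical; a real setup would use the OpenTelemetry API). */
final class SimpleSpan {
    final String name;
    final long startNanos;
    final long endNanos;

    SimpleSpan(String name, long startNanos, long endNanos) {
        this.name = name;
        this.startNanos = startNanos;
        this.endNanos = endNanos;
    }

    long durationNanos() {
        return endNanos - startNanos;
    }
}

/** Turns disconnect/reconnect callbacks into one "reconnect" span per outage. */
final class TracingListener implements LifecycleListener {
    private final LongSupplier clock;          // injectable clock, e.g. System::nanoTime
    private final List<SimpleSpan> finished = new ArrayList<>();
    private Long disconnectedAt = null;        // null while connected

    TracingListener(LongSupplier clock) {
        this.clock = clock;
    }

    @Override
    public void onDisconnect() {
        // Only the first disconnect of an outage starts the span.
        if (disconnectedAt == null) {
            disconnectedAt = clock.getAsLong();
        }
    }

    @Override
    public void onReconnect() {
        // A reconnect without a preceding disconnect produces no span.
        if (disconnectedAt != null) {
            finished.add(new SimpleSpan("reconnect", disconnectedAt, clock.getAsLong()));
            disconnectedAt = null;
        }
    }

    /** Returns a copy of the spans finished so far. */
    List<SimpleSpan> finishedSpans() {
        return new ArrayList<>(finished);
    }
}
```

In this sketch anything that happens between the two callbacks (for example, individual retry attempts) is invisible, which is the restriction noted above.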
How to collect spans when the connection is not established
The easiest way to use EngineListener is to send a listener from the controller and register it. With that approach, however, we cannot collect spans before the initial connection. We also may not be able to collect spans after the connection is closed and before it is established again. See https://github.com/jenkinsci/remoting-opentelemetry-plugin/issues/65.
- Set up instrumentation when launching the agent.
attempt PR: https://github.com/jenkinsci/remoting/pull/471
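One way to picture instrumentation that is set up at agent launch is a local buffer that records spans while nothing is connected and flushes them once an exporter becomes available. The sketch below is an assumption made for illustration, not a description of what the linked PR does. `BufferingSpanSink` and its methods are hypothetical names, and spans are reduced to plain strings to keep the example self-contained.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Objects;
import java.util.function.Consumer;

/**
 * Hypothetical bounded buffer created when the agent starts.
 * Spans recorded while no exporter is attached are kept locally
 * and flushed in order once an exporter is attached.
 */
final class BufferingSpanSink {
    private final int capacity;
    private final Deque<String> pending = new ArrayDeque<>();
    private Consumer<String> exporter; // null while no exporter is attached

    BufferingSpanSink(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Exports the span immediately if possible, otherwise buffers it. */
    synchronized void record(String spanName) {
        if (exporter != null) {
            exporter.accept(spanName);
            return;
        }
        if (pending.size() == capacity) {
            pending.removeFirst(); // buffer full: drop the oldest span
        }
        pending.addLast(spanName);
    }

    /** Attaches an exporter and flushes everything buffered so far, oldest first. */
    synchronized void attach(Consumer<String> newExporter) {
        exporter = Objects.requireNonNull(newExporter, "newExporter");
        while (!pending.isEmpty()) {
            exporter.accept(pending.removeFirst());
        }
    }

    /** Detaches the exporter, e.g. when the connection is closed. */
    synchronized void detach() {
        exporter = null;
    }
}
```

The bounded capacity is a design choice for the sketch: an agent that stays disconnected for a long time should not grow its buffer without limit, so the oldest spans are dropped first.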
How can we contribute to a better monitoring and troubleshooting experience?
The OpenTelemetry Plugin already traces the time spent allocating a node to a job, which includes the time to provision a new node if needed.
We are trying to create more detailed spans, but it is difficult to know which kinds of spans are helpful for monitoring and troubleshooting.
Here is a draft of the spans: https://docs.google.com/document/d/1gjRamLWz3NwenVifC5pYyBMmxsUjl9MjspZF0mRYeaI/edit#heading=h.6xn68iwvd7gz