remoting-opentelemetry-plugin

Distributed Tracing feature

Open Aki-7 opened this issue 3 years ago • 0 comments

Dependencies

Feature Request

Distributed tracing for Jenkins Remoting.

Purpose

Monitoring and troubleshooting Jenkins agents by tracing the remoting behavior.

Challenges

How to instrument remoting

  • Use EngineListener and ChannelListener

attempt PR: https://github.com/jenkinsci/remoting-opentelemetry-plugin/pull/49
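To make the listener idea concrete, here is a minimal sketch of how connect and disconnect callbacks could be turned into spans. It is not the code from the PR. `ConnectionLifecycleListener`, `ConnectionSpanListener`, and `FinishedSpan` are hypothetical stand-ins invented for illustration; they are not the real `EngineListener`/`ChannelListener` or OpenTelemetry APIs, and a real implementation would presumably create spans through an OpenTelemetry tracer instead of an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** A finished span: a name plus start and end timestamps in milliseconds. */
final class FinishedSpan {
    final String name;
    final long startMillis;
    final long endMillis;

    FinishedSpan(String name, long startMillis, long endMillis) {
        this.name = name;
        this.startMillis = startMillis;
        this.endMillis = endMillis;
    }

    long durationMillis() {
        return endMillis - startMillis;
    }
}

/** Hypothetical stand-in for the lifecycle callbacks a remoting listener receives. */
interface ConnectionLifecycleListener {
    void onConnected();

    void onDisconnected();
}

/** Turns connect/disconnect callbacks into one "connection" span per session. */
final class ConnectionSpanListener implements ConnectionLifecycleListener {
    private final LongSupplier clock;
    private final List<FinishedSpan> finished = new ArrayList<>();
    private boolean open;      // true while a connection span is in progress
    private long startedAt;    // start time of the span in progress

    ConnectionSpanListener(LongSupplier clock) {
        this.clock = clock;
    }

    @Override
    public synchronized void onConnected() {
        if (!open) {
            open = true;
            startedAt = clock.getAsLong();
        }
    }

    @Override
    public synchronized void onDisconnected() {
        if (open) {
            finished.add(new FinishedSpan("connection", startedAt, clock.getAsLong()));
            open = false;
        }
        // A disconnect without a preceding connect is ignored.
    }

    /** Returns a copy so callers cannot mutate the internal list. */
    synchronized List<FinishedSpan> finishedSpans() {
        return new ArrayList<>(finished);
    }
}

class ListenerTracingSketch {
    public static void main(String[] args) {
        long[] now = {1_000}; // fake clock so the output is deterministic
        ConnectionSpanListener listener = new ConnectionSpanListener(() -> now[0]);

        listener.onConnected();
        now[0] = 4_500;
        listener.onDisconnected();

        // prints "connection took 3500 ms"
        for (FinishedSpan span : listener.finishedSpans()) {
            System.out.println(span.name + " took " + span.durationMillis() + " ms");
        }
    }
}
```

The clock is injected so the span timing can be tested without real delays. The sketch also shows the limitation discussed in this issue: a listener only sees what its callbacks expose, so anything that happens outside those callbacks produces no span.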

What we can trace is restricted

  • Modify the remoting module to instrument more

attempt PR: https://github.com/jenkinsci/remoting/pull/471

A completely different method might work well.

  • Sniffing packet payload?

How to collect spans when the connection is not established

The easiest way to use EngineListener is to send a listener from the controller and register it. With that approach, however, we cannot collect spans before the initial connection is established. We also may not be able to collect spans in the window after a connection is closed and before the next one is established. See https://github.com/jenkinsci/remoting-opentelemetry-plugin/issues/65.

  • Set up instrumentation when launching the agent.

attempt PR: https://github.com/jenkinsci/remoting/pull/471
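One possible way to avoid losing spans around connection gaps is sketched below, assuming instrumentation starts at agent launch and holds spans locally until some export path becomes available. This is an illustration of the idea only, not what the PR implements. `BufferingSpanSink`, its methods, and the span names are hypothetical, and spans are reduced to plain strings to keep the example self-contained.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

/** Buffers spans recorded while no exporter is attached, then flushes them in order. */
final class BufferingSpanSink {
    private final int capacity;
    private final Deque<String> pending = new ArrayDeque<>();
    private Consumer<String> exporter; // null while no export path is available

    BufferingSpanSink(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Exports immediately when possible, otherwise keeps the span for later. */
    synchronized void record(String span) {
        if (exporter != null) {
            exporter.accept(span);
            return;
        }
        if (pending.size() == capacity) {
            pending.removeFirst(); // drop the oldest span to bound memory use
        }
        pending.addLast(span);
    }

    /** Called once an export path exists; flushes everything buffered so far. */
    synchronized void attach(Consumer<String> newExporter) {
        exporter = newExporter;
        while (!pending.isEmpty()) {
            exporter.accept(pending.removeFirst());
        }
    }

    /** Called when the export path goes away; later spans are buffered again. */
    synchronized void detach() {
        exporter = null;
    }
}

class BufferingSketch {
    public static void main(String[] args) {
        BufferingSpanSink sink = new BufferingSpanSink(16);

        // Recorded before any connection exists, so these are buffered.
        sink.record("agent.launch");
        sink.record("connect.attempt");

        // Export path becomes available: buffered spans are flushed first.
        sink.attach(span -> System.out.println("exported " + span));
        sink.record("channel.open");

        // Connection lost: spans are buffered until the next attach.
        sink.detach();
        sink.record("reconnect.attempt");
        sink.attach(span -> System.out.println("exported " + span));
    }
}
```

The buffer is bounded so that an agent that never connects cannot grow memory without limit; dropping the oldest span first is one arbitrary policy among several.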

How can we contribute to a better monitoring and troubleshooting experience?

The OpenTelemetry Plugin already traces the time spent allocating a node to a job, which includes the time to provision a new node if needed.

We are trying to create more detailed spans, but it is difficult to know which kinds of spans are helpful for monitoring and troubleshooting.

Here is the draft of the spans: https://docs.google.com/document/d/1gjRamLWz3NwenVifC5pYyBMmxsUjl9MjspZF0mRYeaI/edit#heading=h.6xn68iwvd7gz

Aki-7, Aug 05 '21 06:08