opentelemetry-dotnet-instrumentation
Ability to trigger frequent thread sampling for selected spans
Feature Request
Describe the solution you'd like
For a selected subset of traces, I'd like to be able to capture additional execution details (e.g. stack samples). These would give me deeper visibility, letting me better understand the flow of execution and identify potential performance bottlenecks.
Context
Auto-instrumentation already supports continuous profiling which captures stack samples for all of the managed threads.
That kind of profiling is expected to collect samples at a low rate (e.g. every 1s or 10s).
To support the additional, trace-centric scenarios described above, auto-instrumentation would need to collect samples at a much higher rate (e.g. on the order of tens of milliseconds).
Additionally, there should be a way to dynamically start and stop sampling a given thread. For async operations, for example, frequent sampling would be started on the thread where execution began, stopped when a continuation is scheduled, and started again on the thread where the continuation runs.
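A minimal sketch of per-thread, high-frequency sampling follows. It is written in Python only to illustrate the control flow. The CPython-specific `sys._current_frames()` stands in for the native stack walker. The class and method names (`SelectiveThreadSampler`, `start_sampling`, `stop_sampling`) are hypothetical and are not part of auto-instrumentation. Only threads that were explicitly selected get walked, and the sampling interval is a constructor parameter.

```python
import sys
import threading
import time
import traceback


class SelectiveThreadSampler:
    """Hypothetical sketch: samples the stacks of explicitly selected threads only."""

    def __init__(self, interval_s=0.02):
        self.interval_s = interval_s      # e.g. 20 ms, far shorter than the 1s-10s of continuous profiling
        self.samples = []                 # (thread_id, [function names, outermost first])
        self._selected = set()            # idents of threads currently being sampled
        self._lock = threading.Lock()
        self._stopped = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._worker.start()

    def shutdown(self):
        self._stopped.set()
        self._worker.join()

    def start_sampling(self, thread_id):
        with self._lock:
            self._selected.add(thread_id)

    def stop_sampling(self, thread_id):
        with self._lock:
            self._selected.discard(thread_id)

    def _run(self):
        # Event.wait() doubles as the sampling timer; it returns True once shutdown() is called.
        while not self._stopped.wait(self.interval_s):
            with self._lock:
                selected = set(self._selected)
            if not selected:
                continue  # no thread selected: skip stack walking entirely
            frames = sys._current_frames()
            for thread_id in selected:
                frame = frames.get(thread_id)
                if frame is not None:
                    names = [entry.name for entry in traceback.extract_stack(frame)]
                    self.samples.append((thread_id, names))


def busy_work(duration_s):
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        pass


sampler = SelectiveThreadSampler(interval_s=0.01)
sampler.start()
me = threading.get_ident()
sampler.start_sampling(me)   # e.g. a selected span starts executing on this thread
busy_work(0.2)
sampler.stop_sampling(me)    # e.g. a continuation is scheduled and this thread is released
sampler.shutdown()
print(f"{len(sampler.samples)} samples collected")
```

When no thread is selected, the worker only wakes on its timer and skips the stack walk. A native implementation would more likely avoid the timer entirely until the first thread is selected.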
This also requires the ability to track execution context changes.
Fortunately, the continuous profiling code already tracks such changes to maintain the span context-to-thread association, so a similar approach could be used.
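The decision logic that would sit on top of such context-change notifications can be sketched as follows. In .NET, the value-changed callback of `AsyncLocal<T>` is one mechanism that delivers these notifications. The sketch assumes a hypothetical `on_context_changed(thread_id, current_span)` callback and simulates an async request that resumes on a different thread. All names are illustrative.

```python
class FrequentSamplingCoordinator:
    """Hypothetical sketch: starts/stops sampling as the current span changes on a thread."""

    def __init__(self, is_selected, start_sampling, stop_sampling):
        self._is_selected = is_selected        # span -> bool
        self._start_sampling = start_sampling  # thread_id -> None
        self._stop_sampling = stop_sampling    # thread_id -> None
        self._sampled_threads = set()

    def on_context_changed(self, thread_id, current_span):
        """Called whenever the span flowing with the execution context changes on a thread."""
        should_sample = current_span is not None and self._is_selected(current_span)
        is_sampled = thread_id in self._sampled_threads
        if should_sample and not is_sampled:
            self._sampled_threads.add(thread_id)
            self._start_sampling(thread_id)
        elif not should_sample and is_sampled:
            self._sampled_threads.discard(thread_id)
            self._stop_sampling(thread_id)


events = []
selected_traces = {"trace-a"}
coordinator = FrequentSamplingCoordinator(
    is_selected=lambda span: span["trace_id"] in selected_traces,
    start_sampling=lambda tid: events.append(("start", tid)),
    stop_sampling=lambda tid: events.append(("stop", tid)),
)
span = {"trace_id": "trace-a", "name": "GET /orders"}

coordinator.on_context_changed(1, span)  # request starts on thread 1
coordinator.on_context_changed(1, None)  # awaiting: thread 1 goes back to the pool
coordinator.on_context_changed(2, span)  # continuation resumes on thread 2
coordinator.on_context_changed(2, None)  # span completes, context cleared
print(events)  # [('start', 1), ('stop', 1), ('start', 2), ('stop', 2)]
```

A switch from one selected span to another on the same thread produces no event, so sampling is not toggled needlessly.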
Plugins should be able to opt in to this behavior and configure it (sampling interval, exporter for captured samples, etc.), similarly to the current continuous profiling configuration. No overhead should be incurred when the feature is disabled, which should be the default.
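The opt-in, zero-overhead requirement can be illustrated with a settings object whose default is "disabled" and a factory that constructs nothing unless the feature is enabled. This is a sketch only. The setting names and default values below are hypothetical and are not configuration keys that auto-instrumentation defines.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")

# Hypothetical setting names; auto-instrumentation does not define these.
ENABLED_KEY = "EXAMPLE_FREQUENT_SAMPLING_ENABLED"
INTERVAL_KEY = "EXAMPLE_FREQUENT_SAMPLING_INTERVAL_MS"
TIMEOUT_KEY = "EXAMPLE_FREQUENT_SAMPLING_EXPORT_TIMEOUT_MS"


@dataclass(frozen=True)
class FrequentSamplingSettings:
    enabled: bool = False            # opt-in: disabled unless explicitly turned on
    sampling_interval_ms: int = 20   # illustrative default, not a recommendation
    export_timeout_ms: int = 500     # illustrative default

    @staticmethod
    def from_env(env: Mapping[str, str]) -> "FrequentSamplingSettings":
        enabled = env.get(ENABLED_KEY, "false").strip().lower() == "true"
        interval = int(env.get(INTERVAL_KEY, "20"))
        timeout = int(env.get(TIMEOUT_KEY, "500"))
        if interval <= 0 or timeout <= 0:
            raise ValueError("sampling interval and export timeout must be positive")
        return FrequentSamplingSettings(enabled, interval, timeout)


def create_sampler_if_enabled(
    settings: FrequentSamplingSettings,
    factory: Callable[[FrequentSamplingSettings], T],
) -> Optional[T]:
    # When disabled, nothing is constructed: no sampling thread, no hooks, no buffers.
    return factory(settings) if settings.enabled else None


created = []


def make_sampler(settings):
    created.append(settings)
    return f"sampler every {settings.sampling_interval_ms} ms"


off = create_sampler_if_enabled(FrequentSamplingSettings.from_env({}), make_sampler)
on = create_sampler_if_enabled(
    FrequentSamplingSettings.from_env({ENABLED_KEY: "true", INTERVAL_KEY: "10"}), make_sampler
)
print(off, "|", on)  # None | sampler every 10 ms
```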
The native-side code of the continuous profiler could be reused to capture stack samples. Currently, symbol resolution is done while the runtime is suspended; to minimize suspension time, it could be moved outside of the suspension.
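The proposed split can be sketched as two phases. While the runtime is suspended, only opaque frame identifiers (for example, function IDs or instruction pointers) are copied. After the runtime resumes, the identifiers are resolved to names through a cache, so each distinct frame is looked up once. The sketch is in Python. The suspend/resume/walk callbacks and the symbol table are fakes standing in for the native profiler, and all names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

RawStack = Tuple[int, ...]   # opaque frame identifiers, e.g. instruction pointers


def capture_raw_stacks(
    suspend: Callable[[], None],
    resume: Callable[[], None],
    walk_stack: Callable[[int], Iterable[int]],
    thread_ids: Sequence[int],
) -> List[Tuple[int, RawStack]]:
    """Phase 1: while suspended, only copy raw identifiers (no symbol lookups)."""
    suspend()
    try:
        return [(tid, tuple(walk_stack(tid))) for tid in thread_ids]
    finally:
        resume()  # runs after the list is built, before the function returns


class SymbolCache:
    """Phase 2: after resuming, resolve identifiers to names, caching each lookup."""

    def __init__(self, lookup: Callable[[int], str]):
        self._lookup = lookup          # potentially expensive, e.g. a metadata query
        self._names: Dict[int, str] = {}

    def resolve(self, raw_stack: RawStack) -> List[str]:
        names = []
        for frame_id in raw_stack:
            if frame_id not in self._names:
                self._names[frame_id] = self._lookup(frame_id)
            names.append(self._names[frame_id])
        return names


log = []
fake_stacks = {1: [0x10, 0x20, 0x30], 2: [0x10, 0x20]}
fake_symbols = {0x10: "Program.Main", 0x20: "OrderService.Handle", 0x30: "Db.Query"}


def lookup(frame_id):
    log.append("lookup")
    return fake_symbols[frame_id]


raw = capture_raw_stacks(
    suspend=lambda: log.append("suspend"),
    resume=lambda: log.append("resume"),
    walk_stack=lambda tid: fake_stacks[tid],
    thread_ids=[1, 2],
)
cache = SymbolCache(lookup)
resolved = {tid: cache.resolve(stack) for tid, stack in raw}
print(resolved[2])            # ['Program.Main', 'OrderService.Handle']
print(log.count("lookup"))    # 3: repeated frames are served from the cache
```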
- Do you have any proposals about how to manage which traces are selected for this more frequent thread sampling?
- Does this involve registering some hook that is run when the ActivityStarted event is triggered?
- Do we want to limit which things on an Activity can be used to decide which trace should have frequent sampling enabled?
- Do we need to provide a warning that not all information is available when an Activity starts, and some information may not be available until an activity ends, which may be too late?
- Do you have any proposals about how to manage which traces are selected for this more frequent thread sampling?
One option would be to store that information in baggage. Spans of a selected trace could additionally be decorated with a custom attribute for easier identification (e.g. on the UI side).
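The following sketch shows what a baggage-based selection could look like. The decision is made once per trace: uniformly at random, as an assumption, unless an upstream service already made it. The decision then flows to child spans through baggage, and selected spans get a marker attribute. Baggage is simplified here to a dict on each span, whereas in OpenTelemetry it actually lives in the context. The baggage key and attribute name are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical names; nothing in auto-instrumentation defines them.
BAGGAGE_KEY = "frequent_sampling"
SPAN_ATTRIBUTE = "frequent_sampling.selected"


@dataclass
class Span:
    name: str
    parent: Optional["Span"] = None
    baggage: Dict[str, str] = field(default_factory=dict)   # simplified: real baggage lives in the context
    attributes: Dict[str, object] = field(default_factory=dict)


def decide_on_start(span: Span, probability: float, rng: random.Random) -> bool:
    if span.parent is not None:
        span.baggage = dict(span.parent.baggage)   # baggage flows from parent to child
    if BAGGAGE_KEY not in span.baggage:
        # No decision yet for this trace: select uniformly at random.
        span.baggage[BAGGAGE_KEY] = "1" if rng.random() < probability else "0"
    selected = span.baggage[BAGGAGE_KEY] == "1"
    if selected:
        span.attributes[SPAN_ATTRIBUTE] = True      # lets a UI find frequently sampled spans
    return selected


rng = random.Random(7)
root = Span("GET /orders")
child = Span("SELECT orders", parent=root)
print(decide_on_start(root, 1.0, rng), decide_on_start(child, 1.0, rng))  # True True
print(child.attributes)  # {'frequent_sampling.selected': True}

upstream = Span("consume message", baggage={BAGGAGE_KEY: "1"})
print(decide_on_start(upstream, 0.0, rng))  # True: the upstream decision wins
```

Because only the first span of a trace draws a random number, every span in the trace agrees on the decision.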
- Does this involve registering some hook that is run when the ActivityStarted event is triggered?
Yes, possibly by adding a custom processor to the TracerProvider created by auto-instrumentation.
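In OpenTelemetry .NET, custom processors derive from `BaseProcessor<Activity>` and override `OnStart` and `OnEnd`. The Python sketch below mirrors that shape to show one possible processor. It assumes a sampler object exposing hypothetical `start_sampling`/`stop_sampling(thread_id)` methods and a caller-supplied `is_selected` predicate. Active selected spans are reference-counted per thread, so a nested child span ending does not stop sampling while its parent is still running.

```python
import threading
from collections import defaultdict


class FrequentSamplingProcessor:
    """Hypothetical sketch mirroring the on-start/on-end shape of a span processor."""

    def __init__(self, sampler, is_selected):
        self._sampler = sampler              # exposes start_sampling/stop_sampling(thread_id)
        self._is_selected = is_selected      # span -> bool
        self._active = defaultdict(int)      # thread id -> number of open selected spans
        self._lock = threading.Lock()

    def on_start(self, span):
        if not self._is_selected(span):
            return
        tid = threading.get_ident()
        with self._lock:
            self._active[tid] += 1
            if self._active[tid] == 1:       # first selected span on this thread
                self._sampler.start_sampling(tid)

    def on_end(self, span):
        if not self._is_selected(span):
            return
        tid = threading.get_ident()
        with self._lock:
            if self._active.get(tid, 0) == 0:
                return                       # span started on another thread (async continuation)
            self._active[tid] -= 1
            if self._active[tid] == 0:       # last selected span on this thread has ended
                del self._active[tid]
                self._sampler.stop_sampling(tid)


class RecordingSampler:
    def __init__(self):
        self.events = []

    def start_sampling(self, tid):
        self.events.append("start")

    def stop_sampling(self, tid):
        self.events.append("stop")


sampler = RecordingSampler()
processor = FrequentSamplingProcessor(sampler, is_selected=lambda span: span["selected"])
outer = {"name": "HTTP GET", "selected": True}
inner = {"name": "SQL query", "selected": True}
processor.on_start(outer)
processor.on_start(inner)
processor.on_end(inner)   # outer span is still open: keep sampling
processor.on_end(outer)
print(sampler.events)     # ['start', 'stop']
```

This sketch covers only spans that start and end on the same thread. A span that ends on a different thread than it started on is ignored here. That is the gap the execution-context tracking described in the original request would need to close.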
- Do we want to limit which things on an Activity can be used to decide which trace should have frequent sampling enabled?
- Do we need to provide a warning that not all information is available when an Activity starts, and some information may not be available until an activity ends, which may be too late?
My current use case would not require information that may be missing when an Activity starts (e.g. I would select traces uniformly at random), but documenting the limitations sounds like a good idea.
I think most of the required functionality could easily be built in a plugin (plugins can already customize the TracerProvider, e.g. to add custom processors).
The plugin would need auto-instrumentation to provide a way to start/stop sampling a given thread, to configure the sampling interval and export timeout, and to specify an exporter for captured samples. Most of this would be configured similarly to continuous profiling.
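The following sketch illustrates what the export timeout could mean for a plugin-supplied exporter. Captured samples are buffered and handed to the exporter in batches. A flush reports failure if the exporter does not finish within the configured timeout. The class name and the exporter signature (a callable taking a batch) are hypothetical. The Python sketch is single-threaded for brevity.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as ExportTimeout
from typing import Callable, List, Sequence


class SampleExportBuffer:
    """Hypothetical sketch: buffers captured samples and exports them with a timeout."""

    def __init__(self, exporter: Callable[[Sequence[object]], None], export_timeout_s: float):
        self._exporter = exporter
        self._timeout = export_timeout_s
        self._pending: List[object] = []
        self._pool = ThreadPoolExecutor(max_workers=1)

    def add(self, sample) -> None:
        self._pending.append(sample)

    def flush(self) -> bool:
        """Returns False if the exporter did not finish within the export timeout."""
        batch, self._pending = self._pending, []
        if not batch:
            return True
        future = self._pool.submit(self._exporter, batch)
        try:
            future.result(timeout=self._timeout)
            return True
        except ExportTimeout:
            return False   # the export keeps running in the background; it cannot be cancelled

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)


received = []
fast = SampleExportBuffer(received.extend, export_timeout_s=1.0)
fast.add(("thread-1", ["Main", "Handle"]))
print(fast.flush(), len(received))  # True 1

slow = SampleExportBuffer(lambda batch: time.sleep(0.5), export_timeout_s=0.05)
slow.add(("thread-1", ["Main"]))
print(slow.flush())  # False
```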
The main question is how best to expose the ability to start/stop sampling a given thread to plugins. Initially, plugins could call the relevant methods using reflection.
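In .NET, the reflection approach would typically mean `Type.GetType(...)` followed by `GetMethod(...)`, with a delegate created from the resulting `MethodInfo` and cached. The Python sketch below shows the same pattern. It looks the entry points up by name once, caches the callables, and degrades to a no-op when the host does not provide them. The module, type, and method names are hypothetical.

```python
import importlib
import sys
import types
from typing import Callable, Optional


def bind_entry_point(module_name: str, type_name: str, method_name: str) -> Optional[Callable]:
    """Looks up module.type.method by name at runtime; returns None if any part is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    target_type = getattr(module, type_name, None)
    method = getattr(target_type, method_name, None)
    return method if callable(method) else None


class PluginSamplingControl:
    """Resolves the entry points once; calls become no-ops if they are unavailable."""

    def __init__(self, module_name="hypothetical_autoinstrumentation", type_name="FrequentSampling"):
        self._start = bind_entry_point(module_name, type_name, "start_thread")
        self._stop = bind_entry_point(module_name, type_name, "stop_thread")

    @property
    def available(self) -> bool:
        return self._start is not None and self._stop is not None

    def start(self, thread_id: int) -> None:
        if self._start is not None:
            self._start(thread_id)

    def stop(self, thread_id: int) -> None:
        if self._stop is not None:
            self._stop(thread_id)


# Simulate the host exposing the (hypothetical) entry points.
calls = []
host = types.ModuleType("hypothetical_autoinstrumentation")


class FrequentSampling:
    @staticmethod
    def start_thread(thread_id):
        calls.append(("start", thread_id))

    @staticmethod
    def stop_thread(thread_id):
        calls.append(("stop", thread_id))


host.FrequentSampling = FrequentSampling
sys.modules[host.__name__] = host

control = PluginSamplingControl()
control.start(7)
control.stop(7)
print(control.available, calls)  # True [('start', 7), ('stop', 7)]

missing = PluginSamplingControl(module_name="not_installed_anywhere")
missing.start(7)                 # silently does nothing
print(missing.available)         # False
```

Resolving the entry points once keeps the per-call cost to a plain function call. The no-op fallback lets a plugin run unchanged against a host version that lacks the feature.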