opentelemetry-dotnet
traceparent header with 00 traceflags causes a new trace to be started
Bug Report
List of NuGet packages and version that you are using (e.g. OpenTelemetry 0.4.0-beta.2):
- OpenTelemetry 1.0.1
Runtime version (e.g. net461, net48, netcoreapp2.1, netcoreapp3.1, etc. You can find this information from the *.csproj file):
- all (tried net48, netcoreapp3.1, net5.0)
Symptom
When an activity is started from the W3C trace context headers traceparent and tracestate, and the traceparent has a trace-flags value of 00, the initial activity is not sampled. As a result, any inner activities have a null parent and start a new trace, with no trace state.
In OpenTelemetry.Trace.TracerProviderSdk.ComputeActivitySamplingResult the sampler returns SamplingDecision.Drop, so the result is decided by PropagateOrIgnoreData. That method returns ActivitySamplingResult.None rather than ActivitySamplingResult.PropagationData, because the parent already has a trace ID from the trace context. This causes System.Diagnostics.ActivitySource.StartActivity to return null, so any inner activities have a null parent with no context, which starts a new trace.
If PropagateOrIgnoreData returned ActivitySamplingResult.PropagationData then this would fix the problem. I'm not really sure of the reason why it only propagates when the parent trace id is 0. This seems the wrong way around to me, but maybe there is a reason for it? I would have thought it should propagate if there is any trace id or trace state to propagate.
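The difference between the two results can be seen without the SDK at all. The following is a minimal sketch that uses a bare System.Diagnostics.ActivityListener to mimic a parent-based sampler; it is not the OpenTelemetry implementation. The names Run, resultForDroppedParent and the "sketch-" source name are made up for illustration.

```csharp
using System;
using System.Diagnostics;

static class SamplingSketch
{
    // Hypothetical helper: mimics a parent-based sampler with a plain listener.
    // resultForDroppedParent is what the listener answers when the parent exists
    // but is not recorded (trace-flags 00).
    static void Run(ActivitySamplingResult resultForDroppedParent)
    {
        using var source = new ActivitySource("sketch-" + resultForDroppedParent);
        using var listener = new ActivityListener
        {
            ShouldListenTo = s => s.Name == source.Name,
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
            {
                // No parent trace ID: treat as a root span and record it.
                if (options.Parent.TraceId == default)
                    return ActivitySamplingResult.AllDataAndRecorded;
                // Recorded parent: keep recording.
                if ((options.Parent.TraceFlags & ActivityTraceFlags.Recorded) != 0)
                    return ActivitySamplingResult.AllDataAndRecorded;
                // Non-recorded parent: the case discussed in this issue.
                return resultForDroppedParent;
            },
        };
        ActivitySource.AddActivityListener(listener);

        // Same IDs as the traceparent in the repro, with trace-flags 00.
        var parent = new ActivityContext(
            ActivityTraceId.CreateFromString("00000000000000000000000000000001".AsSpan()),
            ActivitySpanId.CreateFromString("0000000000000002".AsSpan()),
            ActivityTraceFlags.None,
            "foo=bar");

        using var a1 = source.StartActivity("a1", ActivityKind.Server, parent);
        using var a2 = source.StartActivity("a2");
        Console.WriteLine("{0}: A1 = {1}, A2 = {2}",
            resultForDroppedParent, a1?.Id ?? "null", a2?.Id ?? "null");
    }

    static void Main()
    {
        Run(ActivitySamplingResult.None);            // models the current behaviour
        Run(ActivitySamplingResult.PropagationData); // models the proposed behaviour
    }
}
```

With None, a1 is expected to be null and a2 to be treated as a root span with a fresh trace ID. With PropagationData, both activities are expected to be created on the incoming trace ID with the Recorded flag unset, which matches the expected behaviour described below.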
What is the expected behavior?
Trace context is propagated but not recorded. At the very least, no new trace is started, and the 00 trace flags can be propagated to other services.
What is the actual behavior?
Receiving 00 trace flags causes the trace to be split and recorded as a new trace.
Reproduce
using System;
using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Trace;

using (var source = new ActivitySource("testsource"))
using (var tracer = Sdk.CreateTracerProviderBuilder()
    .AddSource(source.Name)
    .SetSampler(new ParentBasedSampler(new AlwaysOnSampler())) // This is just the default sampler. This line can be removed. It is only here for clarity.
    .Build())
{
    // Start the outer activity from an incoming traceparent/tracestate pair.
    using (var a1 = source.StartActivity("a1", ActivityKind.Server,
        ActivityContext.TryParse("00-00000000000000000000000000000001-0000000000000002-01", "foo=bar", out var context) ? context : default(ActivityContext)))
    {
        Console.WriteLine("A1: {0} {1}", a1?.Id ?? "null", a1?.TraceStateString);
        // The inner activity picks up its parent from Activity.Current.
        using (var a2 = source.StartActivity("a2"))
        {
            Console.WriteLine("A2: {0} {1}", a2?.Id ?? "null", a2?.TraceStateString);
        }
    }
}
When the trace flags are 01, this produces the output:
A1: 00-00000000000000000000000000000001-4fbe03f132eb5e4b-01 foo=bar
A2: 00-00000000000000000000000000000001-474ed4e80a57d74f-01 foo=bar
When the trace flags are 00, i.e. the traceparent is changed to 00-00000000000000000000000000000001-0000000000000002-00, this produces the output:
A1: null
A2: 00-12db73b553dd5f4494e90e99e2aa6dc0-0843b253946e054b-01
I would expect this to produce the same output as 01, but with 00 for the trace flags instead:
A1: 00-00000000000000000000000000000001-4fbe03f132eb5e4b-00 foo=bar
A2: 00-00000000000000000000000000000001-474ed4e80a57d74f-00 foo=bar
We are encountering the same issue using the latest versions. Is this considered a bug or expected behaviour?
We also ran into the same issue. The documentation for SamplingDecision.Drop states:
The activity will be created but not recorded.
which is not happening currently. As @ghelyar mentioned, it looks like the behavior of PropagateOrIgnoreData is inverted: it should return ActivitySamplingResult.PropagationData so that the "not recording" decision of the current trace is not lost.
We were encountering the same issue. As a workaround, I had to pass a copy of the activity context with ActivityTraceFlags set to Recorded, to prevent new trace IDs from being generated downstream:
var currentContext = Activity.Current?.Context ?? default;
var recordedContext = new ActivityContext(currentContext.TraceId, currentContext.SpanId, ActivityTraceFlags.Recorded, currentContext.TraceState, currentContext.IsRemote);
using var activity = source.StartActivity("activity", ActivityKind.Internal, recordedContext);
This issue is now fixed in 1.4.x. Note that the parent context must have IsRemote=true for this to work.
https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry/CHANGELOG.md#:~:text=TracerProviderSDK%20modified%20for%20spans%20with%20remote%20parent.%20For%20such%20spans%20activity%20will%20be%20created%20irrespective%20of%20SamplingResult%2C%20to%20maintain%20context%20propagation.%20(%233329
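As a rough sketch of what the IsRemote requirement means for the repro above: the two-string ActivityContext.TryParse overload used there does not mark the context as remote, so the parent can be built with the ActivityContext constructor and isRemote: true instead. This assumes OpenTelemetry 1.4.x or later is referenced; the behaviour described in the comments is taken from the changelog entry linked above and has not been verified here.

```csharp
using System;
using System.Diagnostics;
using OpenTelemetry;
using OpenTelemetry.Trace;

using var source = new ActivitySource("testsource");
using var tracerProvider = Sdk.CreateTracerProviderBuilder()
    .AddSource(source.Name)
    .Build();

// Parent taken from an incoming traceparent with trace-flags 00.
// isRemote: true marks it as coming from another process, which is what
// the 1.4.x change keys on according to the changelog.
var remoteParent = new ActivityContext(
    ActivityTraceId.CreateFromString("00000000000000000000000000000001".AsSpan()),
    ActivitySpanId.CreateFromString("0000000000000002".AsSpan()),
    ActivityTraceFlags.None,
    "foo=bar",
    isRemote: true);

using (var a1 = source.StartActivity("a1", ActivityKind.Server, remoteParent))
{
    Console.WriteLine("A1: {0} {1}", a1?.Id ?? "null", a1?.TraceStateString);
    using (var a2 = source.StartActivity("a2"))
    {
        Console.WriteLine("A2: {0} {1}", a2?.Id ?? "null", a2?.TraceStateString);
    }
}
```

If the parent is built with isRemote left at its default of false, the changelog wording suggests the fix does not apply, so it is worth checking how the context is created when the split-trace symptom persists after upgrading.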