opentelemetry-dotnet-contrib
opentelemetry-dotnet-contrib copied to clipboard
AddXRayTraceIdWithSampler not sampling correctly when using TraceIdRatioBasedSampler
Issue with OpenTelemetry.Contrib.Extensions.AWSXRay
List of all OpenTelemetry NuGet
packages and version that you are
using (e.g. OpenTelemetry 1.0.2
):
- OpenTelemtry 1.3.1
Runtime version (e.g. net462
, net48
, netcoreapp3.1
, net6.0
etc. You can
find this information from the *.csproj
file):
- net6.0
Is this a feature request or a bug?
- [ ] Feature Request
- [X] Bug
What is the expected behavior?
- I cloned https://github.com/open-telemetry/opentelemetry-dotnet-contrib via an IDE comm id 172bdc0c
- In test/OpenTelemetry.Contrib.Extensions.AWSXRay.Tests/TestAWSXRayIdGenerator.cs I added the following test
[Fact]
public void TestGenerateTraceIdForRootNodeUsingActivitySourceWithTraceIdBasedSamplerOnFiftyPercent()
{
using (Sdk.CreateTracerProviderBuilder()
.AddXRayTraceIdWithSampler(new TraceIdRatioBasedSampler(0.5))
.AddSource("TestTraceIdBasedSamplerOn")
.SetSampler(new TraceIdRatioBasedSampler(0.5))
.Build())
{
using (var activitySource = new ActivitySource("TestTraceIdBasedSamplerOn"))
{
using (var activity = activitySource.StartActivity("RootActivity", ActivityKind.Internal))
{
Assert.True(activity.ActivityTraceFlags == ActivityTraceFlags.Recorded);
}
}
}
}
What do you expect to see?
I expect this test to fail 50% of the time I run it.
What is the actual behavior?
The test always fails. When the probability is set above approx 0.77 the test always passes.
However, when I remove the following line:
.AddXRayTraceIdWithSampler(new TraceIdRatioBasedSampler(0.5))
I get the expected behavior.
This leads me to suspect that the Sampler doesnt work with well with the rewritten traceids. I stumbled into this behavior as I wanted to sample my traces for a .NET 6.0 project in an AWS environment using XRay.
Please let me know if you need more information or if I overlooked something here.
I ran into this as well. The TraceId generated to be compatible with XRay has 8 hex digits (4 bytes) of epoch time followed by 24 random hex digits (12 bytes). The current TraceIdRatioBasedSampler
implementation uses the first 8 bytes from the TraceId in the check. There also seems to be an issue with the AddXRayTraceIdWithSampler
method separate from the sampler. Even with a custom sampler it was failing to correctly match expectations for recorded traces. However, I did get promising results by only using the .AddXRayTraceId().SetSampler(new XRayTraceIdRatioBasedSampler(0.5))
I modified the GetLowerLong
method from here to use the last 8 bytes instead of the first 8 bytes of the TraceId span
I seem to be having the same problems. The sampler isn't respected when configured for AWS X-Ray and all requests end up being traced and sent to X-Ray, instead of the subset I expected. @zksward Can you by any chance post your implementation of XRayTraceIdRatioBasedSampler
here? Also, have you noticed this is still a problem for you? Seems to be based on my testing.
@wapplegate Since we're using the custom sampler now I haven't checked to see if this issue still occurs, but I've created a gist with the custom Sampler (it's a direct copy of the XRayTraceIdRatioBasedSampler in the repo with the only modification being on line 55 in the gist to look at the last 8 bytes instead of the first 8 bytes). You can drop it in and use it like this:
services.AddOpenTelemetry().WithTracing(builder =>
{
builder.SetResourceBuilder(ResourceBuilder.CreateDefault()...)
...
.AddXRayTraceId()
.SetSampler(new ParentBasedSampler(new XRayRatioBasedSampler(0.1)))