opentelemetry-dotnet-contrib
OpenTelemetry places different HTTP requests under the same trace
Hello,
NuGet packages:
- OpenTelemetry.Exporter.Jaeger Version=1.2.0-rc1
- OpenTelemetry.Exporter.OpenTelemetryProtocol Version=1.2.0-rc1
- OpenTelemetry.Exporter.Zipkin Version=1.2.0-rc1
- OpenTelemetry.Extensions.Hosting Version=1.0.0-rc8
- OpenTelemetry.Instrumentation.AspNetCore Version=1.0.0-rc8
- OpenTelemetry.Instrumentation.Http Version=1.0.0-rc8
- OpenTelemetry.Instrumentation.SqlClient Version=1.0.0-rc8
- OpenTelemetry.Instrumentation.StackExchangeRedis Version=1.0.0-rc8
Runtime version
- net6.0
Symptom
We have configured a web API (with a single endpoint right now) to use OpenTelemetry libraries in order to capture:
- Incoming requests
- Outgoing requests
- Database (SqlServer) commands
- Redis communication
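For reference, our tracing setup looks roughly like the following (a minimal sketch of the typical `AddOpenTelemetryTracing` registration for these packages; the service name and the `redisConnection` variable are placeholders, not our real values):

```csharp
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;
using StackExchange.Redis;

var builder = WebApplication.CreateBuilder(args);

// Placeholder connection; the real multiplexer comes from configuration.
IConnectionMultiplexer redisConnection =
    ConnectionMultiplexer.Connect("localhost:6379");
builder.Services.AddSingleton(redisConnection);

builder.Services.AddOpenTelemetryTracing(tracing => tracing
    .SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("my-api"))
    .AddAspNetCoreInstrumentation()           // incoming requests
    .AddHttpClientInstrumentation()           // outgoing requests
    .AddSqlClientInstrumentation()            // SQL Server commands
    .AddRedisInstrumentation(redisConnection) // Redis communication
    .AddJaegerExporter());                    // export to the Jaeger sidecar agent
```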
The behavior inside the endpoint is the following:
- Receive query parameters
- Check if the data is in Redis
- If yes, return the value
- If no, query the database, store the value in Redis, and return it
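The flow above is a standard cache-aside pattern; sketched with hypothetical route, key, and data-access names (our real endpoint differs):

```csharp
// Hypothetical endpoint illustrating the described flow.
app.MapGet("/items/{id}", async (string id, IConnectionMultiplexer redis, AppDb db) =>
{
    var cache = redis.GetDatabase();

    var cached = await cache.StringGetAsync(id);   // 1. check Redis
    if (cached.HasValue)
        return Results.Ok(cached.ToString());      // 2. cache hit: return value

    var value = await db.LoadItemAsync(id);        // 3. cache miss: query SQL Server
    await cache.StringSetAsync(id, value);         // 4. store the value in Redis
    return Results.Ok(value);                      // 5. return the value
});
```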
We are exporting the information using Jaeger. Our app is deployed to Kubernetes, and the Jaeger agent is deployed as a sidecar. Jaeger uses Elasticsearch as its storage backend. Jaeger version: 1.28.0.
We noticed that a significant number of requests are placed under a common trace ID. (Please check the attached images.)
What is the expected behavior?
We would expect each request to be under an independent trace ID. This happens for many requests, but not all; it is hard to say which is the majority.
We found a trace with more than 50,000 spans inside. It had been running for more than a day, and thousands of requests were placed under that single TraceId, handled as spans.
Reproduce
I cannot reproduce the problem.
We have the same code in the development environment, on a different cluster with its own Jaeger infrastructure (Elasticsearch, same version) but with fewer resources than production.
Initially I thought it was a matter of lost traces, since I had noticed something similar locally while developing another API with Jaeger using in-memory storage. I had read some articles saying that refreshing the Jaeger UI fixes the issue, but this is not our case.
I tried to reproduce the problem by stressing the development environment, without success.
Production has much better resources than development. Production receives about 100 requests per second, while I stressed development with more than 5,000.
On the same production cluster we have other web APIs, configured with OpenTelemetry and sending data to the same Jaeger infrastructure, without facing the same problem.