opentelemetry-java

Reduce span data loss in high-resource single-instance deployments that generate a large number of spans.

Open h0cheung opened this issue 5 months ago • 1 comments

Is your feature request related to a problem? Please describe.

Some of our services are deployed as high-resource single instances. They often generate a large number of spans (sustaining up to approximately 15,000 spans per second, with peak bursts of 5,000 to 6,000 spans within 30 ms).

The exporter seems to export spans on only one thread, which limits the send throughput. Also, the queue size has a constant default value of 2048, which is too small for high-resource instances.

As a result, many spans were dropped because the queue was full, which causes data loss.
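
To make the arithmetic concrete, here is a minimal sketch of the worst case, assuming the consumer drains nothing while a burst arrives. It is a toy model built on a plain `ArrayBlockingQueue`, not the SDK's actual queue implementation; the class and method names are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical toy model: NOT the OpenTelemetry SDK's queue implementation.
final class BurstDropSketch {

    /**
     * Offers burstSize items to a bounded queue while no consumer is running
     * and returns how many items were rejected because the queue was full.
     */
    static int droppedDuringBurst(int queueCapacity, int burstSize) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(queueCapacity);
        int dropped = 0;
        for (int i = 0; i < burstSize; i++) {
            // offer() never blocks; it returns false when the queue is full.
            if (!queue.offer(i)) {
                dropped++;
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        // A 6,000-span burst against a 2048-slot queue: 6000 - 2048 = 3952 rejected.
        System.out.println(droppedDuringBurst(2048, 6000));
    }
}
```

In practice the consumer does drain some items during the burst, so the real drop count would be lower than this upper bound.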

Describe the solution you'd like

First, send data with multiple consumers. The default number of consumers could depend on the number of CPU cores; half of the cores may be a good default.
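
The multi-consumer idea could be sketched roughly as follows, using only JDK classes. This is an illustration under stated assumptions, not how the SDK is implemented: `MultiConsumerSketch`, `defaultConsumers`, and `exportAll` are hypothetical names, and the "export" step is a counter standing in for a real network call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: several workers drain one shared queue in batches.
final class MultiConsumerSketch {

    /** Half of the cores, but at least one consumer (the default proposed in the issue). */
    static int defaultConsumers(int cpuCores) {
        return Math.max(1, cpuCores / 2);
    }

    /** Drains all spans through `consumers` worker threads; returns the number exported. */
    static int exportAll(List<String> spans, int consumers, int batchSize) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(Math.max(1, spans.size()));
        queue.addAll(spans);
        AtomicInteger exported = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int c = 0; c < consumers; c++) {
            pool.execute(() -> {
                List<String> batch = new ArrayList<>(batchSize);
                // drainTo is thread-safe, so each span is handed to exactly one worker.
                while (queue.drainTo(batch, batchSize) > 0) {
                    exported.addAndGet(batch.size()); // stand-in for a real export call
                    batch.clear();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return exported.get();
    }

    public static void main(String[] args) {
        List<String> spans = new ArrayList<>();
        for (int i = 0; i < 6000; i++) {
            spans.add("span-" + i);
        }
        int consumers = defaultConsumers(Runtime.getRuntime().availableProcessors());
        System.out.println(exportAll(spans, consumers, 512));
    }
}
```

A real implementation would also need to consider whether the backend tolerates out-of-order batches, since concurrent consumers do not preserve arrival order.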

Also, the default queue size should depend on the number of CPU cores or the memory limit of the service.
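
One possible shape for such a resource-aware default is sketched below. The scaling constants (spans per core, assumed bytes per span, share of the heap) are purely illustrative assumptions and are not taken from the SDK; only the 2048 baseline comes from this issue.

```java
// Hypothetical sketch: derive a queue size from CPU cores and heap limit.
final class QueueSizingSketch {
    static final int BASELINE = 2048;                // current default, per this issue
    static final int SPANS_PER_CORE = 2048;          // assumption, for illustration only
    static final long ASSUMED_BYTES_PER_SPAN = 1024; // assumption, for illustration only
    static final int HEAP_DIVISOR = 100;             // assumption: use at most ~1% of the heap

    static int suggestedQueueSize(int cpuCores, long maxHeapBytes) {
        long byCores = (long) cpuCores * SPANS_PER_CORE;
        long byMemory = maxHeapBytes / HEAP_DIVISOR / ASSUMED_BYTES_PER_SPAN;
        // Take the tighter of the two limits, but never go below the baseline.
        long size = Math.min(byCores, byMemory);
        return (int) Math.max(BASELINE, Math.min(size, Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println(suggestedQueueSize(rt.availableProcessors(), rt.maxMemory()));
    }
}
```

Until something like this exists as a default, the computed value could be applied by hand through the batch span processor builder's `setMaxQueueSize` method.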

Describe alternatives you've considered

Additional context

h0cheung • Jul 22 '25 08:07

Likely related to https://github.com/open-telemetry/opentelemetry-java/issues/4264#issuecomment-1067423289

trask • Aug 19 '25 21:08