opentelemetry-js icon indicating copy to clipboard operation
opentelemetry-js copied to clipboard

Eager exporting for BatchSpanProcessor

Open seemk opened this issue 3 years ago • 7 comments
trafficstars

Batch span processor currently exports batches with a strict interval: 5s interval, 512 batch size as per OTel's default conf (comes to around 100 spans per second for the throughput). However it does not do eager batch exporting. Assume after exporting there are still more than 512 spans in the span queue, with the current logic BSP will wait another 5 seconds instead of trying a new export. If the application generates more than 100 spans per second, the BSP queue will eventually be full and spans will be dropped.

Most SDKs seem to do eager exporting as per my comment here.

Even though the delay and batch size is configurable by environment variables, I think a default throughput of 100 spans per second, while silently dropping all other spans, could be somewhat improved.

  • [ ] This only affects the JavaScript OpenTelemetry library
  • [x] This may affect other libraries, but I would like to get opinions here first

seemk avatar Jul 14 '22 13:07 seemk

Would you suggest eager exporting to be the default or a configurable setting? Is there relevant spec guidance?

dyladan avatar Jul 14 '22 13:07 dyladan

I guess if to strictly read the spec, the current BSP is compliant.

What about if we'd change the current implementation so that if after exporting there are more than OTEL_BSP_MAX_EXPORT_BATCH_SIZE spans in the buffer, another export is started right away? And if there's less than that, the usual timer is started.

seemk avatar Jul 14 '22 13:07 seemk

I made a comment on the spec but there are 2 ways to solve this. 1 is to export multiple batches every x ms. The other is to export a batch as soon as the queue meets or exceeds the max export size.

dyladan avatar Jul 14 '22 14:07 dyladan

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 14 days.

github-actions[bot] avatar Sep 19 '22 06:09 github-actions[bot]

This issue was closed because it has been stale for 14 days with no activity.

github-actions[bot] avatar Oct 10 '22 06:10 github-actions[bot]

I think this should be reopened. The problem is quite serious actually. It's too easy to lose spans when you don't know span rate beforehand. It works as rate-limiter now, not as batcher... @dyladan Both solutions are good. I personally like the second because it allows to keep memory usage on the same level and avoid OOMs What do you think? How this should be implemented? Are changes to the spec required?

morigs avatar Nov 07 '22 14:11 morigs

I agree this is not stale. Thanks for bringing it back up. I also think both solutions are acceptable.

dyladan avatar Nov 07 '22 14:11 dyladan

Hey all, the BSP spec has been updated to specify that it should perform eager exporting https://github.com/open-telemetry/opentelemetry-specification/pull/3024

I see there's already a PR open (https://github.com/open-telemetry/opentelemetry-js/pull/3458) but I'm happy to help on a fix for this if needed, as we've had many issues of spans being silently dropped.

GregLahaye avatar Feb 17 '23 01:02 GregLahaye