opentelemetry-collector
opentelemetry-collector copied to clipboard
[exporterhelper] Make concurrency limit apply per-batch
Description: <Describe what has changed.>
The current issue with concurrency limit in batch sender is that given unfavorable goroutine scheduling, the batch size will be much smaller than MinSizeItems due to the concurrency limit check. This may overwhelm the backend with a lot of small requests, which defeats the purpose of batching.
With this PR, concurrency limit will only apply per-batch, meaning that active batch will be flushed due to concurrency check only if all the goroutines are waiting for the same active batch to be flush. This aims to avoid all the goroutines meaninglessly waiting for new requests to come in, while there will be no spare consumers from the queue to do so.
Compared to the existing implementation, for example, concurrencyLimit = 10, and a group of 5 goroutines are waiting for an export to finish, and 4 goroutines are pending active batch to be ready. If the 10th consumer goroutine calls batch sender send, in the past, it should flush active batch immediately, but with this change it will not. Instead, it will wait for up to FlushTimeout for the other group to return and potentially produce a bigger batch.
This change will have implications on throughput, but I reckon that exporting small requests may also be bad for throughput.
Link to tracking Issue: Fixes #9952
Testing: <Describe what testing was performed and which tests were added.>
Batch sender concurrency limit test is extended to test both merge and merge split. The new tests will most likely fail before this PR.
Documentation: <Describe the documentation added.>
Please delete paragraphs that you did not use before submitting.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 92.63%. Comparing base (
bb4955e) to head (bc9fd5c). Report is 82 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #9891 +/- ##
=======================================
Coverage 92.62% 92.63%
=======================================
Files 386 386
Lines 18236 18238 +2
=======================================
+ Hits 16892 16894 +2
Misses 1000 1000
Partials 344 344
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
This approach sounds good to me.
Another option is to find a way to synchronize https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/batch_sender.go#L197 to drop the number of active requests by the number of released goroutines. That would be significantly more complicated, though. If you can try this out, that would be great to compare. Let me know
drop the number of active requests by the number of released goroutines. That would be significantly more complicated, though. If you can try this out, that would be great to compare.
Implemented in alternative PR #10337
This PR was marked stale due to lack of activity. It will be closed in 14 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.