opentelemetry-collector icon indicating copy to clipboard operation
opentelemetry-collector copied to clipboard

[exporterhelper] Make concurrency limit apply per-batch

Open carsonip opened this issue 1 year ago • 7 comments

Description: <Describe what has changed.>

The current issue with concurrency limit in batch sender is that given unfavorable goroutine scheduling, the batch size will be much smaller than MinSizeItems due to the concurrency limit check. This may overwhelm the backend with a lot of small requests, which defeats the purpose of batching.

With this PR, concurrency limit will only apply per-batch, meaning that active batch will be flushed due to concurrency check only if all the goroutines are waiting for the same active batch to be flush. This aims to avoid all the goroutines meaninglessly waiting for new requests to come in, while there will be no spare consumers from the queue to do so.

Compared to the existing implementation, for example, concurrencyLimit = 10, and a group of 5 goroutines are waiting for an export to finish, and 4 goroutines are pending active batch to be ready. If the 10th consumer goroutine calls batch sender send, in the past, it should flush active batch immediately, but with this change it will not. Instead, it will wait for up to FlushTimeout for the other group to return and potentially produce a bigger batch.

This change will have implications on throughput, but I reckon that exporting small requests may also be bad for throughput.

Link to tracking Issue: Fixes #9952

Testing: <Describe what testing was performed and which tests were added.>

Batch sender concurrency limit test is extended to test both merge and merge split. The new tests will most likely fail before this PR.

Documentation: <Describe the documentation added.>

Please delete paragraphs that you did not use before submitting.

carsonip avatar Apr 04 '24 16:04 carsonip

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 92.63%. Comparing base (bb4955e) to head (bc9fd5c). Report is 82 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9891   +/-   ##
=======================================
  Coverage   92.62%   92.63%           
=======================================
  Files         386      386           
  Lines       18236    18238    +2     
=======================================
+ Hits        16892    16894    +2     
  Misses       1000     1000           
  Partials      344      344           

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Apr 04 '24 16:04 codecov[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar May 01 '24 03:05 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar May 16 '24 03:05 github-actions[bot]

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jun 01 '24 03:06 github-actions[bot]

This approach sounds good to me.

Another option is to find a way to synchronize https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/batch_sender.go#L197 to drop the number of active requests by the number of released goroutines. That would be significantly more complicated, though. If you can try this out, that would be great to compare. Let me know

dmitryax avatar Jun 05 '24 03:06 dmitryax

drop the number of active requests by the number of released goroutines. That would be significantly more complicated, though. If you can try this out, that would be great to compare.

Implemented in alternative PR #10337

carsonip avatar Jun 05 '24 21:06 carsonip

This PR was marked stale due to lack of activity. It will be closed in 14 days.

github-actions[bot] avatar Jun 22 '24 03:06 github-actions[bot]

Closed as inactive. Feel free to reopen if this PR is still being worked on.

github-actions[bot] avatar Jul 07 '24 03:07 github-actions[bot]