
Implement Adaptive Request Concurrency (ARC) for HTTP and gRPC Exporters

Open raghu999 opened this issue 3 weeks ago • 7 comments

Component(s)

exporter/exporterhelper

Is your feature request related to a problem? Please describe.

Currently, configuring OTel exporters (e.g., otlphttp, otlpgrpc, elasticsearch, loki) to be resilient without overwhelming downstream services is a significant challenge. Users must manually tune static concurrency limits, typically sending_queue.num_consumers.

This creates a "vicious cycle" for operators:

  • Set concurrency too high: The collector can easily overwhelm a downstream service (like Elasticsearch or a custom OTLP receiver), leading to HTTP 429 (Too Many Requests) / gRPC RESOURCE_EXHAUSTED errors, dropped data, and potential cascading failures.
  • Set concurrency too low: The collector under-utilizes the downstream service's capacity, leading to wasted resources, increased buffer usage (high memory/disk), and higher end-to-end latency.

This static limit is a "blunt instrument" for a dynamic problem. The optimal concurrency level is not static; it changes constantly based on:

  1. The number of collector instances being deployed (e.g., in a Kubernetes HPA).
  2. The current capacity of the downstream service (e.g., an Elasticsearch cluster scaling up or down).
  3. The real-time volume of telemetry data being sent.

Operators are forced to chase a moving target, constantly re-tuning this static value, or they must provision backends for a worst-case scenario that may rarely occur, which is expensive.

Describe the solution you'd like

I propose implementing an Adaptive Request Concurrency (ARC) mechanism within the exporterhelper to support both HTTP and gRPC-based exporters.

This feature would dynamically and automatically adjust the exporter's concurrency level (sending_queue.num_consumers) based on real-time feedback from the downstream service. The mechanism would be inspired by TCP congestion control algorithms (AIMD - Additive Increase, Multiplicative Decrease).

The core logic would be tailored to the protocol:


For HTTP Exporters (e.g., otlphttp, elasticsearch)

  • Monitor key signals:
    • Round-Trip Time (RTT) of requests. An Exponentially Weighted Moving Average (EWMA) could be used to establish a baseline RTT.
    • HTTP Response Codes: Specifically looking for success (2xx) vs. backpressure signals (429, 503, or other 5xx errors).
  • Implement AIMD Logic:
    • Additive Increase: If RTT is stable or decreasing AND HTTP responses are consistently successful (2xx), the collector should linearly increase its concurrency limit.
    • Multiplicative Decrease: If RTT starts to increase significantly (e.g., current_rtt > baseline_rtt * rtt_threshold_ratio) OR the exporter receives backpressure signals (429, 503), the collector should exponentially decrease its concurrency limit.
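
To make the loop concrete, here is a minimal Go sketch of what such a controller could look like inside exporterhelper. All names, fields, and defaults here are illustrative assumptions, not existing collector APIs:

package arc

import "time"

// ewma keeps an exponentially weighted moving average of request RTTs,
// used as the baseline that current latency is compared against.
type ewma struct {
	alpha float64 // smoothing factor, e.g. 0.1
	value float64 // current average RTT in seconds
}

func (e *ewma) update(rtt time.Duration) {
	s := rtt.Seconds()
	if e.value == 0 {
		e.value = s
		return
	}
	e.value = e.alpha*s + (1-e.alpha)*e.value
}

// controller applies AIMD to the exporter's concurrency limit.
type controller struct {
	limit, minLimit, maxLimit int
	decreaseRatio             float64 // e.g. 0.9
	rttThresholdRatio         float64 // e.g. 1.1
	baseline                  ewma
}

// observe is called after each request completes, with its RTT and a flag
// indicating a backpressure response (HTTP 429, 503, or other 5xx).
func (c *controller) observe(rtt time.Duration, backpressure bool) {
	prevBaseline := c.baseline.value
	c.baseline.update(rtt)
	rttDegraded := prevBaseline > 0 && rtt.Seconds() > prevBaseline*c.rttThresholdRatio
	if backpressure || rttDegraded {
		// Multiplicative decrease on backpressure or significantly rising RTT.
		c.limit = max(c.minLimit, int(float64(c.limit)*c.decreaseRatio))
	} else {
		// Additive increase while responses stay healthy.
		c.limit = min(c.maxLimit, c.limit+1) // min/max are Go 1.21+ builtins
	}
}

In practice the adjustment would likely happen per evaluation window rather than per request (similar to Vector's approach), but the shape of the feedback loop is the same.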

For gRPC Exporters (e.g., otlpgrpc)

gRPC (built on HTTP/2) has native flow control for network-level backpressure, but this proposal addresses application-level backpressure (e.g., the receiving server's application logic is overwhelmed). The signals for this are explicit gRPC status codes.

  • Monitor key signals:
    • gRPC Status Codes: This is the primary signal.
      • Success: OK (Code 0)
      • Backpressure Signals: RESOURCE_EXHAUSTED (Code 8, the gRPC equivalent of HTTP 429) and UNAVAILABLE (Code 14, the gRPC equivalent of HTTP 503).
  • Implement AIMD Logic:
    • Additive Increase: On consistent OK responses, the collector should linearly increase its concurrency limit (the number of concurrent streams, controlled by num_consumers).
    • Multiplicative Decrease: On receiving RESOURCE_EXHAUSTED or UNAVAILABLE status codes, the collector should exponentially decrease its concurrency limit.
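
For illustration, classifying these signals could be as simple as the following Go sketch; the isBackpressure helper is hypothetical and would feed the same AIMD controller as the HTTP path:

package arc

import (
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// isBackpressure classifies the error returned by an OTLP/gRPC export call.
// RESOURCE_EXHAUSTED and UNAVAILABLE are treated as explicit "slow down"
// signals (multiplicative decrease); a nil error (OK) counts toward the
// additive-increase path.
func isBackpressure(err error) bool {
	if err == nil {
		return false // OK (Code 0)
	}
	switch status.Code(err) {
	case codes.ResourceExhausted, codes.Unavailable:
		return true
	default:
		return false
	}
}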

This combined approach creates a feedback loop that automatically "finds" the optimal concurrency level that the downstream service can handle at any given moment, maximizing throughput while ensuring reliability for all major OTLP exporters.

Proposed Configuration

This feature could be added to the sending_queue settings, where it would be leveraged by any exporter using the queue (both gRPC and HTTP).

Example 1: Simple toggle

exporters:
  otlphttp:
    endpoint: "http://my-backend:4318"
    sending_queue:
      enabled: true
      queue_size: 1000
      num_consumers: adaptive # New "adaptive" keyword

Example 2: Detailed configuration block (preferred)

This would allow users to set boundaries and tune the algorithm if needed, while num_consumers would remain available as the static alternative. This single config structure would work for both otlphttp and otlpgrpc.

exporters:
  otlphttp:
    endpoint: "http://my-backend:4318"
    sending_queue:
      enabled: true
      queue_size: 1000
      # num_consumers: 10 # This would be ignored if adaptive_concurrency is enabled
      adaptive_concurrency:
        enabled: true
        min_concurrency: 1      # Optional: The floor for concurrency
        max_concurrency: 100    # Optional: The ceiling for concurrency
        # Optional: Algorithm tuning parameters with sane defaults
        # decrease_ratio: 0.9       # Factor to multiply by on "decrease" signal
        # rtt_threshold_ratio: 1.1  # e.g., trigger decrease if RTT > 110% of baseline (HTTP only)
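
If it helps the discussion, a hypothetical Go config struct mirroring the YAML above might look like this; the exact shape, tags, and defaults would of course be decided during review:

package arc

// AdaptiveConcurrencyConfig is an illustrative struct mirroring the proposed
// adaptive_concurrency YAML block; names, tags, and defaults are assumptions.
type AdaptiveConcurrencyConfig struct {
	Enabled           bool    `mapstructure:"enabled"`
	MinConcurrency    int     `mapstructure:"min_concurrency"`     // floor for concurrency, default 1
	MaxConcurrency    int     `mapstructure:"max_concurrency"`     // ceiling for concurrency, e.g. 100
	DecreaseRatio     float64 `mapstructure:"decrease_ratio"`      // multiplicative decrease factor, e.g. 0.9
	RTTThresholdRatio float64 `mapstructure:"rtt_threshold_ratio"` // HTTP only, e.g. 1.1
}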

Describe alternatives you've considered

The alternative is the current state: manual, static tuning of num_consumers. This is inefficient, error-prone, and adds significant operational overhead, as described in the problem statement.

Additional context

This proposal is heavily inspired by Vector's "Adaptive Request Concurrency" (ARC) feature, which solves this exact problem for its HTTP-based sinks. Vector's implementation (itself inspired by work done at Netflix) has proven to be extremely effective at improving reliability and performance.

By adopting a similar pattern, the OTel Collector would become a "better infrastructure citizen" out-of-the-box, reducing the tuning burden on users and making OTel-based pipelines more resilient to downstream slowdowns or failures.


raghu999 avatar Oct 25 '25 20:10 raghu999

Pinging code owners:

  • exporter/exporterhelper: @bogdandrutu @dmitryax

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions[bot] avatar Oct 25 '25 20:10 github-actions[bot]

This sounds like a great addition to me, and very much something I've been wanting. Of the two configuration options, I prefer option 2. That would provide a natural escape hatch for tuning if absolutely necessary, e.g. for the EWMA params.

Does this need to be limited to HTTP and gRPC exporters? RTT can be measured at the sending queue level (by the queue consumer). Some HTTP exporters already convert status codes to gRPC codes, so if we standardise on that, then we can just check for gRPC codes at the sending queue level and document that exporters are expected to convert to gRPC codes. That's important for other reasons too, e.g. for propagating 429s back to clients.
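
Roughly the kind of mapping I mean (purely a sketch, not existing collector code; some exporters already do an equivalent conversion):

package arc

import (
	"net/http"

	"google.golang.org/grpc/codes"
)

// httpToGRPCCode maps HTTP backpressure responses onto gRPC codes so the
// sending queue can watch a single set of signals regardless of transport.
func httpToGRPCCode(statusCode int) codes.Code {
	switch {
	case statusCode >= 200 && statusCode < 300:
		return codes.OK
	case statusCode == http.StatusTooManyRequests: // 429
		return codes.ResourceExhausted
	case statusCode == http.StatusServiceUnavailable: // 503
		return codes.Unavailable
	case statusCode >= 500:
		return codes.Internal
	default:
		return codes.Unknown
	}
}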

axw avatar Oct 27 '25 02:10 axw

Thanks for the great feedback! You've captured the ideal implementation perfectly. Moving this logic into the sending_queue itself, rather than limiting it to specific exporters, is exactly what I was hoping for.

That approach directly achieves the primary goal: any exporter that already uses the sending_queue should get this powerful "out of the box" support with (ideally) no code changes required on their end. That would be a massive win for the ecosystem.

Standardizing on gRPC error codes like RESOURCE_EXHAUSTED at the queue level sounds like a very clean way to get that universal signal for backpressure. As you noted, that has other benefits too.

Ultimately, I'll leave the final decision on the error code standardization to the OTel contributors, but I'm fully supportive of any design that enables this automatic, generic support for all sending_queue users.

raghu999 avatar Oct 27 '25 16:10 raghu999

@raghu999 would you like to work on contributing this?

axw avatar Oct 28 '25 00:10 axw

@axw Yes, I would be happy to work on contributing this!

What are the next steps? Is there a formal design proposal process I should follow first, or can I begin working on the implementation directly?

raghu999 avatar Oct 28 '25 04:10 raghu999

@raghu999 great. Let's wait for @dmitryax @bogdandrutu to weigh in first, as they are code owners for exporterhelper.

axw avatar Oct 28 '25 05:10 axw

Hi @axw @raghu999 , may we collaborate on this (after getting some reviews from the code-owners)? I would like to work on this too :)

yaten2302 avatar Oct 28 '25 10:10 yaten2302

Hello @axw, @dmitryax, and @bogdandrutu,

I added a PR for the Adaptive Request Concurrency (ARC) feature. I know we had discussed waiting for more feedback on the initial proposal, but I've had time to develop this into a complete implementation that I believe is on par with Vector's approach.

That said, I'm completely open to feedback. If you have different suggestions or another direction in mind, I am happy to look into it and make any changes needed to align with the team's goals.

Just a heads-up: I will be attending KubeCon next week and will be unavailable until November 17th. There's no rush for an immediate review, but I'm looking forward to your thoughts and will be ready to work on any changes as soon as I'm back.

Thanks!

raghu999 avatar Nov 08 '25 02:11 raghu999