Implement Adaptive Request Concurrency (ARC) for HTTP and gRPC Exporters
Component(s)
exporter/exporterhelper
Is your feature request related to a problem? Please describe.
Currently, configuring OTel exporters (e.g., otlphttp, otlpgrpc, elasticsearch, loki) to be resilient without overwhelming downstream services is a significant challenge. Users must manually tune static concurrency limits, typically sending_queue.num_consumers.
This creates a "vicious loop" for operators:
- Set concurrency too high: The collector can easily overwhelm a downstream service (like Elasticsearch or a custom OTLP receiver), leading to HTTP
429(Too Many Requests) / gRPCRESOURCE_EXHAUSTEDerrors, dropped data, and potential cascading failures. - Set concurrency too low: The collector under-utilizes the downstream service's capacity, leading to wasted resources, increased buffer usage (high memory/disk), and higher end-to-end latency.
This static limit is a "blunt instrument" for a dynamic problem. The optimal concurrency level is not static; it changes constantly based on:
- The number of collector instances being deployed (e.g., in a Kubernetes HPA).
- The current capacity of the downstream service (e.g., an Elasticsearch cluster scaling up or down).
- The real-time volume of telemetry data being sent.
Operators are forced to constantly re-tune this static value, or they must provision backends for a worst-case scenario that may rarely occur, which is expensive.
Describe the solution you'd like
I propose implementing an Adaptive Request Concurrency (ARC) mechanism within the exporterhelper to support both HTTP and gRPC-based exporters.
This feature would dynamically and automatically adjust the exporter's concurrency level (sending_queue.num_consumers) based on real-time feedback from the downstream service. The mechanism would be inspired by TCP congestion control algorithms (AIMD - Additive Increase, Multiplicative Decrease).
The core logic would be tailored to the protocol:
For HTTP Exporters (e.g., otlphttp, elasticsearch)
- Monitor key signals:
  - Round-Trip Time (RTT) of requests. An Exponentially Weighted Moving Average (EWMA) could be used to establish a baseline RTT.
  - HTTP Response Codes: Specifically looking for success (2xx) vs. backpressure signals (429, 503, or other 5xx errors).
- Implement AIMD Logic (a rough sketch follows this list):
  - Additive Increase: If RTT is stable or decreasing AND HTTP responses are consistently successful (2xx), the collector should linearly increase its concurrency limit.
  - Multiplicative Decrease: If RTT starts to increase significantly (e.g., current_rtt > baseline_rtt * rtt_threshold_ratio) OR the exporter receives backpressure signals (429, 503), the collector should exponentially decrease its concurrency limit.
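For illustration, the HTTP-side controller could look roughly like the sketch below. This is not an existing exporterhelper API; the type, field, and method names are hypothetical, and the exact signals and smoothing are open to discussion.

```go
package arc

import (
	"math"
	"net/http"
	"sync"
	"time"
)

// aimdController keeps an EWMA baseline of request RTTs and grows or shrinks
// a concurrency limit using additive-increase / multiplicative-decrease.
type aimdController struct {
	mu            sync.Mutex
	limit         int     // current concurrency limit
	minLimit      int     // floor (min_concurrency)
	maxLimit      int     // ceiling (max_concurrency)
	decreaseRatio float64 // e.g. 0.9
	rttThreshold  float64 // e.g. 1.1 (current RTT vs. baseline)
	ewmaRTT       float64 // smoothed baseline RTT in seconds
	alpha         float64 // EWMA smoothing factor, e.g. 0.1
}

// observe is called once per completed request with its RTT and HTTP status.
func (c *aimdController) observe(rtt time.Duration, statusCode int) {
	c.mu.Lock()
	defer c.mu.Unlock()

	seconds := rtt.Seconds()
	if c.ewmaRTT == 0 {
		c.ewmaRTT = seconds // seed the baseline with the first sample
	}

	backpressure := statusCode == http.StatusTooManyRequests || statusCode >= 500
	rttDegraded := seconds > c.ewmaRTT*c.rttThreshold

	switch {
	case backpressure || rttDegraded:
		// Multiplicative decrease: back off quickly when the downstream pushes back.
		c.limit = max(c.minLimit, int(math.Floor(float64(c.limit)*c.decreaseRatio)))
	case statusCode >= 200 && statusCode < 300:
		// Additive increase: probe for more capacity one slot at a time.
		c.limit = min(c.maxLimit, c.limit+1)
	}

	// Update the baseline after deciding, so a single slow request does not
	// immediately drag the baseline up and mask the degradation.
	c.ewmaRTT = c.alpha*seconds + (1-c.alpha)*c.ewmaRTT
}
```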
For gRPC Exporters (e.g., otlpgrpc)
gRPC (built on HTTP/2) has native flow control for network-level backpressure, but this proposal addresses application-level backpressure (e.g., the receiving server's application logic is overwhelmed). The signals for this are explicit gRPC status codes.
- Monitor key signals:
  - gRPC Status Codes: This is the primary signal.
    - Success: OK (Code 0).
    - Backpressure signals: RESOURCE_EXHAUSTED (Code 8, the gRPC equivalent of HTTP 429) and UNAVAILABLE (Code 14, the gRPC equivalent of HTTP 503).
- Implement AIMD Logic (see the classification sketch after this list):
  - Additive Increase: On consistent OK responses, the collector should linearly increase its concurrency limit (the number of concurrent streams, controlled by num_consumers).
  - Multiplicative Decrease: On receiving RESOURCE_EXHAUSTED or UNAVAILABLE status codes, the collector should exponentially decrease its concurrency limit.
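A minimal sketch of how gRPC results could be classified into AIMD decisions (the names are again hypothetical, reusing the idea of the controller sketched above):

```go
package arc

import (
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// decision is the AIMD step to take after observing one export result.
type decision int

const (
	hold     decision = iota // unrelated failure: leave the limit unchanged
	increase                 // success: probe for more concurrency
	decrease                 // backpressure: back off multiplicatively
)

// classifyGRPCResult maps a gRPC export result onto an AIMD decision. Only
// RESOURCE_EXHAUSTED and UNAVAILABLE are treated as backpressure; other
// errors (e.g. INVALID_ARGUMENT) say nothing about downstream capacity.
func classifyGRPCResult(err error) decision {
	if err == nil {
		return increase // OK (Code 0)
	}
	switch status.Code(err) {
	case codes.ResourceExhausted, codes.Unavailable:
		return decrease
	default:
		return hold
	}
}
```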
This combined approach creates a feedback loop that automatically "finds" the optimal concurrency level that the downstream service can handle at any given moment, maximizing throughput while ensuring reliability for all major OTLP exporters.
Proposed Configuration
This feature could be added to the sending_queue settings, where it would be leveraged by any exporter using the queue (both gRPC and HTTP).
Example 1: Simple toggle
```yaml
exporters:
  otlphttp:
    endpoint: "http://my-backend:4318"
    sending_queue:
      enabled: true
      queue_size: 1000
      num_consumers: adaptive # New "adaptive" keyword
```
Example 2: Detailed configuration block (preferred)
This would allow users to set boundaries and tune the algorithm if needed, while num_consumers would be the static alternative. This single config structure would work for both otlphttp and otlpgrpc.
```yaml
exporters:
  otlphttp:
    endpoint: "http://my-backend:4318"
    sending_queue:
      enabled: true
      queue_size: 1000
      # num_consumers: 10 # This would be ignored if adaptive_concurrency is enabled
      adaptive_concurrency:
        enabled: true
        min_concurrency: 1   # Optional: The floor for concurrency
        max_concurrency: 100 # Optional: The ceiling for concurrency
        # Optional: Algorithm tuning parameters with sane defaults
        # decrease_ratio: 0.9 # Factor to multiply by on a "decrease" signal
        # rtt_threshold_ratio: 1.1 # e.g., trigger decrease if RTT > 110% of baseline (HTTP only)
```
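If the detailed block is adopted, the settings could be modelled roughly as below; the struct and field names are illustrative and would need to follow exporterhelper conventions:

```go
package exporterhelper

import "errors"

// AdaptiveConcurrencySettings is an illustrative shape for the proposed
// `adaptive_concurrency` block under `sending_queue`.
type AdaptiveConcurrencySettings struct {
	Enabled           bool    `mapstructure:"enabled"`
	MinConcurrency    int     `mapstructure:"min_concurrency"`
	MaxConcurrency    int     `mapstructure:"max_concurrency"`
	DecreaseRatio     float64 `mapstructure:"decrease_ratio"`
	RTTThresholdRatio float64 `mapstructure:"rtt_threshold_ratio"`
}

// Validate enforces the boundaries described above.
func (s *AdaptiveConcurrencySettings) Validate() error {
	if !s.Enabled {
		return nil
	}
	if s.MinConcurrency < 1 || s.MaxConcurrency < s.MinConcurrency {
		return errors.New("adaptive_concurrency: require 1 <= min_concurrency <= max_concurrency")
	}
	if s.DecreaseRatio <= 0 || s.DecreaseRatio >= 1 {
		return errors.New("adaptive_concurrency: decrease_ratio must be in (0, 1)")
	}
	return nil
}
```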
Describe alternatives you've considered
The alternative is the current state: manual, static tuning of num_consumers. This is inefficient, error-prone, and adds significant operational overhead, as described in the problem statement.
Additional context
This proposal is heavily inspired by Vector's "Adaptive Request Concurrency" (ARC) feature, which solves this exact problem for its HTTP-based sinks. Vector's implementation (itself inspired by work done at Netflix) has proven to be extremely effective at improving reliability and performance.
By adopting a similar pattern, the OTel Collector would become a "better infrastructure citizen" out-of-the-box, reducing the tuning burden on users and making OTel-based pipelines more resilient to downstream slowdowns or failures.
Pinging code owners:
- exporter/exporterhelper: @bogdandrutu @dmitryax
This sounds like a great addition to me, and very much something I've been wanting. Of the two configuration options, I prefer option 2. That would provide a natural escape hatch for tuning if absolutely necessary, e.g. for the EWMA params.
Does this need to be limited to HTTP and gRPC exporters? RTT can be measured at the sending queue level (by the queue consumer). Some HTTP exporters already convert status codes to gRPC codes, so if we standardise on that, then we can just check for gRPC codes at the sending queue level and document that exporters are expected to convert to gRPC codes. That's important for other reasons too, e.g. for propagating 429s back to clients.
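To make that concrete, a minimal sketch of what queue-level measurement could look like, assuming export errors are (or can be converted to) gRPC status errors; the helper below is hypothetical, not an existing exporterhelper API:

```go
package exporterhelper

import (
	"context"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// exportFunc stands in for whatever the queue consumer ultimately invokes.
type exportFunc func(ctx context.Context) error

// measuredExport wraps one export call and yields the two signals an adaptive
// controller needs, regardless of protocol: the RTT and a gRPC status code.
func measuredExport(ctx context.Context, send exportFunc) (time.Duration, codes.Code) {
	start := time.Now()
	err := send(ctx)
	rtt := time.Since(start)
	if err == nil {
		return rtt, codes.OK
	}
	// Exporters that translate HTTP responses into gRPC status errors
	// (e.g. 429 -> RESOURCE_EXHAUSTED) surface backpressure uniformly here.
	return rtt, status.Code(err)
}
```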
Thanks for the great feedback! You've captured the ideal implementation perfectly. Moving this logic into the sending_queue itself, rather than limiting it to specific exporters, is exactly what I was hoping for.
That approach directly achieves the primary goal: any exporter that already uses the sending_queue should get this powerful "out of the box" support with (ideally) no code changes required on their end. That would be a massive win for the ecosystem.
Standardizing on gRPC error codes like RESOURCE_EXHAUSTED at the queue level sounds like a very clean way to get that universal signal for backpressure. As you noted, that has other benefits too.
Ultimately, I'll leave the final decision on the error code standardization to the OTel contributors, but I'm fully supportive of any design that enables this automatic, generic support for all sending_queue users.
@raghu999 would you like to work on contributing this?
@axw Yes, I would be happy to work on contributing this!
What are the next steps? Is there a formal design proposal process I should follow first, or can I begin working on the implementation directly?
@raghu999 great. Let's wait for @dmitryax @bogdandrutu to weigh in first, as they are code owners for exporterhelper.
Hi @axw @raghu999 , may we collaborate on this (after getting some reviews from the code-owners)? I would like to work on this too :)
Hello @axw, @dmitryax, and @bogdandrutu,
I added a PR for the Adaptive Request Concurrency (ARC) feature. I know we had discussed waiting for more feedback on the initial proposal, but I've had time to develop this into a complete implementation that I believe is on par with Vector's approach.
That said, I'm completely open to feedback. If you have different suggestions or another direction in mind, I am happy to look into it and make any changes needed to align with the team's goals.
Just a heads-up: I will be attending KubeCon next week and will be unavailable until November 17th. There's no rush for an immediate review, but I'm looking forward to your thoughts and will be ready to work on any changes as soon as I'm back.
Thanks!