[feature request] Limit Maximum Retry Attempts

Open nimanikoo opened this issue 6 months ago • 5 comments

Add support for limiting maximum retry attempts in TryGetRetryResult

While reviewing the retry mechanism implemented in TryGetRetryResult, I came across the following TODO:

// TODO: Consider introducing a fixed max number of retries (e.g. max 5 retries).

Currently, retries continue until the specified deadline is reached. While this follows the spec, it could introduce the following risks:

  • Retry loops bounded only by the deadline, with no cap on the number of attempts
  • High memory or CPU usage in batch processors (due to prolonged delays)
  • Longer blocking operations in simple processors

I’d like to propose introducing a MaxAttempts threshold (e.g. 5 retries) as a safeguard against excessive retries.

If the team agrees with this direction, I can proceed with the implementation and submit a pull request.

https://github.com/open-telemetry/opentelemetry-dotnet/blob/8c1e63894f0286b1c48b0448a866fdb3bd603fe8/src/OpenTelemetry.Exporter.OpenTelemetryProtocol/Implementation/ExportClient/OtlpRetry.cs

What is the expected behavior?

The retry mechanism will stop after a predefined maximum number of retries (e.g., 5 attempts).

If the maximum number of retry attempts is reached, the operation fails even if the deadline has not yet passed, which prevents unbounded retries.
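
A minimal sketch of the check described above, written as a standalone helper. The names `BoundedRetrySketch`, `ShouldRetry`, and all parameters are hypothetical and are not taken from OtlpRetry.cs; the cap of 5 simply mirrors the example value in this issue. A retry is allowed only while both the attempt cap and the deadline permit it:

```csharp
using System;

// Hypothetical illustration only; this is not the OtlpRetry API.
public static class BoundedRetrySketch
{
    // Assumed cap, mirroring the "e.g. 5 retries" suggested in this issue.
    public const int MaxAttempts = 5;

    // Returns true if one more retry is allowed.
    // retriesSoFar: number of retries already performed.
    // nextDelay: the backoff delay that would precede the next retry.
    public static bool ShouldRetry(
        int retriesSoFar,
        DateTime utcNow,
        DateTime deadlineUtc,
        TimeSpan nextDelay)
    {
        if (retriesSoFar >= MaxAttempts)
        {
            // Attempt budget exhausted, even if the deadline is still far away.
            return false;
        }

        // Deadline-based rule: do not wait past the deadline.
        return utcNow + nextDelay < deadlineUtc;
    }
}
```

Whichever limit is hit first ends the retry sequence, so the deadline remains the upper bound on total time and the attempt cap only ever shortens it.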

Which alternative solutions or features have you considered?

fix

Additional context

No response

nimanikoo, May 11 '25 20:05

There is an unresolved issue from Alan West on clearer specs for exporter retries: https://github.com/open-telemetry/opentelemetry-specification/issues/3639

Three people have commented on that issue recently asking for the number of retry attempts to be configurable for better compatibility with AWS Lambda.

The Java SDK has implemented max retries: https://github.com/open-telemetry/opentelemetry-java/blob/v1.50.0/exporters/otlp/testing-internal/src/main/java/io/opentelemetry/exporter/otlp/testing/internal/AbstractHttpTelemetryExporterTest.java#L241

matt-hensley, May 19 '25 19:05

@nimanikoo I support having a maximum number of retries within the specified deadline. The retry implementation is protected with an experimental flag, and this could be a valuable addition. If you have a plan for a PR, please proceed.

rajkumar-rangaraj, May 25 '25 19:05

Thank you for your support and feedback. I checked the Java code, and it currently allows for 2 retry attempts within the deadline.

I have some ideas to improve this, and I plan to implement them in a pull request soon. I will also write tests to ensure full coverage of different scenarios and make sure everything works reliably.

I’m excited to contribute and collaborate with you on this project.

Looking forward to working together!

@rajkumar-rangaraj

nimanikoo, May 25 '25 20:05

@matt-hensley Thank you for sharing these valuable references and the link to the related issue. I checked the Java SDK code, and as you pointed out, it performs 2 retry attempts according to their implementation.

This is very helpful for aligning our retry logic. I appreciate your input!

nimanikoo, May 25 '25 20:05