opentelemetry-python

export and shutdown timeouts for all OTLP exporters

Open Arnatious opened this issue 5 months ago • 2 comments

Description

This is a solution to several issues related to the current synchronous OTLP exporters.

Currently, OTLP exporters have several pain points:

  • #2663 Exporter timeouts are mostly ignored or are unintuitively applied
    • Where they do apply, they bound each individual attempt rather than the export as a whole, retries included
  • #3309
    • Exporters persist for a minimum of 63 seconds after program shutdown if a transient error is ongoing
  • #2284
    • Shutdown timeouts are ignored if exports are ongoing, and exporters have no way to cancel ongoing exports when shutdown is requested.
  • Exponential backoff does not include jitter
  • Timeouts are not configurable
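
To illustrate the jitter point above, here is a minimal sketch of an exponential backoff generator with jitter. It is not the generator added by this PR; the function name, parameters, and defaults are assumptions made for illustration. For context, a plain doubling schedule of 1, 2, 4, 8, 16, 32 seconds sums to 63 seconds, which is presumably where the figure in #3309 comes from.

```python
import random
from typing import Iterator, Optional


def backoff_with_jitter(
    base_s: float = 1.0,
    max_s: float = 64.0,
    jitter: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Yield exponentially growing delays, each scaled by a random factor.

    Hypothetical helper: names and defaults are illustrative assumptions,
    not the API introduced by the PR.
    """
    rng = rng or random.Random()
    delay = base_s
    while True:
        # Scale the nominal delay by a factor in [1 - jitter, 1 + jitter]
        # so that many clients retrying at once do not stay synchronized.
        yield delay * rng.uniform(1.0 - jitter, 1.0 + jitter)
        # Double the nominal delay, capped at max_s.
        delay = min(delay * 2, max_s)
```

A caller would typically draw one delay per failed attempt and stop once its overall deadline has passed, since the generator itself never terminates.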

This PR implements a new utility class, opentelemetry.exporter.otlp.proto.common.RetryingExporter, that fixes the above issues. It also significantly refactors the existing OTLP exporters to use this class, and extracts retry-related logic from their test suites.

The call signatures of public APIs are preserved where possible, though in several cases **kwargs was added for future-proofing, and positional arguments were renamed to create a consistent interface.

Each OTLP exporter creates a RetryingExporter, passing in a function that performs a single export attempt, the export result type, and the exporter's timeout.

Example

from opentelemetry.exporter.otlp.proto.common import RetryingExporter, RetryableExportError
from opentelemetry.sdk.trace.export import SpanExporter, SpanExportResult

class OTLPSpanExporter(SpanExporter):
  def __init__(self, ...):
    # Wrap the single-attempt function; RetryingExporter handles retries and deadlines.
    self._exporter = RetryingExporter(self._export, SpanExportResult, self._timeout)

  def _export(self, timeout_s: float, serialized_data: bytes) -> SpanExportResult:
    # Perform one export attempt within timeout_s.
    result = ...

    if is_retryable(result):
      # Tell RetryingExporter that this attempt may be retried.
      raise RetryableExportError(result.delay)
    return result

  def export(self, data, timeout_millis=10_000, **kwargs) -> SpanExportResult:
    # The public API takes milliseconds; export_with_retry is given seconds.
    return self._exporter.export_with_retry(timeout_millis * 1e-3, data)

  def shutdown(self, timeout_millis=10_000, **kwargs):
    ...
    self._exporter.shutdown(timeout_millis)
    self._shutdown = True
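
For readers who want a feel for what such a wrapper does internally, the following is a rough, self-contained sketch of a retry loop that enforces one overall deadline and stops waiting as soon as shutdown is requested. It is not the RetryingExporter from this PR: DeadlineRetrier, RetryableError, and all parameter names are hypothetical stand-ins, and the sketch only interrupts the wait between attempts, not an attempt that is already in flight.

```python
import threading
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical stand-in for the PR's RetryableExportError."""

    def __init__(self, retry_delay_s: Optional[float] = None):
        super().__init__()
        self.retry_delay_s = retry_delay_s


class DeadlineRetrier(Generic[T]):
    """Illustrative retry wrapper; names and behavior are assumptions."""

    def __init__(
        self,
        attempt: Callable[[float, bytes], T],
        failure: T,
        timeout_s: float,
    ):
        self._attempt = attempt
        self._failure = failure
        self._timeout_s = timeout_s
        self._shutdown = threading.Event()

    def export_with_retry(self, timeout_s: Optional[float], payload: bytes) -> T:
        budget_s = self._timeout_s if timeout_s is None else timeout_s
        deadline = time.monotonic() + budget_s
        delay_s = 1.0
        while not self._shutdown.is_set():
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            try:
                # Each attempt only gets what is left of the overall budget.
                return self._attempt(remaining, payload)
            except RetryableError as err:
                # Prefer a server-provided delay, else exponential backoff.
                wait_s = err.retry_delay_s if err.retry_delay_s is not None else delay_s
                delay_s *= 2
            # Give up rather than sleep past the deadline.
            if wait_s >= deadline - time.monotonic():
                break
            # Event.wait returns True as soon as shutdown() sets the event.
            if self._shutdown.wait(wait_s):
                break
        return self._failure

    def shutdown(self) -> None:
        # Wake any export that is waiting between attempts.
        self._shutdown.set()
```

Waiting on a threading.Event instead of calling time.sleep is the design choice that lets shutdown interrupt a pending backoff immediately, so a process does not have to sit out the remaining retry delays before it can exit.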

Fixes #3309

Type of change

Please delete options that are not relevant.

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] New feature (non-breaking change which adds functionality)
  • [x] Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • [x] This change requires a documentation update

How Has This Been Tested?

Tests were added for the RetryingExporter in exporter/opentelemetry-exporter-otlp-proto-common/tests/test_retryable_exporter.py, as well as for the backoff generator in exporter/opentelemetry-exporter-otlp-proto-common/tests/test_backoff.py. Tests were updated throughout the HTTP and gRPC OTLP exporters, and retry-related logic was removed in all cases except gRPC metrics, which can be split and therefore needed another layer of deadline checking.
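
As a rough illustration of why a split export needs its own deadline check, the sketch below shares one time budget across several sequential batches. It assumes that "split" means the data goes out as several requests one after another; export_batches and export_one are hypothetical names, not functions from this PR.

```python
import time
from typing import Callable, Iterable, Sequence


def export_batches(
    batches: Iterable[Sequence[object]],
    export_one: Callable[[Sequence[object], float], bool],
    timeout_s: float,
) -> bool:
    """Export each batch in turn under one shared deadline.

    Hypothetical helper for illustration. export_one(batch, remaining_s)
    is assumed to return True on success and False on failure.
    """
    deadline = time.monotonic() + timeout_s
    for batch in batches:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Earlier batches used up the budget; stop instead of overrunning.
            return False
        if not export_one(batch, remaining):
            return False
    return True
```

Without the outer check, each batch could consume a full timeout of its own, so an export split into N batches could take up to N times the requested timeout.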

Does This PR Require a Contrib Repo Change?

Answer the following question based on these examples of changes that would require a Contrib Repo Change:

  • The OTel specification has changed which prompted this PR to update the method interfaces of opentelemetry-api/ or opentelemetry-sdk/

  • The method interfaces of test/util have changed

  • Scripts in scripts/ that were copied over to the Contrib repo have changed

  • Configuration files that were copied over to the Contrib repo have changed (when consistency between repositories is applicable), such as in:

    • pyproject.toml
    • isort.cfg
    • .flake8
  • When a new .github/CODEOWNER is added

  • Major changes to project information, such as in:

    • README.md
    • CONTRIBUTING.md
  • [ ] Yes. - Link to PR:

  • [x] No.

Checklist:

  • [x] Followed the style guidelines of this project
  • [ ] Changelogs have been updated
  • [x] Unit tests have been added
  • [ ] Documentation has been updated

Arnatious, Mar 07 '24 23:03