opentelemetry-python

Fix otlp exporter error handling misusing backoff.expo

Open isra17 opened this issue 3 years ago • 1 comments

Description

The retry path in the OTLP exporter is broken. If you run the example and the exporter hits an error, everything falls apart in the backoff code:

Transient error StatusCode.UNAVAILABLE encountered while exporting traces, retrying in Nones.
Exception while exporting Span batch.
Traceback (most recent call last):
  File "/Users/israelhalle/flared/pyro/venv/lib/python3.10/site-packages/opentelemetry/exporter/otlp/proto/grpc/exporter.py", line 305, in _export
    self._client.Export(
  File "/Users/israelhalle/flared/pyro/venv/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/Users/israelhalle/flared/pyro/venv/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
        status = StatusCode.UNAVAILABLE
        details = "failed to connect to all addresses; last error: UNKNOWN: Failed to connect to remote host: Connection refused"
        debug_error_string = "UNKNOWN:Failed to pick subchannel {created_time:"2022-09-22T18:58:59.699249-04:00", children:[UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: Failed to connect to remote host: Connection refused {created_time:"2022-09-22T18:58:59.699248-04:00", grpc_status:14}]}"
>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/israelhalle/flared/pyro/venv/lib/python3.10/site-packages/opentelemetry/sdk/trace/export/__init__.py", line 367, in _export_batch
    self.span_exporter.export(self.spans_list[:idx])  # type: ignore
  File "/Users/israelhalle/flared/pyro/venv/lib/python3.10/site-packages/opentelemetry/exporter/otlp/proto/grpc/trace_exporter/__init__.py", line 291, in export
    return self._export(spans)
  File "/Users/israelhalle/flared/pyro/venv/lib/python3.10/site-packages/opentelemetry/exporter/otlp/proto/grpc/exporter.py", line 345, in _export
    sleep(delay)
TypeError: 'NoneType' object cannot be interpreted as an integer

As it turns out, backoff.expo always yields None first, so the first `delay` passed to `sleep()` is None: https://github.com/litl/backoff/blob/master/backoff/_wait_gen.py#L23
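
To illustrate the behaviour described above, here is a minimal sketch. It does not import `backoff` or reproduce the exporter's code: `expo_like` is a hypothetical stand-in that only mimics the "None first, then exponential delays" pattern, and `usable_delays` is an assumed helper name showing one way a caller could skip the leading None before sleeping.

```python
from itertools import islice
from typing import Iterator, Optional


def expo_like(base: float = 2, factor: float = 1) -> Iterator[Optional[float]]:
    # Hypothetical stand-in for backoff.expo: the first value yielded is
    # None, after which the delays grow as factor * base ** n.
    yield None
    n = 0
    while True:
        yield factor * base ** n
        n += 1


def usable_delays(wait_gen: Iterator[Optional[float]]) -> Iterator[float]:
    # Drop any None value so callers never pass it to time.sleep().
    for delay in wait_gen:
        if delay is None:
            continue
        yield delay


# Take the first three delays a retry loop would actually sleep for.
first_three = list(islice(usable_delays(expo_like()), 3))
print(first_three)
```

Iterating over `expo_like()` directly would hand None to the first `sleep(delay)` call, which is the failure shown in the traceback; filtering it out leaves only numeric delays.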

Type of change

  • [x] Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Ran on my machine. With this fix, the code above now results in:

Transient error StatusCode.UNAVAILABLE encountered while exporting traces, retrying in 1s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces, retrying in 2s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces, retrying in 4s.

Does This PR Require a Contrib Repo Change?

  • [x] No.

Checklist:

  • [ ] Followed the style guidelines of this project
  • [ ] Changelogs have been updated
  • [ ] Unit tests have been added
  • [ ] Documentation has been updated

isra17, Sep 22 '22 23:09

CLA Signed

The committers listed above are authorized under a signed CLA.

  • :white_check_mark: login: isra17 / name: Israël Hallé (d384a79f5d6ce9f674ad6aa0a4cd189eda728236)

Seems like other PRs have a more complete fix. Closing!

isra17, Oct 23 '22 18:10