opentelemetry-python
Fix otlp exporter error handling misusing backoff.expo
Description
The exporter's retry code doesn't work at all. If you run the example and the exporter hits an error, everything falls apart because of the backoff code:
Transient error StatusCode.UNAVAILABLE encountered while exporting traces, retrying in Nones.
Exception while exporting Span batch.
Traceback (most recent call last):
File "/Users/israelhalle/flared/pyro/venv/lib/python3.10/site-packages/opentelemetry/exporter/otlp/proto/grpc/exporter.py", line 305, in _export
self._client.Export(
File "/Users/israelhalle/flared/pyro/venv/lib/python3.10/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/Users/israelhalle/flared/pyro/venv/lib/python3.10/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses; last error: UNKNOWN: Failed to connect to remote host: Connection refused"
debug_error_string = "UNKNOWN:Failed to pick subchannel {created_time:"2022-09-22T18:58:59.699249-04:00", children:[UNKNOWN:failed to connect to all addresses; last error: UNKNOWN: Failed to connect to remote host: Connection refused {created_time:"2022-09-22T18:58:59.699248-04:00", grpc_status:14}]}"
>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/israelhalle/flared/pyro/venv/lib/python3.10/site-packages/opentelemetry/sdk/trace/export/__init__.py", line 367, in _export_batch
self.span_exporter.export(self.spans_list[:idx]) # type: ignore
File "/Users/israelhalle/flared/pyro/venv/lib/python3.10/site-packages/opentelemetry/exporter/otlp/proto/grpc/trace_exporter/__init__.py", line 291, in export
return self._export(spans)
File "/Users/israelhalle/flared/pyro/venv/lib/python3.10/site-packages/opentelemetry/exporter/otlp/proto/grpc/exporter.py", line 345, in _export
sleep(delay)
TypeError: 'NoneType' object cannot be interpreted as an integer
As it turns out, backoff.expo always yields None first, so the first delay passed to sleep() is None: https://github.com/litl/backoff/blob/master/backoff/_wait_gen.py#L23
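To illustrate the failure mode without needing the library installed, here is a minimal sketch. `expo_like` is a hypothetical stand-in that mimics the generator behavior described above (it is not the real `backoff.expo`), and `retry_delays` is one possible way to skip the leading None, not necessarily the change made in this PR. The `max_value=64` cap is an assumption chosen for the example.

```python
from itertools import islice
from typing import Generator, Optional


def expo_like(
    base: float = 2, factor: float = 1, max_value: Optional[float] = None
) -> Generator[Optional[float], None, None]:
    """Hypothetical stand-in for backoff.expo: yields None before any delay."""
    yield None  # placeholder first value, mirroring the linked source
    n = 0
    while True:
        value = factor * base ** n
        if max_value is None or value < max_value:
            yield value
            n += 1
        else:
            yield max_value


def retry_delays(max_value: float = 64) -> Generator[float, None, None]:
    """Hypothetical helper: skip the leading None so callers only see numbers."""
    gen = expo_like(max_value=max_value)
    next(gen)  # discard the initial None
    yield from gen


# Iterating the generator directly hands None to the first sleep() call.
naive = list(islice(expo_like(max_value=64), 4))
# Priming the generator first gives usable delays from the start.
fixed = list(islice(retry_delays(max_value=64), 3))

print(naive)  # [None, 1, 2, 4]
print(fixed)  # [1, 2, 4]
```

The first list shows why `sleep(delay)` raised the TypeError in the traceback; the second matches the 1s, 2s, 4s sequence expected from an exponential backoff.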
Type of change
- [x] Bug fix (non-breaking change which fixes an issue)
How Has This Been Tested?
Ran on my machine. With this fix, the code above now results in:
Transient error StatusCode.UNAVAILABLE encountered while exporting traces, retrying in 1s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces, retrying in 2s.
Transient error StatusCode.UNAVAILABLE encountered while exporting traces, retrying in 4s.
Does This PR Require a Contrib Repo Change?
- [x] No.
Checklist:
- [ ] Followed the style guidelines of this project
- [ ] Changelogs have been updated
- [ ] Unit tests have been added
- [ ] Documentation has been updated
The committers listed above are authorized under a signed CLA.
- :white_check_mark: login: isra17 / name: Israël Hallé (d384a79f5d6ce9f674ad6aa0a4cd189eda728236)
Seems like other PRs have a more complete fix. Closing!