opentelemetry-python
backoff.expo v2.0.0 breaking change
Describe your environment
pyproject.toml
dependencies installed via poetry:
python = "3.8.11"
opentelemetry-instrumentation = "^0.32b0"
opentelemetry-instrumentation-fastapi = "^0.32b0"
opentelemetry-instrumentation-sqlalchemy = "^0.32b0"
opentelemetry-instrumentation-requests = "^0.32b0"
opentelemetry-instrumentation-httpx = "^0.32b0"
opentelemetry-instrumentation-logging = "^0.32b0"
opentelemetry-distro = {extras = ["otlp"], version = "^0.32b0"}
opentelemetry-instrumentation-aiohttp-client = "^0.32b0"
opentelemetry-api = "^1.12.0rc2"
Steps to reproduce
from fastapi import FastAPI, HTTPException, status
import uvicorn
from opentelemetry import metrics, trace
from opentelemetry.sdk.resources import SERVICE_NAME, SERVICE_VERSION, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader
app = FastAPI(
    title="OTLP backoff dependency test",
    description="OTLP backoff dependency test",
    version="0.0.1",
)


@app.get("/", responses={200: {"description": "ok"}}, include_in_schema=False)
async def root() -> None:
    """Root"""
    return status.HTTP_200_OK, "ok"


resource = Resource(
    attributes={
        SERVICE_NAME: "otlp-backoff-test",
        SERVICE_VERSION: "0.0.1",
    }
)

tracer_provider = TracerProvider(resource=resource)
processor = BatchSpanProcessor(
    OTLPSpanExporter(endpoint="http://localhost:4317/v1/traces"),
)
tracer_provider.add_span_processor(processor)
trace.set_tracer_provider(tracer_provider)

metric_exporter = OTLPMetricExporter(
    endpoint="http://localhost:4317/v1/metrics"
)
meter_reader = PeriodicExportingMetricReader(
    exporter=metric_exporter,
    export_interval_millis=5000,
)
meter_provider = MeterProvider(
    resource=resource,
    metric_readers=[meter_reader],
)
metrics.set_meter_provider(meter_provider)


def main():
    uvicorn.run(
        "main:app",
    )


if __name__ == "__main__":
    main()
What is the expected behavior? Periodic backoff behavior in the logs:
Transient error StatusCode.UNAVAILABLE encountered while exporting span batch, retrying in 1s.
Transient error StatusCode.UNAVAILABLE encountered while exporting span batch, retrying in 2s.
Transient error StatusCode.UNAVAILABLE encountered while exporting span batch, retrying in 4s.
Transient error StatusCode.UNAVAILABLE encountered while exporting span batch, retrying in 8s.
Transient error StatusCode.UNAVAILABLE encountered while exporting span batch, retrying in 16s.
Transient error StatusCode.UNAVAILABLE encountered while exporting span batch, retrying in 32s.
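For reference, a minimal sketch of where these delays come from, assuming backoff==1.11.1 semantics; the exporter's actual retry loop is more involved, this is illustration only:

from itertools import islice

import backoff

# Under backoff==1.11.1, expo() yields the exponential delays directly,
# which is what produces the 1s/2s/4s/... retry messages above.
print(list(islice(backoff.expo(), 6)))  # [1, 2, 4, 8, 16, 32]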
What is the actual behavior? After each transient error reported in the log, an exception is raised:
Traceback (most recent call last):
  File "/venv/lib/python3.8/site-packages/opentelemetry/sdk/metrics/_internal/export/__init__.py", line 412, in _receive_metrics
    self._exporter.export(metrics_data, timeout_millis=timeout_millis)
  File "/venv/lib/python3.8/site-packages/opentelemetry/exporter/otlp/proto/grpc/metric_exporter/__init__.py", line 205, in export
    return self._export(metrics_data)
  File "/venv/lib/python3.8/site-packages/opentelemetry/exporter/otlp/proto/grpc/exporter.py", line 328, in _export
    sleep(delay)
TypeError: an integer is required (got type NoneType)
Additional context
I believe the problem is related to changed behavior in backoff 2.0.0. See backoff/_wait_gen.py at https://github.com/litl/backoff/compare/v1.11.1..v2.0.0. The expo() function now opens with a bare yield so that the rest of the library can prime it as a coroutine, which means a direct next() returns None where opentelemetry expects an int. It's not clear to me that expo() is meant for direct usage, given that the backoff docs only show decorators and the function lives in an underscore-prefixed module. That said, the function is also exported in backoff's __all__, so perhaps they meant to make a breaking change?
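A minimal sketch of the difference, assuming a plain install of each backoff version (illustration only, not the exporter's actual code):

from time import sleep

import backoff

gen = backoff.expo()
# backoff 1.11.1: the generator yields delays immediately (1, 2, 4, 8, ...).
# backoff 2.0.0:  the generator opens with a bare `yield` so it can be primed
#                 via .send(None) by backoff's decorators; a direct next()
#                 therefore returns None first.
delay = next(gen)  # 1 on backoff 1.11.1, None on backoff 2.0.0
sleep(delay)       # on 2.0.0: TypeError: an integer is required (got type NoneType)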
Given opentelemetry's wide version range for the dependency (https://github.com/open-telemetry/opentelemetry-python/blob/b9a6358583c0676482a043279c01b252dc6c308e/exporter/opentelemetry-exporter-otlp-proto-grpc/setup.cfg#L48), recent installs will include this newer version of backoff with the breaking change in functionality.
I was able to fix it in my app by pinning backoff = "1.11.1".
Is no one else having this issue? Without the version pin, this completely breaks the gRPC exporter for me. As a quick fix, is there a downside to changing the version requirement to < 2.0.0?
I am also experiencing this. I just pinned the version and haven't had any issues, but it's not great. I think this library should either handle all of the versions specified or narrow its version constraints.
Just adding that this is also a show stopper for me.
Are folks ok with changing the dependency to backoff ~= 2.0 (major version 2 compat)? Based on https://pypi.org/project/backoff/#history, the last 1.x release of backoff is from July 2021. Is anyone still stuck on 1.x for any reason?
@aabmass My case was simple with no dependencies on backoff at all, so pinning to 1.11.1 will be easy to back out, and I'd be supportive of a refactor to make use of ~=2.0.
For other folks in this thread: see the comment on @Yamakaky's PR.
Same here: I pinned 1.11.1 but would be in favor of supporting only ~=2.0; I'll just have to take another look at my deps, which is pretty easy.
I would prefer ~=2.0, since I have other dependencies that need it (schemathesis ~= 3.17). As with the others, I pinned 1.11.1 to fix the issue.
Adding my +1 to fixing the library. This is definitely a widespread problem and is affecting me too.
The problem here is the direct use of the generator returned by backoff.expo without an initialising .send(None), as done by the code in backoff's "porcelain" APIs (backoff.on_exception and backoff.on_predicate). This behavior changed in backoff==2.0.0, and the exporter retries are fundamentally broken with backoff>=2.0.0.
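For illustration, a hedged sketch of the priming the exporter would need if it kept using the generator directly, assuming backoff>=2.0.0 only (the 64s cap and the variable names here are made up, not the exporter's actual code):

from itertools import islice

import backoff

delays = backoff.expo(max_value=64)  # 64s cap is illustrative only
delays.send(None)                    # advance past the initial bare yield,
                                     # exactly what backoff's own decorators do
print(list(islice(delays, 6)))       # [1, 2, 4, 8, 16, 32] -- no leading None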