opentelemetry-python backoff.expo v2.0.0 breaking change

Describe your environment pyproject.toml dependencies installed via poetry:

python = "3.8.11"
opentelemetry-instrumentation = "^0.32b0"
opentelemetry-instrumentation-fastapi = "^0.32b0"
opentelemetry-instrumentation-sqlalchemy = "^0.32b0"
opentelemetry-instrumentation-requests = "^0.32b0"
opentelemetry-instrumentation-httpx = "^0.32b0"
opentelemetry-instrumentation-logging = "^0.32b0"
opentelemetry-distro = {extras = ["otlp"], version = "^0.32b0"}
opentelemetry-instrumentation-aiohttp-client = "^0.32b0"
opentelemetry-api = "^1.12.0rc2"

Steps to reproduce

from fastapi import FastAPI, HTTPException, status
import uvicorn
from opentelemetry import metrics, trace
from opentelemetry.sdk.resources import SERVICE_NAME, SERVICE_VERSION, Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.metric_exporter import OTLPMetricExporter
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import PeriodicExportingMetricReader

app = FastAPI(
    title="OTLP backoff dependency test",
    description="OTLP backoff dependency test",
    version="0.0.1",
)

@app.get("/", responses={200: {"description": "ok"}}, include_in_schema=False)
async def root() -> None:
    """Root"""
    return status.HTTP_200_OK, "ok"

resource = Resource(
    attributes={
        SERVICE_NAME: "otlp-backoff-test",
        SERVICE_VERSION: "0.0.1",
    }
)

tracer_provider = TracerProvider(resource=resource)

processor = BatchSpanProcessor(
    OTLPSpanExporter(endpoint="http://localhost:4317/v1/traces"),
)
tracer_provider.add_span_processor(processor)
trace.set_tracer_provider(tracer_provider)

metric_exporter = OTLPMetricExporter(
    endpoint="http://localhost:4317/v1/metrics"
)

meter_reader = PeriodicExportingMetricReader(
    exporter=metric_exporter,
    export_interval_millis=5000,
)

meter_provider = MeterProvider(
    resource=resource,
    metric_readers=[meter_reader],
)
metrics.set_meter_provider(meter_provider)

def main():

    uvicorn.run(
        "main:app",
    )

if __name__ == "__main__":
    main()

What is the expected behavior? Periodic backoff behavior in the logs

Transient error StatusCode.UNAVAILABLE encountered while exporting span batch, retrying in 1s.
Transient error StatusCode.UNAVAILABLE encountered while exporting span batch, retrying in 2s.
Transient error StatusCode.UNAVAILABLE encountered while exporting span batch, retrying in 4s.
Transient error StatusCode.UNAVAILABLE encountered while exporting span batch, retrying in 8s.
Transient error StatusCode.UNAVAILABLE encountered while exporting span batch, retrying in 16s.
Transient error StatusCode.UNAVAILABLE encountered while exporting span batch, retrying in 32s.

What is the actual behavior? After each transient error reported in the log, an exception is raised

Traceback (most recent call last):
  File "/venv/lib/python3.8/site-packages/opentelemetry/sdk/metrics/_internal/export/__init__.py", line 412, in _receive_metrics
    self._exporter.export(metrics_data, timeout_millis=timeout_millis)
  File "/venv/lib/python3.8/site-packages/opentelemetry/exporter/otlp/proto/grpc/metric_exporter/__init__.py", line 205, in export
    return self._export(metrics_data)
  File "/venv/lib/python3.8/site-packages/opentelemetry/exporter/otlp/proto/grpc/exporter.py", line 328, in _export
    sleep(delay)
TypeError: an integer is required (got type NoneType)

Additional context I believe the problem is related to changed behavior in backoff 2.0.0. See backoff/_wait_gen.py at https://github.com/litl/backoff/compare/v1.11.1..v2.0.0 . The expo() function now yields once because it's called elsewhere as a coroutine, which returns a None where opentelemetry expects an int. It's not clear to me that expo() is meant for direct usage given that backoff docs show decorators, and the function is in a dunder module. That said, the function is also exported in backoff's __all__, so perhaps they meant to make a breaking change?

Given opentelemetry's wide version range for the dependency (https://github.com/open-telemetry/opentelemetry-python/blob/b9a6358583c0676482a043279c01b252dc6c308e/exporter/opentelemetry-exporter-otlp-proto-grpc/setup.cfg#L48), recent installs will include this newer version of backoff with the breaking change in functionality.

I was able to fix it in my app by pinning backoff = "1.11.1"

Jul 18 '22 02:07 jamesboehmer

Is no one else having this issue? Without the version pin, this completely breaks the grpc for me. As a quick fix, is there a downside to changing the version requirement to < 2.0.0?

Aug 31 '22 08:08 Yamakaky

I am also experiencing this. I just pinned the version and haven't had any issues, but it's not great. I think this library should either handle all of the versions specified or narrow it's version constraints.

Aug 31 '22 15:08 adriangb

Just adding that this is also a show stopper for me.

Sep 06 '22 19:09 agrapsas

Are folks ok with changing the dependency to backoff ~= 2.0 (major version 2 compat)? Based on https://pypi.org/project/backoff/#history, 1.X release of backoff is from July 2021. Anyone still stuck on 1.X for any reason?

Sep 08 '22 16:09 aabmass

@aabmass My case was simple with no dependencies on backoff at all, so pinning to 1.11.1 will be easy to back out, and I'd be supportive of a refactor to make use of ~2.0.

For other folks in this thread - See the comment on @Yamakaky 's PR.

Sep 09 '22 16:09 jamesboehmer

Same here: I pinned 1.11.1 but would be in favor of supporting only ~=2.0, I'll just have to relook my deps which is pretty easy.

Sep 09 '22 16:09 adriangb

I would prefer ~=2.0, since I have other dependencies that need it (schemathesis=~3.17). As with the others, I pinned 1.11.1 to fix the issue.

Sep 09 '22 16:09 jstlaurent

Adding my +1 to fixing the library. This is definitely a widespread problem and is affecting me too.

Sep 29 '22 00:09 cgspeck-aai

The problem here is the direct use of the generator returned by backoff.expo without an initialising .send(None) as done by the code in backoff's "porcelain" APIs (backoff.on_exception and backoff.on_predicate).

This behavior changed in backoff==2.0.0 and the exporter retries are fundamentally broken with backoff>=2.0.0.

Oct 18 '22 19:10 nickstenning

opentelemetry-python opentelemetry-python copied to clipboard

backoff.expo v2.0.0 breaking change

opentelemetry-python
opentelemetry-python copied to clipboard