opentelemetry-python-contrib
Random connection reset errors affecting Celery
Describe your environment
- Centralised AWS OTEL Collector (latest, although same issue with older versions)
- AWS Application Load Balancer fronting the collector
- The collector debug logs have lines such as
  loopyWriter exiting with error: transport closed by client
- Only affects Celery workers in my stack; Gunicorn and others are unaffected
- Python 3.9
- Current versions of OpenTelemetry
opentelemetry-api==1.24.0
opentelemetry-distro==0.45b0
opentelemetry-exporter-otlp==1.24.0
opentelemetry-exporter-otlp-proto-common==1.24.0
opentelemetry-exporter-otlp-proto-grpc==1.24.0
opentelemetry-exporter-otlp-proto-http==1.24.0
opentelemetry-instrumentation==0.45b0
opentelemetry-instrumentation-botocore==0.45b0
opentelemetry-instrumentation-celery==0.45b0
opentelemetry-instrumentation-dbapi==0.45b0
opentelemetry-instrumentation-django==0.45b0
opentelemetry-instrumentation-logging==0.45b0
opentelemetry-instrumentation-psycopg2==0.45b0
opentelemetry-instrumentation-redis==0.45b0
opentelemetry-instrumentation-requests==0.45b0
opentelemetry-instrumentation-wsgi==0.45b0
opentelemetry-propagator-aws-xray==1.0.1
opentelemetry-proto==1.24.0
opentelemetry-sdk==1.24.0
opentelemetry-sdk-extension-aws==2.0.1
opentelemetry-semantic-conventions==0.45b0
opentelemetry-util-http==0.45b0
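For anyone comparing environments, a list like the one above can be generated from inside the worker's virtualenv. This is only a sketch using the standard library's importlib.metadata; the helper names opentelemetry_versions and format_pins are made up for illustration and are not part of any OpenTelemetry package.

```python
from importlib import metadata


def opentelemetry_versions(prefix="opentelemetry"):
    """Map installed distribution names starting with `prefix` to their versions."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        # Skip distributions with broken metadata (no Name field).
        if name and name.lower().startswith(prefix):
            versions[name] = dist.version
    return versions


def format_pins(versions):
    """Render a name -> version mapping as sorted `name==version` lines."""
    return [f"{name}=={version}" for name, version in sorted(versions.items())]


if __name__ == "__main__":
    print("\n".join(format_pins(opentelemetry_versions())))
```

Running it in both the Celery and the Gunicorn environments would show whether the two processes actually load the same exporter versions.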
Steps to reproduce
Run a task on a Celery worker with opentelemetry-instrument.
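A rough sketch of how the worker might be launched for a reproduction. Everything specific here is an assumption rather than the setup from this report: the Celery app module name (tasks), the service name, and the collector endpoint are placeholders. The environment variables are the standard OTLP exporter settings.

```python
import os
import subprocess


def build_worker_command(app_module):
    """Return the argv that starts a Celery worker under opentelemetry-instrument."""
    return [
        "opentelemetry-instrument",
        "celery", "-A", app_module,  # app_module is a placeholder, e.g. "tasks"
        "worker", "--loglevel=INFO",
    ]


def build_worker_env(endpoint, base_env=None):
    """Copy the environment and add the OTLP exporter settings."""
    env = dict(os.environ if base_env is None else base_env)
    env["OTEL_SERVICE_NAME"] = "celery-repro"  # hypothetical service name
    env["OTEL_TRACES_EXPORTER"] = "otlp"
    env["OTEL_EXPORTER_OTLP_ENDPOINT"] = endpoint  # placeholder collector URL
    return env


def run_worker(app_module, endpoint):
    """Start the worker and block until it exits."""
    return subprocess.run(
        build_worker_command(app_module),
        env=build_worker_env(endpoint),
        check=False,
    )
```

Calling run_worker("tasks", "https://collector.example.com:4317") and then queueing any task should be enough to see whether the reset shows up in the worker log.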
What is the expected behavior?
No errors reported.
What is the actual behavior?
Any task a Celery worker executes results in an HTTP connection reset error or its gRPC equivalent, but the traces are still sent successfully.
Additional context
I'm not getting these errors on non-Celery processes such as Gunicorn.
It's incredibly challenging to diagnose this issue, so I'm not certain whether it's a problem with my stack or with how Celery handles auto-instrumentation.
Anyone else seen this issue?
This is still ongoing, sadly. The client Python logs look like
Transient error StatusCode.UNAVAILABLE encountered while exporting traces to ...
Any ideas would be greatly appreciated!