dd-trace-py

Unable to start application with Python 3.11.9 + gevent + ddtrace

Open fbexiga opened this issue 1 year ago • 14 comments

Summary of problem

When I try to start a Flask API using gunicorn + gevent + ddtrace + Python 3.11.9, the application crashes. However, if I use Python 3.11.8 instead, or remove either gevent or ddtrace, it works. Also, I can only reproduce this issue on a Linux system (such as Debian Bookworm), not on macOS, for instance.

Edit: it appears that even after downgrading to Python 3.11.8, the application doesn't start properly with ddtrace 2.7.x, although the error is different. With 2.6.x it works as expected.

Which version of dd-trace-py are you using?

Tested 2.8.0, 2.7.7 and a few more down to 2.6.3

Which version of pip are you using?

Python 3.11.9 pip 24.0

Which libraries and their versions are you using?

ddtrace==2.7.7
flask==3.0.2
gevent==24.2.1
greenlet==3.0.3
gunicorn==21.2.0

How can we reproduce your problem?

If I try to start a Flask API using gunicorn with gevent workers + ddtrace + Python 3.11.9, I get the following error as soon as the worker boots:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/threading.py", line 989, in _bootstrap
    # Wrapper around the real bootstrap code that ignores
  File "ddtrace/profiling/_threading.pyx", line 38, in ddtrace.profiling._threading.native_id_hook.bootstrap_wrapper
  File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.11/threading.py", line 1049, in _bootstrap_inner
    self._delete()
  File "/usr/local/lib/python3.11/threading.py", line 1081, in _delete
    del _active[get_ident()]
        ~~~~~~~^^^^^^^^^^^^^
KeyError: 139743514440832

What is the result that you get?

I am unable to start the application, getting the error mentioned above.

What is the result that you expected?

I expected the application to start and work just like it does with an older version of Python.

fbexiga avatar Apr 08 '24 23:04 fbexiga

Thanks for reporting this, @fbexiga. If turning off the Profiling functionality is an option for your use case, it's the first thing I'd recommend. Does the error still occur when you set DD_PROFILING_ENABLED=0?

emmettbutler avatar Apr 09 '24 14:04 emmettbutler

cc @sanchda

emmettbutler avatar Apr 09 '24 14:04 emmettbutler

@fbexiga, thank you so much for the thorough and insightful report. Unfortunately, I don't think we have a short-term workaround, but we'll try to get this resolved promptly.

sanchda avatar Apr 09 '24 14:04 sanchda

That's ok, for now we just downgraded back to 3.11.8. No rush or anything, but I thought it was worth reporting.

I tried disabling profiling, but I still get the same result.

fbexiga avatar Apr 09 '24 15:04 fbexiga

I have the same error in a Celery application using Python 3.11.9 + gevent + ddtrace:

Traceback (most recent call last):
  File "src/gevent/_abstract_linkable.py", line 287, in gevent._gevent_c_abstract_linkable.AbstractLinkable._notify_links
  File "src/gevent/_abstract_linkable.py", line 333, in gevent._gevent_c_abstract_linkable.AbstractLinkable._notify_links
AssertionError: (None, <callback at 0x7fe8acaaa4c0 args=([],)>)
2024-04-09T21:13:55Z <callback at 0x7fe8acaaa4c0 args=([],)> failed with AssertionError

kc-experian avatar Apr 09 '24 21:04 kc-experian

Also affects Python 3.12.3; used to work just fine with 3.12.2.

iherasymenko avatar Apr 10 '24 16:04 iherasymenko

Encountered a similar exception, but we don't have profiling enabled. It also happened when moving from 3.11.8 to 3.11.9. Rolling back the Python version resolved the error.

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.11/threading.py", line 1049, in _bootstrap_inner
    self._delete()
  File "/usr/local/lib/python3.11/threading.py", line 1081, in _delete
    del _active[get_ident()]
        ~~~~~~~^^^^^^^^^^^^^
KeyError: 139737853141056

ddtrace==2.7.6
django==4.2.11
gevent==23.9.1
greenlet==3.0.3
gunicorn==21.2.0

askidelskiy avatar Apr 10 '24 17:04 askidelskiy

There is no clear link between this issue and https://github.com/DataDog/dd-trace-py/pull/8870, but it might be worth testing it once it's released 🤞. Meanwhile, we'll see if we can reproduce this issue.

P403n1x87 avatar Apr 15 '24 08:04 P403n1x87

I was testing this and found that the crash does not happen when we are on an Intel processor, but it does happen on AMD EPYC. Disabling ddtrace prevents it from crashing on AMD EPYC.

Intel processor: Intel(R) Xeon(R) CPU @ 2.20GHz
AMD EPYC processor: AMD EPYC 7B12

Docker image = python:3.11.9-slim

ddtrace==2.8.2
flask==3.0.3
gevent==24.2.1
greenlet==3.0.3
gunicorn==22.0.0

Downgrading to Python 3.11.8 stops the crash on AMD EPYC.
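The reports in this thread differ mainly in Python patch version, CPU, and package versions, so a small script that gathers those details may help when comparing environments. The following is a minimal sketch, not something posted in the thread; the function name `environment_report` and the `PACKAGES` tuple are illustrative assumptions. Note that `platform.processor()` often does not return the CPU model name on Linux, where `/proc/cpuinfo` is usually a better source.

```python
import platform
from importlib import metadata

# Packages mentioned in the reports above; adjust to your own stack.
PACKAGES = ("ddtrace", "gevent", "greenlet", "gunicorn", "flask")


def environment_report(packages=PACKAGES):
    """Return a list of lines describing the interpreter, platform, and packages."""
    lines = [
        f"python=={platform.python_version()}",
        f"platform={platform.system()} {platform.machine()}",
        # May be empty or generic on Linux; see /proc/cpuinfo for the model name.
        f"processor={platform.processor() or 'unknown'}",
    ]
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{name} (not installed)")
    return lines


report = environment_report()
print("\n".join(report))
```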

lawrenceong avatar Apr 24 '24 05:04 lawrenceong

Any movement on this?

fbexiga avatar May 30 '24 08:05 fbexiga

~Reproducible with Python 3.12.4 + gevent 24.2.1 + greenlet 3.0.3 + ddtrace 2.9.0.~

~UPD 1: Only reproducible together with sentry-sdk.~

UPD 2: Reproducible without sentry-sdk. It was a red herring.

iherasymenko avatar Jun 11 '24 03:06 iherasymenko

> Also affects Python 3.12.3; used to work just fine with 3.12.2.

I likewise encountered a similar issue when using 3.12.3. Downgrading to 3.12.2 fixed the issue.

JASchilz avatar Jun 11 '24 16:06 JASchilz

I finally have a working reproducer: https://github.com/iherasymenko/ddtrace-8903-reproducer

Chasing it down required a machine with an AMD EPYC 7R13 processor (an AWS EC2 c6a.8xlarge VM); the simplified version seems to work fine on both my M3 MacBook Pro and my Intel Core i7 Linux machine.

ddtrace v2.10.0rc2 is still affected by the issue.

Also, in this particular example, disabling patching of mongoengine via DD_PATCH_MODULES="mongoengine:false" helps, but this is not really an option, as the other enabled integrations will cause a similar effect.
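For anyone who wants to try the two settings mentioned in this thread (DD_PROFILING_ENABLED=0 and DD_PATCH_MODULES), here is a minimal sketch of building the environment for a test launch. The helper name `workaround_env` and the `app:app` module path are hypothetical, and the thread indicates that neither setting is a reliable fix: disabling profiling did not help the original reporter, and disabling one integration only helped in the reproducer.

```python
import os


def workaround_env(base=None, disable_profiling=False, unpatched=()):
    """Return a copy of the environment with the settings from this thread applied.

    base: mapping to start from (defaults to the current process environment).
    unpatched: names of ddtrace integrations to switch off via DD_PATCH_MODULES.
    """
    env = dict(os.environ if base is None else base)
    if disable_profiling:
        env["DD_PROFILING_ENABLED"] = "0"
    if unpatched:
        env["DD_PATCH_MODULES"] = ",".join(f"{name}:false" for name in unpatched)
    return env


# Hypothetical launch command; "app:app" stands in for your own WSGI module.
cmd = ["ddtrace-run", "gunicorn", "-k", "gevent", "app:app"]

# An empty base is used here only to show exactly which variables get set.
env = workaround_env(base={}, disable_profiling=True, unpatched=["mongoengine"])
print(env)

# To actually launch, start from the real environment instead:
# import subprocess
# subprocess.run(cmd, env=workaround_env(disable_profiling=True), check=True)
```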

iherasymenko avatar Jun 13 '24 23:06 iherasymenko

I've also been having these issues and noticed a gevent issue showing that it's not compatible with 3.11.9. It further points to a CPython issue about the threading library being imported before gevent has a chance to patch it.

There's a PR open to address this: https://github.com/python/cpython/pull/120233. I tried the patch locally and was able to get ddtrace-run and gevent to play nicely on 3.11.9.

This looks to be an issue strictly with CPython on the latest patch releases for 3.11 and 3.12.

EDIT: spelling

ffernand avatar Jun 21 '24 15:06 ffernand

Even though https://github.com/python/cpython/pull/120233 is already merged, it looks like it will not be backported to 3.11 as it is not considered a security fix (https://github.com/python/cpython/pull/120233#issuecomment-2207215913).

It is, however, ported to 3.12 / 3.13, so it looks like we will need to upgrade unless gevent plans to update their code.

lawrenceong avatar Jul 27 '24 03:07 lawrenceong

The issue is fixed in 2.10.0 and 2.9.4 🎉
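Based only on the versions named in this thread (fixed in 2.9.4 and 2.10.0, with 2.10.0rc2 reported as still affected), a quick check of an installed ddtrace version might look like the sketch below. The function names are illustrative, the parser handles only plain `X.Y.Z` versions with an optional suffix, and any suffix is conservatively treated as "not confirmed fixed".

```python
import re


def parse(version):
    """Split 'X.Y.Z[suffix]' into ((X, Y, Z), suffix)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)(.*)$", version)
    if match is None:
        raise ValueError(f"unrecognised version: {version!r}")
    release = tuple(int(part) for part in match.groups()[:3])
    return release, match.group(4)


def reported_fixed(ddtrace_version):
    """True if the version is at or after the releases reported as fixed here."""
    release, suffix = parse(ddtrace_version)
    if suffix:
        # Pre-releases such as '2.10.0rc2' were reported as still affected.
        return False
    return release >= (2, 9, 4)


for candidate in ("2.8.2", "2.9.0", "2.9.4", "2.10.0rc2", "2.10.0"):
    print(candidate, reported_fixed(candidate))
```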

iherasymenko avatar Jul 29 '24 21:07 iherasymenko

This issue has been automatically closed after a period of inactivity. If it's a feature request, it has been added to the maintainers' internal backlog and will be included in an upcoming round of feature prioritization. Please comment or reopen if you think this issue was closed in error.

github-actions[bot] avatar Dec 27 '24 00:12 github-actions[bot]