TypeError: descriptor 'some_method' for 'A' objects doesn't apply to a 'B' object
Bug report
I found a bug that seems to be code corruption.
While working on an example project with ~70 threads, I occasionally (once every hour or so) get the following exception from various locks:
Exception in thread Sequence 2:
Traceback (most recent call last):
File "/Volumes/RAMDisk/installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/threading.py", line 1039, in _bootstrap_inner
self.run()
~~~~~~~~^^
...
File "/Volumes/RAMDisk/installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/threading.py", line 656, in wait
with self._cond:
^^^^^^^^^^
File "/Volumes/RAMDisk/installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/threading.py", line 304, in __enter__
return self._lock.__enter__()
~~~~~~~~~~~~~~~~~~~~^^
TypeError: descriptor '__exit__' for '_thread.RLock' objects doesn't apply to a '_thread.lock' object
or
Exception in thread Clock:
Traceback (most recent call last):
File "/Volumes/RAMDisk/installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/threading.py", line 1039, in _bootstrap_inner
self.run()
~~~~~~~~^^
File "europython.py", line 113, in run
for message in input:
File "/Volumes/RAMDisk/nogil-ep-temp/lib/python3.14/site-packages/mido/ports.py", line 243, in __iter__
yield self.receive()
~~~~~~~~~~~~^^
File "/Volumes/RAMDisk/nogil-ep-temp/lib/python3.14/site-packages/mido/ports.py", line 215, in receive
with self._lock:
^^^^^^^^^^
TypeError: descriptor '__exit__' for '_thread.lock' objects doesn't apply to a '_thread.RLock' object
This looks like a bug related to locks, but it isn't. It's not even related to descriptors; it's just that descriptors nicely refuse to run invalid code.
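Notably, you can trigger the exact same error message on purpose by applying an unbound method taken from one lock type to an instance of the other; a minimal sketch:

```python
import _thread

# Take the __exit__ method descriptor from the RLock type...
rlock_exit = _thread.RLock.__exit__
# ...and apply it to an instance of the unrelated plain lock type.
plain_lock = _thread.allocate_lock()
try:
    rlock_exit(plain_lock)
except TypeError as e:
    print(e)
    # descriptor '__exit__' for '_thread.RLock' objects
    # doesn't apply to a '_thread.lock' object
```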
This issue is also externally reported in https://github.com/PyWavelets/pywt/issues/758 with the same Lock descriptor error message I've seen, and I can reproduce the failure locally, albeit with a different exception:
TypeError: descriptor 'sort' for 'numpy.ndarray' objects doesn't apply to a 'ThreadPoolExecutor' object
To reproduce this with CPython main, do the following:
- make a venv with a free-threaded build of Python
- install Cython from main with `pip install -e .`
- install NumPy from main with `pip install . --no-build-isolation` (important: no `-e` in this case)
- install pywt from main with `pip install -e . --no-build-isolation` (important: you DO need `-e` in this case)
- run pytest in a loop (or with autoclave) like this: `PYTHON_GIL=0 pytest pywt/tests/test_concurrent.py`
You will need to run this for quite a while before you hit a failure.
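For instance, a minimal Python sketch of such a loop (a hypothetical driver script; the autoclave tool mentioned above does this more robustly):

```python
# rerun.py -- hypothetical sketch: rerun the test file until it fails.
import os
import subprocess
import sys

env = dict(os.environ, PYTHON_GIL="0")
for run in range(1, 100_000):
    proc = subprocess.run(
        [sys.executable, "-m", "pytest", "pywt/tests/test_concurrent.py"],
        env=env,
    )
    if proc.returncode != 0:
        print(f"first failure on run {run}")
        break
```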
By doing this, I managed to find this particular failure case:
self = <concurrent.futures.thread.ThreadPoolExecutor object at 0x225be124750>, fn = functools.partial(<function dwtn at 0x225be6d3b40>, wavelet='haar')
args = (array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., ...., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]]),)
kwargs = {}, f = <Future at 0x225bea44310 state=finished returned dict>, w = <concurrent.futures.thread._WorkItem object at 0x225bc875bf0>
    def submit(self, fn, /, *args, **kwargs):
        with self._shutdown_lock, _global_shutdown_lock:
            if self._broken:
                raise BrokenThreadPool(self._broken)
            if self._shutdown:
                raise RuntimeError('cannot schedule new futures after shutdown')
            if _shutdown:
                raise RuntimeError('cannot schedule new futures after '
                                   'interpreter shutdown')
            f = _base.Future()
            w = _WorkItem(f, fn, args, kwargs)
            self._work_queue.put(w)
>           self._adjust_thread_count()
E           TypeError: Future.set_result() missing 1 required positional argument: 'result'
args = (array([[1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 1., 0., 0., 0., 0., 0., 0., 0., ...., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]]),)
f = <Future at 0x225bea44310 state=finished returned dict>
fn = functools.partial(<function dwtn at 0x225be6d3b40>, wavelet='haar')
kwargs = {}
self = <concurrent.futures.thread.ThreadPoolExecutor object at 0x225be124750>
w = <concurrent.futures.thread._WorkItem object at 0x225bc875bf0>
../installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/concurrent/futures/thread.py:179: TypeError
======================================================================================================== short test summary info ========================================================================================================
FAILED pywt/tests/test_concurrent.py::test_concurrent_dwt - TypeError: Future.set_result() missing 1 required positional argument: 'result'
====================================================================================================== 1 failed, 3 passed in 0.44s ======================================================================================================
-- 863 runs, 862 passes, 1 failure, 734486 msec
Observe how Python wants to call self._adjust_thread_count() (with no arguments) but ends up calling f.set_result(), which causes an exception due to no arguments being passed.
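For comparison, the same TypeError can be produced directly; the corruption merely routed a zero-argument call into a method that requires one:

```python
from concurrent.futures import Future

Future().set_result()
# TypeError: Future.set_result() missing 1 required positional argument: 'result'
```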
Tested on macOS Sonoma on M1 Max with Python 3.14.0a0 experimental free-threading build (heads/main:7a807c3efaa, Jul 2 2024, 11:58:38).
AFAICT the problem only occurs with the GIL actually disabled.
By the way, I hit this last week with 3.13.0b2, so this is likely a problem on the 3.13 branch as well, not something new in 3.14.
Another interesting failure from running the pywt tests that shows this is beyond descriptors (despite the exception):
../installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/concurrent/futures/_base.py:322: in __init__
self._condition = threading.Condition()
self = <[AttributeError("'Future' object has no attribute '_condition'") raised in repr()] Future object at 0x5ddd8796110>
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <[AttributeError("'Condition' object has no attribute '_waiters'") raised in repr()] Condition object at 0x5ddd8795790>
lock = <unlocked _thread.RLock object owner=0 count=0 at 0x5ddd881a300>
    def __init__(self, lock=None):
        if lock is None:
            lock = RLock()
        self._lock = lock
        # Export the lock's acquire() and release() methods
        self.acquire = lock.acquire
>       self.release = lock.release
E       TypeError: descriptor 'result_type' for 'numpy._core._multiarray_umath._array_converter' objects doesn't apply to a '_thread.RLock' object
Look at the failing `self` reprs. In both cases we're talking about attributes that are set in `__init__` and never unset or deleted.
I have tried to reproduce the above issue with pywt, and I'm routinely seeing errors like:
TypeError: fft() got an unexpected keyword argument 'axes'
Has the NumPy argument parsing been made thread-safe yet?
No, that PR is still in-flight: https://github.com/numpy/numpy/pull/26780
I should probably mention, I'm testing all this with CFLAGS="-g0 -O3" and ./configure --enable-optimizations --with-lto --disable-gil.
Minimal repro so far
This is frustratingly hard to reproduce without larger dependencies or codebases. The smallest thing I managed to get to crash is this:
```python
# repro4.py
from concurrent import futures
import os
import random

def calculate(arr):
    return sum(arr) / len(arr)

print(os.environ["PYTHONHASHSEED"], end=" ", flush=True)
for _ in range(100):
    with futures.ThreadPoolExecutor(max_workers=os.cpu_count()) as ex:
        arrs = [[random.random() for _ in range(100)] for _ in range(50)]
        results = list(ex.map(calculate, arrs))
```
Run it with

```fish
for seed in (seq -f%1.0f 1000000 1100000)
    PYTHONHASHSEED=$seed python repro4.py
end
```

with the fish shell. On the bash shell (`seq -f%1.0f 1000000 1099999 | while read s; do PYTHONHASHSEED=$s python3.14 repro4.py; done`) it reproduces much less often.
The hash seed is a red herring, but believe me: if I remove that bit, I can no longer reproduce the failure.
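For reference, here is a shell-agnostic equivalent of the sweep above (a hypothetical driver script, assuming repro4.py is in the current directory):

```python
# sweep.py -- hypothetical driver equivalent to the fish loop above.
import os
import subprocess
import sys

for seed in range(1_000_000, 1_100_001):
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    if subprocess.run([sys.executable, "repro4.py"], env=env).returncode != 0:
        print(f"crashed with PYTHONHASHSEED={seed}")
        break
```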
Running this for a while eventually produces a similar case of unexpected code execution, one example being:
File "/Users/ambv/Documents/Python/europython/repro4.py", line 13, in <module>
results = list(ex.map(calculate, arrs))
File "/Volumes/RAMDisk/installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/concurrent/futures/_base.py", line 611, in result_iterator
yield _result_or_cancel(fs.pop())
~~~~~~~~~~~~~~~~~^^^^^^^^^^
File "/Volumes/RAMDisk/installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/concurrent/futures/_base.py", line 309, in _result_or_cancel
return fut.result(timeout)
~~~~~~~~~~^^^^^^^^^
File "/Volumes/RAMDisk/installed-nogil-main/Library/Frameworks/Python.framework/Versions/3.14/lib/python3.14/threading.py", line 526, in release
if n < 1:
^^^^^
TypeError: '<' not supported between instances of 'NoneType' and 'int'
where the call to concurrent.futures.Future.result(timeout) (with timeout=None) unexpectedly jumps into threading.Semaphore.release(n=1), which doesn't expect n to ever be None. It's very weird. I managed to attach pdb when this situation happens, and running fut.result(None) there returns the expected outcome. So this must be some momentary corruption.
Thanks, I'm able to reproduce it intermittently now. So far only on my macOS arm64 laptop, not on x86-64 Linux. Reducing MCACHE_SIZE_EXP in pycore_typeobject.h seems to make it happen more quickly -- I changed it to 6 locally.
The bug is in our seq lock implementation:
https://github.com/python/cpython/blob/cb688bab08559079d0ee9ffd841dd6eb11116181/Python/lock.c#L550-L560
https://github.com/python/cpython/blob/cb688bab08559079d0ee9ffd841dd6eb11116181/Objects/typeobject.c#L5390-L5402
The memory ordering in _PySeqLock_EndRead isn't sufficient. We need an explicit fence (i.e., atomic_thread_fence). The "acquire" ordering prevents subsequent loads from being reordered before it, but in _PySeqLock_EndRead we want to prevent earlier loads from being reordered to after the _PySeqLock_EndRead().
The bug occurs on arm64, but not x86-64, because x86-64 enforces ordering of loads relative to each other.
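For illustration, here is the reader side of the seqlock in Python-flavored pseudocode (the names are made up for this sketch; the real implementation is the C code linked above):

```python
# Sketch of the optimistic reader protocol; illustrative names only.
def read_via_seqlock(seqlock, load_payload):
    while True:
        seq = seqlock.load_sequence()  # acquire load of the sequence number
        if seq & 1:
            continue                   # odd sequence: a writer is mid-update
        value = load_payload()         # relaxed loads of the cached data
        # The sequence re-read below must happen *after* the payload loads.
        # An acquire load only keeps later loads from moving up before it;
        # it does not keep the payload loads from sinking down past it.
        # Hence the fix: an explicit atomic_thread_fence between the payload
        # loads and the re-read.  arm64 reorders independent loads, x86-64
        # does not, which is why the bug only reproduced on arm64.
        if seqlock.load_sequence() == seq:
            return value               # sequence unchanged: value is consistent
```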
With the PRs merged, is there anything left to do here?
I'm closing the issue.