Possible thread-unsafe initialization of wrapped modules?
Consider the following script:
```python
from concurrent.futures import ThreadPoolExecutor
from pkgutil import walk_packages

import scipy


def worker():
    for _ in walk_packages(scipy.__path__, scipy.__name__ + '.'):
        pass


n_threads = 10
tpe = ThreadPoolExecutor(max_workers=min(n_threads, 4))
futures = [None] * n_threads
for i in range(n_threads):
    futures[i] = tpe.submit(worker)
[f.result() for f in futures]
```
This is based on the scipy test `scipy/_lib/tests/test_public_api.py::test_all_modules_are_expected` running under pytest-run-parallel.
On both the free-threaded and GIL-enabled interpreter, this script eventually fails with the following error:
```
Traceback (most recent call last):
  File "/Users/goldbaum/Documents/test/test.py", line 16, in <module>
    [f.result() for f in futures]
     ~~~~~~~~^^
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/goldbaum/Documents/test/test.py", line 6, in worker
    for _ in walk_packages(scipy.__path__, scipy.__name__ + '.'):
             ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/pkgutil.py", line 93, in walk_packages
    yield from walk_packages(path, info.name+'.', onerror)
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/pkgutil.py", line 93, in walk_packages
    yield from walk_packages(path, info.name+'.', onerror)
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/pkgutil.py", line 93, in walk_packages
    yield from walk_packages(path, info.name+'.', onerror)
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/pkgutil.py", line 88, in walk_packages
    path = getattr(sys.modules[info.name], '__path__', None) or []
                   ~~~~~~~~~~~^^^^^^^^^^^
KeyError: 'scipy._lib.array_api_compat.dask.array'
```
It runs successfully if I set `n_threads = 1` in the script.
I think this is happening because there's a race to call `clone_module`: https://github.com/data-apis/array-api-compat/blob/6c708d13e826fb850161babf34f306fe80cae875/array_api_compat/_internal.py#L56-L71
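I haven't verified that this is exactly what `clone_module` does, but the general hazard with an unlocked check-then-create initializer can be forced deterministically. The cache, `get_module`, and the barrier below are all made up for illustration; they are not array_api_compat's actual code:

```python
import threading

_cache = {}  # stands in for a module-level cache of cloned modules


def get_module(name, barrier):
    # Unlocked check-then-create: both threads can pass the membership
    # test before either one inserts, so initialization runs twice.
    if name not in _cache:
        barrier.wait()   # force both threads into the race window
        mod = object()   # stands in for building the wrapped module
        _cache[name] = mod
        return mod
    return _cache[name]


barrier = threading.Barrier(2)
results = []
threads = [
    threading.Thread(target=lambda: results.append(get_module("m", barrier)))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The two callers received *different* objects for the same name:
print(results[0] is results[1])  # -> False
```

The barrier is only there to make the interleaving reproducible; in real code the same window exists, it's just rarely hit.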
I think this is happening because `sys.modules` can change "underneath" a thread - it's even documented to do this: https://docs.python.org/3/library/sys.html#sys.modules. That makes CPython's direct use of `sys.modules` in `pkgutil.walk_packages` unsafe for shim modules like the ones provided by array_api_compat. If a shim fails to import because e.g. dask isn't installed, the shim module may be transiently visible to other threads as available to import. Because the implementation of `walk_packages` does `sys.modules[info.name]`, the dict access can fail when `info.name` is only transiently present. If I modify CPython to do `sys.modules.get(info.name, None)` instead, I can avoid this failure mode.
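The transient-visibility part can be shown without any threads, by mimicking by hand what the import machinery does on a failed import (the module name here is made up for the demo):

```python
import sys
import types

NAME = "transient_demo_mod"  # made-up module name for this demo

# The import machinery puts a module into sys.modules *before* executing
# its body, and removes it again if the body raises -- so other threads
# can observe the entry for a moment and then see it vanish.
sys.modules[NAME] = types.ModuleType(NAME)
try:
    raise ImportError("simulating a failed shim import (e.g. dask missing)")
except ImportError:
    del sys.modules[NAME]  # the transient entry is gone again

# walk_packages effectively does sys.modules[info.name], which raises
# KeyError once the entry has vanished:
try:
    sys.modules[NAME]
    raised = False
except KeyError:
    raised = True
print("direct lookup raised KeyError:", raised)

# The defensive variant degrades gracefully to an empty path list:
path = getattr(sys.modules.get(NAME), '__path__', None) or []
print("defensive lookup gives:", path)
```

A concurrent reader that hits the window between the insert and the delete sees exactly the `KeyError` in the traceback above.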
Now I'm not completely sure this is correct, because there is supposed to be a per-module import lock that is held by the interpreter during import - doesn't that protect us here?
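One data point on that question: even under the import lock, a module is visible in `sys.modules` while its body is still executing, and is removed again if the body raises. The lock serializes *importers* of that name, but nothing stops a plain dict read from observing the transient state. The `failing_shim` module below is fabricated on the fly for this sketch:

```python
import importlib
import os
import sys
import tempfile

# A throwaway module whose body checks its own visibility and then fails,
# mimicking a shim whose backend (e.g. dask) is missing.
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "failing_shim.py"), "w") as f:
    f.write(
        "import sys\n"
        "# While the body runs, the module is already in sys.modules --\n"
        "# visible to any thread that reads the dict directly.\n"
        "print('visible during import:', 'failing_shim' in sys.modules)\n"
        "raise ImportError('backend not installed')\n"
    )
sys.path.insert(0, tmp)

try:
    importlib.import_module("failing_shim")
except ImportError:
    pass

# On failure the machinery removes the entry again; the per-module import
# lock never protected plain dict readers like walk_packages from seeing
# (or missing) the entry in between.
print("visible after failure:", "failing_shim" in sys.modules)
```

So the lock seems to guarantee only that two threads don't both *execute* the module body, not that `sys.modules` lookups by non-importing code are consistent.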
> there is supposed to be a per-module import lock that is held by the interpreter during import - doesn't that protect us here?
Randomly saw this part and it reminded me of https://github.com/python/cpython/issues/91238, where I am still confused about whether we should add locking in NumPy or whether CPython should make sure to protect us.
> I am still confused if we should add locking in NumPy or CPython should make sure to protect us.
Seconded. I'm confused too. Naively, it feels like CPython should protect us here.