array-api-compat icon indicating copy to clipboard operation
array-api-compat copied to clipboard

Possible thread-unsafe initialization of wrapped modules?

Open ngoldbaum opened this issue 5 months ago • 3 comments

Consider the following script:

from concurrent.futures import ThreadPoolExecutor
from pkgutil import walk_packages
import scipy

def worker():
    for _ in walk_packages(scipy.__path__, scipy.__name__ + '.'):
        pass

n_threads=10

tpe = ThreadPoolExecutor(max_workers=min((n_threads, 4)))
futures = [None]*n_threads


for i in range(n_threads):
    futures[i] = tpe.submit(worker)

[f.result() for f in futures]

This is based on the scipy test scipy/_lib/tests/test_public_api.py::test_all_modules_are_expected running under pytest-run-parallel.

On both the free-threaded and GIL-enabled interpreter, this script eventually fails with the following error:

Traceback (most recent call last):
  File "/Users/goldbaum/Documents/test/test.py", line 16, in <module>
    [f.result() for f in futures]
     ~~~~~~~~^^
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
           ~~~~~~~~~~~~~~~~~^^
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/concurrent/futures/thread.py", line 59, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/goldbaum/Documents/test/test.py", line 6, in worker
    for _ in walk_packages(scipy.__path__, scipy.__name__ + '.'):
             ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/pkgutil.py", line 93, in walk_packages
    yield from walk_packages(path, info.name+'.', onerror)
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/pkgutil.py", line 93, in walk_packages
    yield from walk_packages(path, info.name+'.', onerror)
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/pkgutil.py", line 93, in walk_packages
    yield from walk_packages(path, info.name+'.', onerror)
  File "/Users/goldbaum/.pyenv/versions/3.13.4/lib/python3.13/pkgutil.py", line 88, in walk_packages
    path = getattr(sys.modules[info.name], '__path__', None) or []
                   ~~~~~~~~~~~^^^^^^^^^^^
KeyError: 'scipy._lib.array_api_compat.dask.array'

It runs successfully if I set n_threads=1 in the script.

I think this is happening because there's a race to call clone_module: https://github.com/data-apis/array-api-compat/blob/6c708d13e826fb850161babf34f306fe80cae875/array_api_compat/_internal.py#L56-L71

ngoldbaum avatar Jul 07 '25 20:07 ngoldbaum

I think this is happening because sys.modules can change "underneath" a thread - it's even documented to do this: https://docs.python.org/3/library/sys.html#sys.modules, and CPython directly using sys.modules in pkgutil.walk_packages is unsafe for shim modules like the ones provided by array_api_compat. If the shim fails to import because e.g. dask isn't installed, then the shim module might be transiently visible to other threads as available to import. Because the implementation of walk_packages does sys.modules[info.name], if info.name is only transiently available, the dict access might fail. If I modify CPython to do sys.modules.get(info.name, None), then I can avoid this failure mode.

Now I'm not completely sure this is correct, because there is supposed to be a per-module import lock that is held by the interpreter during import - doesn't that protect us here?

ngoldbaum avatar Jul 08 '25 20:07 ngoldbaum

there is supposed to be a per-module import lock that is held by the interpreter during import - doesn't that protect us here?

Randomly saw this part and it reminded me of this: https://github.com/python/cpython/issues/91238 where I am still confused if we should add locking in NumPy or CPython should make sure to protect us.

seberg avatar Jul 09 '25 09:07 seberg

I am still confused if we should add locking in NumPy or CPython should make sure to protect us.

Seconded. I'm confused too. Naively, it feels like CPython should protect us here.

ev-br avatar Oct 28 '25 17:10 ev-br