scikit-build-core
`subprocess.run` stalls indefinitely and consumes all memory when checking ninja version
- Python version: 3.10 (installed via conda-forge)
- scikit-build-core version: 0.10.7 (but issue is present on 0.10.6 as well)
- OS: Ubuntu 22.04 (WSL, kernel: 5.15.167.4)
Steps to reproduce:
I am running pip wheel --verbose --verbose --verbose . on my project. The build gets this far:
Created temporary directory: /tmp/pip-build-env-x87al_n8
Created temporary directory: /tmp/pip-standalone-pip-rpekp9xc
Running command /home/gareth/repos/wfenv/bin/python /tmp/pip-standalone-pip-rpekp9xc/__env_pip__.zip/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-x87al_n8/overlay --no-warn-script-location -v --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- 'scikit-build-core @ file:///home/gareth/repos/scikit-build-core' 'cmake>=3.20,<3.31' 'ninja>=1.5'
...
Collecting cmake<3.31,>=3.20
Using cached cmake-3.30.5-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.9 MB)
Collecting ninja>=1.5
Using cached ninja-1.11.1.2-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (422 kB)
...
Getting requirements to build wheel ... done
...
Building wheels for collected packages: wrenfold
Created temporary directory: /tmp/pip-wheel-sqd1lqbx
Destination directory: /tmp/pip-wheel-sqd1lqbx
Running command /home/gareth/repos/wfenv/bin/python /home/gareth/repos/wfenv/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmpafsk0kce
2024-12-03 21:46:10,392 - scikit_build_core - WARNING - cmake should not be in build-system.requires - scikit-build-core will inject it as needed
2024-12-03 21:46:10,392 - scikit_build_core - WARNING - ninja should not be in build-system.requires - scikit-build-core will inject it as needed
2024-12-03 21:46:10,413 - scikit_build_core - INFO - RUN: /tmp/pip-build-env-x87al_n8/overlay/lib/python3.10/site-packages/cmake/data/bin/cmake -E capabilities
2024-12-03 21:46:10,419 - scikit_build_core - INFO - CMake version: 3.30.5
*** scikit-build-core 0.10.7 using CMake 3.30.5 (wheel)
2024-12-03 21:46:10,423 - scikit_build_core - INFO - Build directory: /tmp/tmpb32t54no/build
*** Configuring CMake...
2024-12-03 21:46:10,427 - scikit_build_core - DEBUG - SITE_PACKAGES: /home/gareth/repos/wfenv/lib/python3.10/site-packages
2024-12-03 21:46:10,427 - scikit_build_core - DEBUG - Extra SITE_PACKAGES: /tmp/pip-build-env-x87al_n8/overlay/lib/python3.10/site-packages
2024-12-03 21:46:10,427 - scikit_build_core - DEBUG - PATH: ['/home/gareth/repos/wfenv/lib/python3.10/site-packages/pip/_vendor/pep517/in_process', '/tmp/pip-build-env-x87al_n8/site', '/home/gareth/mambaforge/envs/devtools/lib/python310.zip', '/home/gareth/mambaforge/envs/devtools/lib/python3.10', '/home/gareth/mambaforge/envs/devtools/lib/python3.10/lib-dynload', '/tmp/pip-build-env-x87al_n8/overlay/lib/python3.10/site-packages', '/tmp/pip-build-env-x87al_n8/normal/lib/python3.10/site-packages']
2024-12-03 21:46:10,432 - scikit_build_core - DEBUG - Default generator: Ninja
2024-12-03 21:46:10,433 - scikit_build_core - INFO - RUN: /home/gareth/repos/wfenv/bin/ninja --version
The process then stalls, and memory usage grows indefinitely until the process dies. If I kill the process, it appears to stop while reading stdout inside subprocess. I realize this context is a little thin at the moment, but I am still trying to gather debugging information.
Running the command /home/gareth/repos/wfenv/bin/ninja --version manually has no issues. It prints 1.11.1.git.kitware.jobserver-1 and exits.
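For anyone trying to narrow down a stall like this, one option is to run the same version query from Python with a timeout, so that a hang surfaces as an exception rather than as unbounded memory growth. The following is a minimal sketch, not scikit-build-core's own code; the helper name `probe_version` and the 10-second limit are assumptions chosen for illustration.

```python
import subprocess
import sys


def probe_version(exe, timeout=10.0):
    """Run `<exe> --version`; return (returncode, stdout) or None on timeout.

    Hypothetical diagnostic helper, not part of scikit-build-core.
    """
    try:
        result = subprocess.run(
            [exe, "--version"],
            stdin=subprocess.DEVNULL,  # the child cannot block waiting for input
            capture_output=True,       # collect stdout/stderr via pipes
            text=True,
            timeout=timeout,           # the child is killed once this elapses
            check=False,               # report the exit code instead of raising
        )
    except subprocess.TimeoutExpired:
        return None
    return result.returncode, result.stdout.strip()


# Example call against the current interpreter, which exists on any machine.
print(probe_version(sys.executable))
```

Calling `probe_version("/home/gareth/repos/wfenv/bin/ninja")` from inside the same environment that pip uses for the build would show whether the binary itself hangs there or only the surrounding build does.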
One (possibly tangential) question I have is: why does scikit-build-core query the instance of ninja present in my virtual environment, /home/gareth/repos/wfenv/bin/ninja (see INFO print above), rather than the version that is collected by pip wheel in the build overlay? Is this expected?
Notably, if I uninstall the instance of ninja in wfenv, the build proceeds normally:
2024-12-03 22:12:27,909 - scikit_build_core - DEBUG - Default generator: Ninja
2024-12-03 22:12:27,910 - scikit_build_core - INFO - RUN: ninja --version
2024-12-03 22:12:27,911 - scikit_build_core - INFO - Ninja version: 1.11.1
2024-12-03 22:12:27,911 - scikit_build_core - DEBUG - CMAKE_GENERATOR: Using ninja: ninja
I instrumented my CMake to check the path to ninja and found:
-- CMAKE_MAKE_PROGRAM: ninja
-- Path to make program: /tmp/pip-build-env-aducxdi1/overlay/bin/ninja
Which appears to be correct - it is using the overlay version.
Of course, I can remove any stray instances of ninja in my virtual environment - but it is somewhat concerning that finding the wrong one triggers a lock-up followed by OOM, so I would like to understand this issue a bit better.
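To see which copies of ninja are visible at all, it can help to list every match on `PATH` rather than only the first. The sketch below does that; the function name `candidates_on_path` is hypothetical, and it is an assumption that `PATH` order is what matters here, since how scikit-build-core orders its own search is not shown in this thread.

```python
import os


def candidates_on_path(name, path=None):
    """Return every executable file called `name` on `path`, in lookup order.

    Hypothetical helper for diagnosis; it only inspects PATH-style strings.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    found = []
    for directory in path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        # Keep regular files that the current user may execute.
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            found.append(candidate)
    return found


# Print each visible ninja, first match first.
for hit in candidates_on_path("ninja"):
    print(hit)
```

If both the virtual environment copy and the build overlay copy appear in the output, the order shows which one a plain `ninja` invocation would pick up.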
From navigating the code, I guess you are hitting https://github.com/scikit-build/scikit-build-core/blob/3943920fa267dc83f9295279bea1c774c0916f13/src/scikit_build_core/program_search.py#L137 and then it fails further down the line when it tries to match the ninja version specification.
When you run
$ ninja --version
$ echo $?
Do you get a non-zero exit value? If so, that would put it in that branch.
It exits normally with return code 0. If I kill the stalled process with ctrl-C, it seems like it never escapes out of the call to Run().capture(ninja_path, "--version").
I am not really familiar with the expected behavior here - it feels like scikit-build-core invoking the existing ninja install in my venv is incorrect, and rather it should use the version installed in the build overlay.
It should try the local one first. If it’s installed in the build env, you should not be able to get past it (unless it was broken).
Though the outer one shouldn't be broken either. Forcing a pip version is not recommended, as some platforms, like the BSDs, do not have wheels. Will have to investigate, hopefully later today.
I encountered exactly the same problem. What saved me in the end was to uninstall Ninja with pip uninstall ninja.
Hope this helps
Anything unusual about your setup that I could reproduce? I've tried to reproduce this, but haven't been able to. I've tried something like this:
docker run --rm -it ubuntu:24.10
apt update && apt install python3-venv git
python3 -m venv .venv
. .venv/bin/activate
pip install cmake ninja
git clone https://github.com/wrenfold/wrenfold --recurse-submodules
cd wrenfold/
pip wheel --verbose --verbose --verbose .
But it seems fine.
From my reading of the original post I think it's more on the conda side of ninja?
I tried conda, same thing, still no lock up:
apt update && apt install curl git python3-pip
curl -Ls https://micro.mamba.pm/api/micromamba/linux-64/latest | tar -xvj bin/micromamba
eval "$(./bin/micromamba shell hook -s posix)"
micromamba install cmake ninja
micromamba activate base
git clone https://github.com/wrenfold/wrenfold --recurse-submodules
cd wrenfold
pip wheel --verbose --verbose --verbose .
I took another shot at replicating this, under both Ubuntu 22.04 and Ubuntu 24.04 (and Python 3.10 and 3.12).
I cannot seem to replicate it again either, unfortunately. The only advice I can give to anybody who experiences the same issue is: uninstall ninja from the virtual env, and use the version installed by scikit-build-core.