Pre-built binaries: CUDA extension unavailable
Originally posted by @LouisJalouzot in https://github.com/teddykoker/torchsort/issues/90#issuecomment-2878093698
I have been trying out some of the pre-built binaries, but unfortunately, even when matching the versions of Python, PyTorch and CUDA exactly, I get the following error:
ImportError: You are trying to use the torchsort CUDA extension, but it looks like it is not available. Make sure you have the CUDA toolchain installed, and reinstall torchsort with `pip install --force-reinstall --no-cache-dir torchsort` to rebuild the extension.
For instance, I tried on a machine running Rocky Linux release 9.5 (Blue Onyx) with CUDA 12.4 and the following environment:
Using Python 3.12.10 environment at: .test
Package Version
------------------------ ---------------
filelock 3.18.0
fsspec 2025.3.2
jinja2 3.1.6
markupsafe 3.0.2
mpmath 1.3.0
networkx 3.4.2
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-cusparselt-cu12 0.6.2
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
setuptools 80.4.0
sympy 1.13.1
torch 2.6.0+cu124
torchsort 0.1.9+pt26cu124
triton 3.2.0
typing-extensions 4.13.2
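A quick way to confirm whether an environment like this one actually contains the compiled CUDA module, without running any sorting code, is to ask the import system for it. This is only a sketch: `submodule_available` is a hypothetical helper, and the submodule name `isotonic_cuda` is taken from the file names that appear later in this thread.

```python
import importlib.util


def submodule_available(package, submodule):
    """Return True if `package.submodule` can be located by the import system.

    Hypothetical helper for illustration. Note that find_spec imports the
    parent package, so a missing parent is reported as "not available".
    """
    try:
        return importlib.util.find_spec(f"{package}.{submodule}") is not None
    except ImportError:
        # Parent package is not installed or failed to import.
        return False


if __name__ == "__main__":
    # Assumption: the compiled modules are named isotonic_cpu / isotonic_cuda.
    for name in ("isotonic_cpu", "isotonic_cuda"):
        print(name, submodule_available("torchsort", name))
```

If `isotonic_cpu` is reported as available but `isotonic_cuda` is not, the installed package most likely lacks the compiled CUDA binary.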
Reproduced this locally with Ubuntu 22.04, CUDA 12.4, Python 3.10, PyTorch 2.6.0+cu124. Not sure why this is the case as all of the packages seemed to build without errors. Will investigate further, but might not have the bandwidth for a little while @LouisJalouzot.
I had to debug this a bit myself, and it seems that the pre-built packages lack the compiled CUDA module. Rebuilding it myself (`python setup.py install`) creates a corresponding `isotonic_cuda.cpython-312-x86_64-linux-gnu.so`, whereas the pre-built wheels contain only the CPU binary:
$ unzip torchsort-0.1.9+pt26cu124-cp312-cp312-linux_x86_64.whl
Archive: torchsort-0.1.9+pt26cu124-cp312-cp312-linux_x86_64.whl
inflating: torchsort/__init__.py
inflating: torchsort/isotonic_cpu.cpp
inflating: torchsort/isotonic_cpu.cpython-312-x86_64-linux-gnu.so
inflating: torchsort/isotonic_cuda.cu
inflating: torchsort/ops.py
inflating: torchsort-0.1.9+pt26cu124.dist-info/licenses/LICENSE
inflating: torchsort-0.1.9+pt26cu124.dist-info/METADATA
inflating: torchsort-0.1.9+pt26cu124.dist-info/WHEEL
inflating: torchsort-0.1.9+pt26cu124.dist-info/top_level.txt
inflating: torchsort-0.1.9+pt26cu124.dist-info/RECORD
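The same check can be done without unpacking the wheel, since a wheel is a zip archive. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the compiled CUDA module has `isotonic_cuda` in its file name, as in the locally built `.so` mentioned above.

```python
import zipfile

# Suffixes of compiled extension modules (.so on Linux/macOS, .pyd on Windows).
BINARY_SUFFIXES = (".so", ".pyd")


def has_cuda_extension(names):
    """Return True if any file name looks like a compiled isotonic_cuda module.

    The .cu source file alone does not count: it must be a compiled binary.
    """
    return any(
        "isotonic_cuda" in name and name.endswith(BINARY_SUFFIXES)
        for name in names
    )


def wheel_has_cuda_extension(wheel_path):
    """Open a wheel (a zip archive) and check its file listing."""
    with zipfile.ZipFile(wheel_path) as wheel:
        return has_cuda_extension(wheel.namelist())
```

For the listing above, `has_cuda_extension` would see `isotonic_cuda.cu` (source only) and `isotonic_cpu.cpython-312-x86_64-linux-gnu.so`, neither of which is a compiled CUDA module.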
Random thought, but setup.py only compiles the CUDA binaries if nvcc is on the PATH, and by default CUDA installations don't add it to the PATH...
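A pre-build check along these lines could catch that before any wheels are produced. This is a sketch, not what setup.py does: `find_nvcc` is a hypothetical helper, and the fallback locations (`$CUDA_HOME/bin` and `/usr/local/cuda/bin`) are assumptions about a typical Linux install.

```python
import os
import shutil


def find_nvcc(extra_dirs=("/usr/local/cuda/bin",)):
    """Return the path to nvcc, or None if it cannot be found.

    Looks on PATH first, then in $CUDA_HOME/bin (if set), then in extra_dirs.
    The fallback directories are assumptions, not guaranteed locations.
    """
    found = shutil.which("nvcc")
    if found:
        return found

    candidates = list(extra_dirs)
    cuda_home = os.environ.get("CUDA_HOME")
    if cuda_home:
        candidates.insert(0, os.path.join(cuda_home, "bin"))

    for directory in candidates:
        # Passing path= restricts the search to this one directory.
        found = shutil.which("nvcc", path=directory)
        if found:
            return found
    return None


if __name__ == "__main__":
    nvcc = find_nvcc()
    if nvcc is None:
        print("nvcc not found: a build would likely skip the CUDA extension")
    elif shutil.which("nvcc") is None:
        print(f"nvcc exists at {nvcc} but is not on PATH; add {os.path.dirname(nvcc)} to PATH")
    else:
        print(f"nvcc found on PATH at {nvcc}")
```

Failing the build when this returns None, rather than silently producing a CPU-only package, would make the problem visible at build time.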
@LouisJalouzot apologies for the delay! The pre-built CUDA binaries should now work in the latest release, v0.1.10. Thanks @argusdusty, nvcc had to be added to the PATH before the build in order to trigger the CUDA compilation.
Wonderful, thanks a lot @teddykoker! (torchsort-0.1.10+pt26cu126-cp312-cp312-linux_x86_64.whl works on my side)