pytorch_sparse
pip repeatedly selecting wrong CUDA version during install
Hi!
I am trying to get an installation running on an HPC cluster with somewhat older dependencies. I have found torch 1.8.0 with CUDA 11.1 to work, so now I want to install the matching torch_sparse version:
$ pip install torch-sparse -f https://data.pyg.org/whl/torch-1.8.0+cu111.html --no-cache-dir
Looking in links: https://data.pyg.org/whl/torch-1.8.0+cu111.html
Collecting torch-sparse
Downloading torch_sparse-0.6.16.tar.gz (208 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 208.2/208.2 kB 11.8 MB/s eta 0:00:00
Preparing metadata (setup.py) ... done
Requirement already satisfied: scipy in /home/icb/florin.ratajczak/anaconda3/envs/speostest/lib/python3.7/site-packages (from torch-sparse) (1.7.3)
Requirement already satisfied: numpy<1.23.0,>=1.16.5 in /home/icb/florin.ratajczak/anaconda3/envs/speostest/lib/python3.7/site-packages (from scipy->torch-sparse) (1.21.5)
Building wheels for collected packages: torch-sparse
Building wheel for torch-sparse (setup.py) ... done
Created wheel for torch-sparse: filename=torch_sparse-0.6.16-cp37-cp37m-linux_x86_64.whl size=1716194 sha256=46187b7f1bf7bae117f0e216f72bd6cf67ae95c2f9b6734d5369aa47099de042
Stored in directory: /tmp/pip-ephem-wheel-cache-23v87jns/wheels/ff/5c/28/5d12cf8ac7bb8bc3de9dda8fa446cb4aeb9fffe19ef1028538
Successfully built torch-sparse
Installing collected packages: torch-sparse
Successfully installed torch-sparse-0.6.16
I added the --no-cache-dir flag to make sure pip doesn't load some other version from the cache.
However, when I now try to import it, it says it was compiled for CUDA 12.0:
(speostest) [florin.ratajczak@hpc-submit03gui speos]$ python
Python 3.7.16 (default, Jan 17 2023, 22:20:44)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.__version__
'1.8.0'
>>> torch.version.cuda
'11.1'
>>> import torch_sparse
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/icb/florin.ratajczak/anaconda3/envs/speostest/lib/python3.7/site-packages/torch_sparse/__init__.py", line 33, in <module>
f'Detected that PyTorch and torch_sparse were compiled with '
RuntimeError: Detected that PyTorch and torch_sparse were compiled with different CUDA versions. PyTorch has CUDA version 11.1 and torch_sparse has CUDA version 12.0. Please reinstall the torch_sparse that matches your PyTorch install.
Any idea why it says it was compiled for CUDA 12.0? I might have to host a course on that cluster in a week or so, so any help is much appreciated!
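For anyone reproducing this, the `-f` URL has to match the installed torch build exactly. The snippet below is a minimal sketch that derives the URL from the two version strings printed in the session above. The function name `pyg_wheel_index` is hypothetical, and the URL pattern is only inferred from the links used in this thread, so treat it as an assumption rather than a documented scheme.

```python
from typing import Optional


def pyg_wheel_index(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build a -f URL following the pattern of the links in this thread.

    Hypothetical helper: the pattern is inferred, not taken from any docs.
    """
    # torch.__version__ may carry a local suffix such as "+cu111"; drop it.
    base = torch_version.split("+")[0]
    # torch.version.cuda is None on CPU-only builds, else e.g. "11.1".
    tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://data.pyg.org/whl/torch-{base}+{tag}.html"


print(pyg_wheel_index("1.8.0", "11.1"))
# https://data.pyg.org/whl/torch-1.8.0+cu111.html
```

In a real environment the two arguments would come from `torch.__version__` and `torch.version.cuda`.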
Hi, I have a similar problem.
I installed torch-sparse like this:
pip install torch-scatter torch-sparse==0.6.12 torch-cluster torch-spline-conv torch-geometric==2.0.4 -f https://data.pyg.org/whl/torch-1.8.2+cpu.html --no-cache-dir
But while running I get the following error:
Traceback (most recent call last):
File "/dss/dsshome1/lxc03/gobi005/BlockEQTL/speos/speos/scripts/explanation_scripts/explanation_one_model.py", line 2, in <module>
from speos.models import ModelBootstrapper
File "/dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/site-packages/speos/models.py", line 1, in <module>
from speos.architectures import GeneNetwork, RelationalGeneNetwork, FCNN, LINKX, SimpleGCN
File "/dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/site-packages/speos/architectures.py", line 4, in <module>
import torch_geometric.nn as pyg_nn
File "/dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/site-packages/torch_geometric/__init__.py", line 4, in <module>
import torch_geometric.data
File "/dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
from .data import Data
File "/dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/site-packages/torch_geometric/data/data.py", line 9, in <module>
from torch_sparse import SparseTensor
File "/dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/site-packages/torch_sparse/__init__.py", line 16, in <module>
f'{library}_{suffix}', [osp.dirname(__file__)]).origin)
File "/dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/site-packages/torch/_ops.py", line 104, in load_library
ctypes.CDLL(path)
File "/dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/ctypes/__init__.py", line 364, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /dss/dsshome1/lxc03/gobi005/miniconda3/envs/speos/lib/python3.7/site-packages/torch_sparse/_convert_cpu.so: undefined symbol: __kmpc_fork_call
I think this error might also be caused by PyTorch and torch-sparse being built for different CUDA versions.
Do you have a PyTorch CUDA version installed? Otherwise, I don't think this is necessarily related to CUDA but more due to a mismatch in PyTorch versions.
Thanks for the response!
I have tested several CUDA and CPU versions, and the only combination that worked was installing torch for CUDA 11.6 (which is compatible with CUDA 12.0) and then installing torch-sparse for 11.6 as well. With every other CUDA/CPU version, torch-sparse ends up incompatible with torch, even when both are installed following the installation instructions. Also, as you can see from my output above, the error message says that torch is compiled for CUDA 11.1 and torch-sparse for 12.0, even though I explicitly requested the cu111 version. Perhaps some paths got mixed up and torch-sparse is installed for CUDA 12.0 no matter which -f argument is specified?
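One detail in the pip output from the first post may be relevant here: it shows `Downloading torch_sparse-0.6.16.tar.gz` followed by `Building wheel for torch-sparse`, which suggests pip compiled the package locally instead of using a prebuilt cu111 wheel. A local build would typically use whichever CUDA toolkit it finds on the machine, which might explain the 12.0 in the error. Below is a minimal sketch for spotting this in a saved pip log. The function name is hypothetical and the string checks are heuristics based only on the log lines shown in this thread.

```python
def looks_like_source_build(pip_log: str) -> bool:
    """Heuristically detect that pip built from an sdist instead of a wheel."""
    for raw_line in pip_log.splitlines():
        line = raw_line.strip()
        # A source distribution is downloaded as .tar.gz (or .zip), not .whl.
        if line.startswith("Downloading") and (".tar.gz" in line or ".zip" in line):
            return True
        # pip prints this line only when it has to compile the package itself.
        if line.startswith("Building wheel for"):
            return True
    return False


log_from_first_post = """Collecting torch-sparse
Downloading torch_sparse-0.6.16.tar.gz (208 kB)
Building wheel for torch-sparse (setup.py) ... done
"""
print(looks_like_source_build(log_from_first_post))
# True
```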
This issue had no activity for 6 months. It will be closed in 2 weeks unless there is some new activity. Is this issue already resolved?
Have you tried pip install --no-build-isolation?
Without that flag, pip installs its own copy of torch to build the package and only installs the package afterward.
I had the same issue with ROCm: pip ignored the installed ROCm build of torch and installed the default torch + CUDA build instead.
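To try this suggestion in a repeatable way, the command can be assembled in a small script. This is only a sketch: the helper name `pip_install_cmd` is hypothetical, and the `--only-binary=:all:` option is an addition not mentioned in this thread. It is a standard pip option that makes pip refuse source distributions, so a missing prebuilt wheel shows up as an error instead of a silent local build. It only makes sense without `--no-build-isolation`, since the latter applies to source builds.

```python
import sys
from typing import List


def pip_install_cmd(package: str, index_url: str,
                    binary_only: bool = False,
                    no_build_isolation: bool = False) -> List[str]:
    """Assemble a pip command like the ones used in this thread."""
    cmd = [sys.executable, "-m", "pip", "install", package,
           "-f", index_url, "--no-cache-dir"]
    if binary_only:
        # Assumption, not from the thread: fail instead of building from source.
        cmd.append("--only-binary=:all:")
    if no_build_isolation:
        # Build against the torch already installed in this environment.
        cmd.append("--no-build-isolation")
    return cmd


url = "https://data.pyg.org/whl/torch-1.8.0+cu111.html"
cmd = pip_install_cmd("torch-sparse", url, no_build_isolation=True)
print(" ".join(cmd[1:]))
# -m pip install torch-sparse -f https://data.pyg.org/whl/torch-1.8.0+cu111.html --no-cache-dir --no-build-isolation
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` to actually run the install.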