
Difficulty installing torch-cluster

Open michaelhla opened this issue 1 year ago • 4 comments

Going through the readme for setup, I am failing on this command:

# install compatible pytorch geometric in this order WITH versions
pip install --no-cache-dir  torch-scatter==2.0.9 torch-sparse==0.6.15 torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.13.0+cu116.html

Failing with this error:

      In file included from csrc/cuda/fps_cuda.cu:3:
      /root/miniconda3/envs/diffdock_pp/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:10:10: fatal error: cusolverDn.h: No such file or directory
         10 | #include <cusolverDn.h>
            |          ^~~~~~~~~~~~~~
      compilation terminated.
      error: command '/root/miniconda3/envs/diffdock_pp/bin/nvcc' failed with exit code 1
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for torch-cluster
  Running setup.py clean for torch-cluster
Failed to build torch-cluster
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (torch-cluster)

I tried different GPUs, installing cusolver directly, different PyTorch versions, etc. I am going to try installing torch-cluster separately with a newer version.

michaelhla avatar Sep 21 '24 21:09 michaelhla

I tried using a preconfigured container on RunPod for PyTorch 1.13.0 (avoiding the conda env altogether), but it still stalls. I also tried installing a newer torch-cluster, and that stalls as well. It may work better with another cloud provider or GPU. Does anyone have any recommendations?

michaelhla avatar Sep 21 '24 22:09 michaelhla

Download the packages to your local computer and then use pip install xxxxx.whl to install them.
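
A minimal sketch of this approach, assuming the Python 3.10 / torch 1.13.0 / CUDA 11.6 combination from the readme command and that it runs on the target machine (or one with the same platform). The error log above shows pip compiling torch-cluster from source; asking pip for prebuilt wheels only should make it fail early instead of invoking nvcc. The `wheels` directory and the `fetch_wheels` / `install_wheels` helper names are made up for illustration, and the torch-cluster 1.6.0 pin is borrowed from a later comment in this thread, not from the readme.

```shell
#!/usr/bin/env bash
# Sketch: fetch prebuilt PyG extension wheels, then install from local files.
# WHEEL_DIR and the function names are illustrative, not part of DiffDock-PP.
TORCH_VERSION="1.13.0"
CUDA_TAG="cu116"
WHEEL_INDEX="https://data.pyg.org/whl/torch-${TORCH_VERSION}+${CUDA_TAG}.html"
WHEEL_DIR="./wheels"

fetch_wheels() {
    # --only-binary=:all: makes pip error out rather than fall back to a source build.
    # --no-deps limits the download to the three compiled extensions.
    pip download --only-binary=:all: --no-deps -d "$WHEEL_DIR" -f "$WHEEL_INDEX" \
        torch-scatter==2.0.9 torch-sparse==0.6.15 torch-cluster==1.6.0
}

install_wheels() {
    # Install the downloaded files; remaining dependencies resolve from PyPI.
    pip install --no-cache-dir "$WHEEL_DIR"/*.whl
}

echo "wheel index: $WHEEL_INDEX"
# With the environment activated, run:
#   fetch_wheels && install_wheels
```

If `fetch_wheels` reports that no matching distribution was found, that suggests no prebuilt wheel exists for the requested version on that index, which would explain pip falling back to compilation.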

onlyonewater avatar Sep 26 '24 05:09 onlyonewater

See this issue. Additionally, I had issues with MKL. What worked for me was:

mamba create -n diffdock_pp python=3.10.8 'mkl<2024.1'
conda activate diffdock_pp
mamba install pytorch=1.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia
# add version==1.6.0 for torch-cluster and add numpy<2
pip install --no-cache-dir  torch-scatter==2.0.9 torch-sparse==0.6.15 torch-cluster==1.6.0 'numpy<2' torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.13.0+cu116.html
# numpy is dropped from this list because the previous command already installed it
pip install dill tqdm pyyaml pandas biopandas scikit-learn biopython e3nn wandb tensorboard tensorboardX matplotlib

I have also tried to install the necessary tools for compilation of torch-cluster in the conda env, but without success. For reference, here's what I tried.

mamba install 'mkl<2024.1'
# ran into torch-cluster compilation error:
# RuntimeError: The current installed version of g++ (13.3.0) is greater than the maximum required version by CUDA 11.6 (11.5.0). Please make sure to use an adequate version of g++ (>=6.0.0, <=11.5.0).
mamba install 'gxx<=11.5'
# ran into torch-cluster compilation error:
# fatal error: cusolverDn.h: No such file or directory
mamba install libcusolver-dev -c nvidia
# still running into other CUDA libraries issues
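
Before attempting a source build, the compiler range quoted in the RuntimeError above can be checked up front. This is only a sketch: the bounds (6.0.0 to 11.5.0) are copied from that error message for CUDA 11.6, `gxx_ok_for_cuda116` is a hypothetical helper name, and `sort -V` assumes GNU coreutils or a compatible `sort`.

```shell
#!/usr/bin/env bash
# Sketch: test whether a g++ version lies in the range the error message
# gives for CUDA 11.6 (>=6.0.0, <=11.5.0). Helper name is illustrative.
gxx_ok_for_cuda116() {
    # sort -V orders version strings; the lower bound must sort first
    # and the upper bound must sort last for the version to be in range.
    [ "$(printf '%s\n' "6.0.0" "$1" | sort -V | head -n1)" = "6.0.0" ] &&
    [ "$(printf '%s\n' "$1" "11.5.0" | sort -V | tail -n1)" = "11.5.0" ]
}

if command -v g++ >/dev/null 2>&1; then
    current=$(g++ -dumpfullversion -dumpversion)
    if gxx_ok_for_cuda116 "$current"; then
        echo "g++ $current is within 6.0.0..11.5.0"
    else
        echo "g++ $current is outside 6.0.0..11.5.0"
    fi
fi
```

Passing this check only rules out the g++ version error; as noted above, missing CUDA headers such as cusolverDn.h can still break the build.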

simone-pignotti avatar May 22 '25 14:05 simone-pignotti

The solution from simone-pignotti worked for me (I also had the MKL issue).

In addition, I had to pin older versions of torch-geometric and e3nn for inference to run (I used torch-geometric==2.2.0 and e3nn==0.5.1).
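
As a sketch, those two pins could be applied on top of the environment from the previous comment like this. The `apply_pins` and `show_versions` helper names are made up for illustration; only the two version numbers come from this thread.

```shell
#!/usr/bin/env bash
# Sketch: pin the two versions reported above, then print what is installed.
# Function names are illustrative, not part of DiffDock-PP.
PINS="torch-geometric==2.2.0 e3nn==0.5.1"

apply_pins() {
    # $PINS is intentionally unquoted so each requirement is a separate argument.
    pip install --no-cache-dir $PINS
}

show_versions() {
    python -c "import torch_geometric, e3nn; print(torch_geometric.__version__, e3nn.__version__)"
}

echo "pip install --no-cache-dir $PINS"
# With the diffdock_pp environment activated, run:
#   apply_pins && show_versions
```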

dxu16 avatar Oct 28 '25 22:10 dxu16