DiffDock-PP
Difficulty installing torch-cluster
Going through the README for setup, I am failing on this command:
# install compatible pytorch geometric in this order WITH versions
pip install --no-cache-dir torch-scatter==2.0.9 torch-sparse==0.6.15 torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.13.0+cu116.html
Failing with this error:
In file included from csrc/cuda/fps_cuda.cu:3:
/root/miniconda3/envs/diffdock_pp/lib/python3.10/site-packages/torch/include/ATen/cuda/CUDAContext.h:10:10: fatal error: cusolverDn.h: No such file or directory
10 | #include <cusolverDn.h>
| ^~~~~~~~~~~~~~
compilation terminated.
error: command '/root/miniconda3/envs/diffdock_pp/bin/nvcc' failed with exit code 1
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for torch-cluster
Running setup.py clean for torch-cluster
Failed to build torch-cluster
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (torch-cluster)
Tried different GPUs, installing cusolver directly, different PyTorch versions, etc. Going to try installing torch-cluster separately with a newer version.
Tried a preconfigured RunPod container for PyTorch 1.13.0 (avoiding the conda env altogether); it still stalls. Also tried installing a newer torch-cluster, but that stalls too. It may work better with another cloud provider or GPU. Does anyone have any recommendations?
Download the packages to your local computer, then install them with pip install xxxxx.whl.
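A rough sketch of that download-then-install approach, in case it helps. The function names `pyg_index_url` and `fetch_wheels` are made up for illustration, and the `torch-cluster==1.6.0` pin is borrowed from a later reply in this thread, so treat both as assumptions. `--only-binary=:all:` makes pip fail instead of silently falling back to a source build, which is what triggers the nvcc compilation in the first place.

```shell
# Build the PyG wheel index URL for a given torch version and CUDA tag,
# following the URL pattern used in the README command above.
pyg_index_url() {
    printf 'https://data.pyg.org/whl/torch-%s+%s.html\n' "$1" "$2"
}

# Hypothetical helper: download prebuilt wheels only (never source archives).
# Usage: fetch_wheels DEST_DIR INDEX_URL PACKAGE...
fetch_wheels() {
    local dest="$1" index="$2"; shift 2
    pip download --no-deps --only-binary=:all: -d "$dest" -f "$index" "$@"
}

# Example, run on a machine with network access:
#   fetch_wheels ./wheels "$(pyg_index_url 1.13.0 cu116)" torch-cluster==1.6.0
#   pip install ./wheels/*.whl
```

Note that pip download picks wheels for the interpreter and platform it runs under, so if the local machine differs from the target you would probably also need `--platform` and `--python-version`.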
See this issue. Additionally, I had issues with MKL. What worked for me was:
mamba create -n diffdock_pp python=3.10.8 'mkl<2024.1'
conda activate diffdock_pp
mamba install pytorch=1.13.0 pytorch-cuda=11.6 -c pytorch -c nvidia
# add version==1.6.0 for torch-cluster and add numpy<2
pip install --no-cache-dir torch-scatter==2.0.9 torch-sparse==0.6.15 torch-cluster==1.6.0 'numpy<2' torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.13.0+cu116.html
# numpy is dropped from this list because the previous command already installed it
pip install dill tqdm pyyaml pandas biopandas scikit-learn biopython e3nn wandb tensorboard tensorboardX matplotlib
I have also tried to install the necessary tools for compilation of torch-cluster in the conda env, but without success. For reference, here's what I tried.
mamba install 'mkl<2024.1'
# ran into torch-cluster compilation error:
# RuntimeError: The current installed version of g++ (13.3.0) is greater than the maximum required version by CUDA 11.6 (11.5.0). Please make sure to use an adequate version of g++ (>=6.0.0, <=11.5.0).
mamba install 'gxx<=11.5'
# ran into torch-cluster compilation error:
# fatal error: cusolverDn.h: No such file or directory
mamba install libcusolver-dev -c nvidia
# still running into issues with other CUDA libraries
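For anyone who wants to check these two preconditions before kicking off a long source build, here is a minimal sketch. The helper names are hypothetical, the version bounds are the ones quoted in the error message above, and the include directories in the usage comment are only guesses at where a conda or system CUDA install might put headers.

```shell
# Return 0 when version $1 lies in the closed range [$2, $3].
# Relies on `sort -V` (version sort from GNU coreutils).
version_in_range() {
    local v="$1" lo="$2" hi="$3"
    [ "$(printf '%s\n' "$lo" "$v" | sort -V | head -n1)" = "$lo" ] &&
    [ "$(printf '%s\n' "$v" "$hi" | sort -V | tail -n1)" = "$hi" ]
}

# Print the first directory among the remaining arguments that contains
# the header named in $1; return 1 if none of them does.
find_header() {
    local header="$1"; shift
    local dir
    for dir in "$@"; do
        if [ -f "$dir/$header" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}

# Example (assumes an active conda env; the directories are guesses):
#   version_in_range "$(g++ -dumpfullversion -dumpversion)" 6.0.0 11.5.0 \
#       || echo "g++ is outside the range accepted by CUDA 11.6"
#   find_header cusolverDn.h "$CONDA_PREFIX/include" /usr/local/cuda/include \
#       || echo "cusolverDn.h not found in the listed directories"
```

This only tells you whether the compiler version and one header look right; as noted above, other CUDA libraries can still be missing.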
The solution from simone-pignotti worked for me (I also had the MKL issue).
In addition, I had to pin older versions of torch-geometric and e3nn for inference to run (I used torch-geometric==2.2.0 and e3nn==0.5.1).
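Spelled out as a command, those two pins might look like the sketch below. The variable and function names are hypothetical, and whether the pins should be applied after the other installs or folded into the earlier pip command is not stated in this thread, so that ordering is an assumption.

```shell
# Version pins reported in this thread as needed for inference.
INFERENCE_PINS="torch-geometric==2.2.0 e3nn==0.5.1"

# Hypothetical helper: install the pins into the currently active environment.
install_inference_pins() {
    # Intentionally unquoted so the two specs are passed as separate arguments.
    pip install --no-cache-dir $INFERENCE_PINS
}

# Example, after activating the diffdock_pp environment:
#   install_inference_pins
```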