NNPOps
Compilation
Building NNPOps currently requires version 11.* of cudatoolkit and a gxx 10.3 compiler. These are becoming outdated. When I try to compile with my own GNU gcc/g++ 11.4 and CUDA 12.3, I get this error:
CMake Error at /home/coyote/miniconda3/envs/nnpops/share/cmake/Caffe2/Caffe2Targets.cmake:144 (message):
The imported target "c10_cuda" references the file
"/home/coyote/miniconda3/envs/nnpops/lib/libc10_cuda.so"
but this file does not exist. Possible reasons include:
* The file was deleted, renamed, or moved to another location.
* An install or uninstall procedure did not complete successfully.
* The installation package was faulty and contained
"/home/coyote/miniconda3/envs/nnpops/share/cmake/Caffe2/Caffe2Targets.cmake"
but not all the files it references.
Call Stack (most recent call first):
/home/coyote/miniconda3/envs/nnpops/share/cmake/Caffe2/Caffe2Config.cmake:113 (include)
/home/coyote/miniconda3/envs/nnpops/lib/python3.11/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:68 (find_package)
CMakeLists.txt:13 (find_package)
You may have some sort of conflict in your environment. Possibly you have an incompatible version of PyTorch installed? There are conda packages for CUDA 12, so it definitely can compile.
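One way to check for this kind of conflict is to see whether the CUDA libraries named in the CMake error are actually present in the environment. The following is only a sketch: the helper name `missing_torch_cuda_libs` is hypothetical, and of the two library names only `libc10_cuda.so` comes from the error above (`libtorch_cuda.so` is an assumed companion that CUDA builds of PyTorch ship). If files are reported missing, a CPU-only PyTorch build is one likely explanation.

```python
from pathlib import Path

# Library names are illustrative: libc10_cuda.so is taken from the CMake error,
# libtorch_cuda.so is an assumed companion shipped by CUDA builds of PyTorch.
CUDA_LIBS = ("libc10_cuda.so", "libtorch_cuda.so")


def missing_torch_cuda_libs(prefix, libs=CUDA_LIBS):
    """Return the names in `libs` that are not found anywhere under `prefix`."""
    root = Path(prefix)
    missing = []
    for name in libs:
        # rglob searches recursively, so both <prefix>/lib and
        # <prefix>/lib/pythonX.Y/site-packages/torch/lib layouts are covered.
        if not any(root.rglob(name)):
            missing.append(name)
    return missing
```

Calling `missing_torch_cuda_libs(sys.prefix)` with the environment's own Python (after `import sys`) would scan the active conda environment; an empty list means both files were found somewhere under it.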
We should update the environment.yml file in this repository. Replace cudatoolkit with cuda-version, and probably specify a newer PyTorch.
Thanks,
I got it to install with gxx_linux-64 11.3.0, pytorch-gpu from channel pytorch, and general cudatoolkit.
I am now seeing some issues (gaps) between @jharrymoore/openmmtools (MACE) and OpenMM simulations, which require a Platform with platformProperties. I will eventually submit a PR for that.
I'll try later with CUDA 12; however, I was hoping to just use my base CUDA since I am installing on PCs.
The cudatoolkit package does not include nvcc. The conda-forge "nvcc" is just a meta package that links to your system nvcc. This can easily get out of sync. I agree we should update the env file with the new conda-forge CUDA packages. This would make it only for CUDA>=12, but I think that is ok. The nvidia channel can be used for previous versions if need be.
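To spot the "out of sync" situation described above, the CUDA release reported by the system nvcc can be compared with the CUDA version PyTorch was built against. This is a rough sketch, not an NNPOps utility: the function names are hypothetical, the sample text is made up in the format `nvcc --version` prints, and matching only the major version is a heuristic rather than a guarantee of compatibility.

```python
import re


def nvcc_release(version_text):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def same_cuda_major(nvcc_text, torch_cuda):
    """True if nvcc and the PyTorch build (e.g. '12.4') share a CUDA major version."""
    torch_major = int(torch_cuda.split(".")[0])
    return nvcc_release(nvcc_text)[0] == torch_major


# Illustrative sample in the format nvcc prints; not taken from this thread.
sample = "Cuda compilation tools, release 12.3, V12.3.107"
print(nvcc_release(sample))             # (12, 3)
print(same_cuda_major(sample, "12.4"))  # True
print(same_cuda_major(sample, "11.8"))  # False
```

In practice the first argument would come from running `nvcc --version` (for example via `subprocess.run(..., capture_output=True, text=True).stdout`) and the second from `torch.version.cuda`, which is `None` on CPU-only builds and would need to be handled before calling `same_cuda_major`.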
I am having an issue with the compilation with CUDA 12.4.
The error I get:
/python3.10/site-packages/torch/share/cmake/Caffe2/public/cuda.cmake:70 (message):
Failed to find nvToolsExt
Call Stack (most recent call first):
This looks related to https://discuss.pytorch.org/t/failed-to-find-nvtoolsext/179635
When I install it using conda, it somehow tries to install the CPU version.
This is the pytorch version I have:
pytorch 2.4.1 py3.10_cuda12.4_cudnn9.1.0_0
pytorch-cuda 12.4 hc786d27_6
When I enforce the version I want, this is the error I get:
conda install -c conda-forge nnpops=0.6=cuda120py310h3ec4162_11
Channels:
- conda-forge
- defaults
- nvidia
- pytorch
Platform: linux-64
Collecting package metadata (repodata.json): done
Solving environment: failed
LibMambaUnsatisfiableError: Encountered problems while solving:
- nothing provides __cuda needed by pytorch-2.4.0-cuda118_py310h954aa82_300
Could not solve for environment specs
The following package could not be installed
└─ nnpops ==0.6 cuda120py310h3ec4162_11 is installable and it requires
└─ pytorch [* cuda*|>=2.4.0,<2.5.0a0 ] with the potential options
├─ pytorch [2.4.0|2.4.1], which can be installed;
├─ pytorch [2.4.0|2.4.1] would require
│ └─ __cuda, which is missing on the system;
└─ pytorch * conflicts with any installable versions previously reported.
I am looking for ways to get this working with what I have. Is there a way to achieve that, or do I need to downgrade?
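The solver line "__cuda, which is missing on the system" refers to conda's `__cuda` virtual package, which conda derives from the NVIDIA driver it detects. Conda reads the `CONDA_OVERRIDE_CUDA` environment variable to assume a given value instead. The sketch below only assembles such a call; the helper name and its `dry_run` parameter are hypothetical, and the version "12.0" is an assumption chosen to match the `cuda120` build string above. It addresses only the `__cuda` part of the message, and whether the resulting environment actually runs still depends on a working driver.

```python
import os
import subprocess


def conda_install_with_cuda_override(spec, cuda_version,
                                     channel="conda-forge", dry_run=True):
    """Build (and optionally run) a conda install with __cuda overridden."""
    env = dict(os.environ)
    # CONDA_OVERRIDE_CUDA tells conda which __cuda virtual package to assume
    # instead of the value it detects from the driver.
    env["CONDA_OVERRIDE_CUDA"] = cuda_version
    cmd = ["conda", "install", "-c", channel, spec]
    if not dry_run:
        # Runs conda with the modified environment; raises on a failed solve.
        subprocess.run(cmd, env=env, check=True)
    return cmd, env["CONDA_OVERRIDE_CUDA"]


# Dry run only: shows the command that would be executed.
cmd, override = conda_install_with_cuda_override(
    "nnpops=0.6=cuda120py310h3ec4162_11", "12.0")
print("CONDA_OVERRIDE_CUDA=" + override, " ".join(cmd))
```

With `dry_run=True` (the default here) nothing is installed, so the printed command can be reviewed first.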