Conda install cupy CUDA Runtime Version (locally installed) failed to load cudart64_12.dll

Open • desplenterkarel opened this issue 9 months ago • 6 comments

### Description

When CuPy is installed with conda, it fails to load `cudart64_12.dll` from the locally installed CUDA runtime.

While debugging, I found that the DLL search path CuPy adds (`Adding DLL search path: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2` in the debug log) is missing the `\bin` part. This is expected given [line 210 of `cupy/_environment.py`](https://github.com/cupy/cupy/blob/66820586ee1c41013868a8de4977c84f29180bc8/cupy/_environment.py#L210), which uses `cuda_path` as the DLL directory for conda installations, and that path does not include `\bin`.
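To check where the runtime DLL actually lives relative to the CUDA root that CuPy adds, a quick diagnostic like the following may help. It is a sketch, not part of CuPy: `dirs_containing_cudart` is a hypothetical helper, and it assumes `CUDA_PATH` points at the toolkit root, as it does in the debug log.

```python
import os
from pathlib import Path


def dirs_containing_cudart(cuda_root, dll_name="cudart64_12.dll"):
    """Return which of <cuda_root> and <cuda_root>/bin contain the CUDA runtime DLL."""
    root = Path(cuda_root)
    # <cuda_root> is the directory CuPy adds under conda (per the debug log);
    # <cuda_root>/bin is the directory the manual workaround adds.
    candidates = [root, root / "bin"]
    return [d for d in candidates if (d / dll_name).is_file()]


if __name__ == "__main__":
    cuda_path = os.environ.get("CUDA_PATH")
    if cuda_path:
        print(dirs_containing_cudart(cuda_path))
    else:
        print("CUDA_PATH is not set")
```

If only the `bin` entry is printed, the directory CuPy adds cannot satisfy the `cudart64_12.dll` lookup on its own.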

My question: is something wrong with how my paths are set up, and how can I fix the installation? For now, my workaround is to add `os.add_dll_directory(r'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin')` to every CuPy script.
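Instead of repeating the call in every script, the workaround can be applied once per environment, for example from a `sitecustomize.py` placed in the environment's `site-packages` (Python's `site` module imports it automatically at startup) or from a small module imported before `cupy`. A minimal sketch, assuming `CUDA_PATH` points at the toolkit root as it does here; the function names are hypothetical:

```python
import os
import sys
from pathlib import Path


def cuda_bin_dir(cuda_path=None):
    """Return <CUDA_PATH>/bin if it exists, otherwise None."""
    root = cuda_path if cuda_path is not None else os.environ.get("CUDA_PATH")
    if not root:
        return None
    candidate = Path(root) / "bin"
    return candidate if candidate.is_dir() else None


def add_cuda_bin_dll_directory():
    """Add <CUDA_PATH>/bin to the DLL search path on Windows; no-op elsewhere."""
    # os.add_dll_directory exists only on Windows (Python 3.8+).
    if sys.platform != "win32" or not hasattr(os, "add_dll_directory"):
        return None
    bin_dir = cuda_bin_dir()
    if bin_dir is None:
        return None
    # The returned handle can be closed to undo the change; leaving it open
    # keeps the directory on the search path for the rest of the process.
    return os.add_dll_directory(str(bin_dir))


add_cuda_bin_dll_directory()
```

On non-Windows platforms the helper does nothing, so the same file is harmless there.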

### To Reproduce

Initial output:

```python
import cupy
cupy.show_config()
```

```
OS                           : Windows-10-10.0.19045-SP0
Python Version               : 3.12.8
CuPy Version                 : 13.4.1
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.26.4
SciPy Version                : 1.13.1
Cython Build Version         : 3.0.12
Cython Runtime Version       : None
CUDA Root                    : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2
nvcc PATH                    : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.EXE
CUDA Build Version           : 12080
CUDA Driver Version          : 12020
CUDA Runtime Version         : 12080 (linked to CuPy) / RuntimeError("CuPy failed to load cudart64_12.dll: FileNotFoundError: Could not find module 'cudart64_12.dll' (or one of its dependencies). Try using the full path with constructor syntax.") (locally installed)
CUDA Extra Include Dirs      : ['C:\\Users\\kedsplen\\.conda\\envs\\Project_oorbel\\Library\\include']
cuBLAS Version               : 120205
cuFFT Version                : 11008
cuRAND Version               : 10303
cuSOLVER Version             : (11, 5, 2)
cuSPARSE Version             : 12102
NVRTC Version                : (12, 2)
Thrust Version               : 200800
CUB Build Version            : 200800
Jitify Build Version         : <unknown>
cuDNN Build Version          : None
cuDNN Version                : None
NCCL Build Version           : None
NCCL Runtime Version         : None
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA A40-48Q
Device 0 Compute Capability  : 86
Device 0 PCI Bus ID          : 0000:06:10.0
```

Output after the workaround (`\bin` added to the DLL search path):

```python
import cupy
import os
os.add_dll_directory(r'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin')
cupy.show_config()
```

```
OS                           : Windows-10-10.0.19045-SP0
Python Version               : 3.12.8
CuPy Version                 : 13.4.1
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.26.4
SciPy Version                : 1.13.1
Cython Build Version         : 3.0.12
Cython Runtime Version       : None
CUDA Root                    : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2
nvcc PATH                    : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.EXE
CUDA Build Version           : 12080
CUDA Driver Version          : 12020
CUDA Runtime Version         : 12080 (linked to CuPy) / 12020 (locally installed)
CUDA Extra Include Dirs      : ['C:\\Users\\kedsplen\\.conda\\envs\\Project_oorbel\\Library\\include']
cuBLAS Version               : 120205
cuFFT Version                : 11008
cuRAND Version               : 10303
cuSOLVER Version             : (11, 5, 2)
cuSPARSE Version             : 12102
NVRTC Version                : (12, 2)
Thrust Version               : 200800
CUB Build Version            : 200800
Jitify Build Version         : <unknown>
cuDNN Build Version          : None
cuDNN Version                : None
NCCL Build Version           : None
NCCL Runtime Version         : None
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA A40-48Q
Device 0 Compute Capability  : 86
Device 0 PCI Bus ID          : 0000:06:10.0
```

### Installation

Conda-Forge (conda install ...)

### Environment

```yaml
name: Project_oorbel
channels:
  - https://software.repos.intel.com/python/conda
  - ccpi
  - conda-forge
  - defaults
dependencies:
  - cuda-python == 12.2.1
  - cuda-version == 12.2
  - cupy
```


Library-loading debug output (with `CUPY_DEBUG_LIBRARY_LOAD` set):

```
>>> import cupy
[CUPY_DEBUG_LIBRARY_LOAD] CUDA_PATH: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2
[CUPY_DEBUG_LIBRARY_LOAD] Not wheel distribution (C:\Users\kedsplen\.conda\envs\Project_oorbel\Lib\site-packages\cupy\.data\lib not found)
[CUPY_DEBUG_LIBRARY_LOAD] Adding DLL search path: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2
[CUPY_DEBUG_LIBRARY_LOAD] Preloading triggered for library: cutensor
[CUPY_DEBUG_LIBRARY_LOAD] Not preloading cutensor as this is not a pip wheel installation
>>> cupy.show_config(_full=True)
[CUPY_DEBUG_LIBRARY_LOAD] Library "cusparse64_12.dll" loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseSpSM_createDescr): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseSpSM_destroyDescr): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseSpSM_bufferSize): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseSpSM_analysis): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseSpSM_solve): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseSpMatSetAttribute): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseCreateCsc): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseSparseToDense_bufferSize): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseSparseToDense): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseDenseToSparse_bufferSize): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseDenseToSparse_analysis): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] cusparse64_12.dll (cusparseDenseToSparse_convert): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] Library "nvrtc64_120_0.dll" loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetErrorString): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcVersion): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcCreateProgram): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcDestroyProgram): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcCompileProgram): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetPTXSize): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetPTX): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetCUBINSize): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetCUBIN): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetProgramLogSize): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetProgramLog): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcAddNameExpression): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetLoweredName): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetNumSupportedArchs): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetSupportedArchs): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetNVVMSize): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] nvrtc64_120_0.dll (nvrtcGetNVVM): function loaded
[CUPY_DEBUG_LIBRARY_LOAD] Not preloading cudnn as this is not a pip wheel installation
[CUPY_DEBUG_LIBRARY_LOAD] Not preloading nccl as this is not a pip wheel installation
OS                           : Windows-10-10.0.19045-SP0
Python Version               : 3.12.8
CuPy Version                 : 13.4.1
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 1.26.4
SciPy Version                : 1.13.1
Cython Build Version         : 3.0.12
Cython Runtime Version       : None
CUDA Root                    : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2
nvcc PATH                    : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin\nvcc.EXE
CUDA Build Version           : 12080
CUDA Driver Version          : 12020
CUDA Runtime Version         : 12080 (linked to CuPy) / RuntimeError("CuPy failed to load cudart64_12.dll: FileNotFoundError: Could not find module 'cudart64_12.dll' (or one of its dependencies). Try using the full path with constructor syntax.") (locally installed)
CUDA Extra Include Dirs      : ['C:\\Users\\kedsplen\\.conda\\envs\\Project_oorbel\\Library\\include']
cuBLAS Version               : 120205
cuFFT Version                : 11008
cuRAND Version               : 10303
cuSOLVER Version             : (11, 5, 2)
cuSPARSE Version             : 12102
NVRTC Version                : (12, 2)
Thrust Version               : 200800
CUB Build Version            : 200800
Jitify Build Version         : <unknown>
cuDNN Build Version          : None
cuDNN Version                : None
NCCL Build Version           : None
NCCL Runtime Version         : None
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA A40-48Q
Device 0 Compute Capability  : 86
Device 0 PCI Bus ID          : 0000:06:10.0
```
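The log above comes from CuPy's `CUPY_DEBUG_LIBRARY_LOAD` environment variable. Because the log lines appear during `import cupy`, the variable has to be set before CuPy is first imported; one way to guarantee that is to run the check in a fresh interpreter. A minimal sketch, assuming the variable is enabled with the value `1` (the helper names are hypothetical):

```python
import os
import subprocess
import sys


def debug_env(base=None):
    """Copy of the environment with CuPy's library-load logging switched on."""
    env = dict(os.environ if base is None else base)
    # Assumption: CuPy enables the log when this variable is "1".
    env["CUPY_DEBUG_LIBRARY_LOAD"] = "1"
    return env


def capture_cupy_load_log():
    """Run show_config in a fresh interpreter and return its combined output."""
    # A subprocess guarantees the variable is set before cupy is first imported.
    code = "import cupy; cupy.show_config(_full=True)"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=debug_env(),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge so the log and the config come back together
        text=True,
    )
    return result.stdout
```

Calling `print(capture_cupy_load_log())` from the affected environment should reproduce a log like the one above.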


### Additional Information

_No response_

_desplenterkarel, May 09 '25 13:05_

Thanks for reporting, @desplenterkarel. I can reproduce this on my end as well. Now that `CUDA_PATH` is no longer overwritten by the activation scripts in the CuPy conda-forge packages (https://github.com/conda-forge/cupy-feedstock/pull/252), I think the logic should be updated. cc @jakirkham @leofang

_kmaehashi, May 09 '25 15:05_

Thanks for the ping, @kmaehashi san. Yes, we should fix this.

BTW, we should also document `CUPY_DEBUG_LIBRARY_LOAD`; it seems handy!

cc @rwgk for visibility (Ralf is leading the Path Finder project on the CUDA Python side, which can also help with load issues like this one)

_leofang, May 13 '25 02:05_

Yeah, I have exactly the same issue. I cannot compile an app that uses CuPy CUDA acceleration with Nuitka. I installed CuPy from conda-forge. I have both CUDA 12.8 and 12.9 installed, just in case, and both are added to my PATH variables. Nuitka support doesn't know how to fix this. Could you please suggest a solution?

```
Python 3.12.11 | packaged by conda-forge | (main, Jun 4 2025, 14:29:09) [MSC v.1943 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import cupy
>>> cupy.show_config()
OS                           : Windows-11-10.0.26100-SP0
Python Version               : 3.12.11
CuPy Version                 : 13.4.1
CuPy Platform                : NVIDIA CUDA
NumPy Version                : 2.2.6
SciPy Version                : 1.15.3
Cython Build Version         : 3.0.12
Cython Runtime Version       : None
CUDA Root                    : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8
nvcc PATH                    : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.8\bin\nvcc.EXE
CUDA Build Version           : 12080
CUDA Driver Version          : 12090
CUDA Runtime Version         : 12080 (linked to CuPy) / RuntimeError("CuPy failed to load cudart64_12.dll: FileNotFoundError: Could not find module 'cudart64_12.dll' (or one of its dependencies). Try using the full path with constructor syntax.") (locally installed)
CUDA Extra Include Dirs      : ['C:\\Users\\Miki\\anaconda3\\envs\\aplikacija\\Library\\include']
cuBLAS Version               : (available)
cuFFT Version                : 11401
cuRAND Version               : 10310
cuSOLVER Version             : (11, 7, 5)
cuSPARSE Version             : (available)
NVRTC Version                : (12, 9)
Thrust Version               : 200800
CUB Build Version            : 200800
Jitify Build Version         :
cuDNN Build Version          : None
cuDNN Version                : None
NCCL Build Version           : None
NCCL Runtime Version         : None
cuTENSOR Version             : None
cuSPARSELt Build Version     : None
Device 0 Name                : NVIDIA GeForce RTX 3090
Device 0 Compute Capability  : 86
Device 0 PCI Bus ID          : 0000:01:00.0
```

_zelenooki87, Jun 11 '25 06:06_

Now that cuda-python 13 ships the `pathfinder` module, could this be fixed using it? https://github.com/NVIDIA/cuda-python/tree/main/cuda_pathfinder/cuda/pathfinder

_desplenterkarel, Aug 18 '25 09:08_