
libcudnn_cnn problem

Open 781574155 opened this issue 1 year ago • 3 comments

INFO:faster_whisper:Processing audio with duration 03:52.176
Unable to load any of {libcudnn_cnn.so.9.1.0, libcudnn_cnn.so.9.1, libcudnn_cnn.so.9, libcudnn_cnn.so}
Invalid handle. Cannot load symbol cudnnCreateConvolutionDescriptor

ctranslate2==4.5.0 torch==2.5.1 faster-whisper==1.1.0


781574155 avatar Dec 01 '24 04:12 781574155

After setting LD_LIBRARY_PATH, the program runs correctly:

export LD_LIBRARY_PATH=/usr/local/lib/python3.11/site-packages/nvidia/cudnn/lib/:$LD_LIBRARY_PATH

This should be set up correctly automatically!
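As an aside, the cuDNN directory shipped in the pip wheel can be located programmatically instead of hardcoding the Python version into the path. A minimal sketch, assuming the cuDNN wheel (e.g. nvidia-cudnn-cu12) is installed and exposes the nvidia.cudnn package:

```python
import importlib.util
from pathlib import Path

def find_cudnn_lib_dir():
    """Return the lib/ dir of the pip-installed cuDNN wheel, or None if absent."""
    try:
        spec = importlib.util.find_spec("nvidia.cudnn")
    except ModuleNotFoundError:
        return None  # the nvidia namespace package is not installed at all
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0]) / "lib"
```

The returned path is what would go on LD_LIBRARY_PATH.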

781574155 avatar Dec 01 '24 05:12 781574155

I get the same problem.

On my machine I have CUDA 12.7 and torch==2.5.1. If I use ctranslate2==4.5.0 with torch, I get

Unable to load any of {libcudnn_cnn.so.9.1.0, libcudnn_cnn.so.9.1, libcudnn_cnn.so.9, libcudnn_cnn.so}

If I use ctranslate2==4.4.0 with torch, I get

Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory

Using CUDA with ctranslate2 or torch individually is fine, but you cannot invoke torch.cuda before ctranslate2.

The following files exist in my environment:

.venv/lib/python3.12/site-packages/nvidia/cudnn/lib/libcudnn_cnn.so.9
/usr/lib/libcudnn_cnn.so
/usr/lib/libcudnn_cnn.so.9
/usr/lib/libcudnn_cnn.so.9.5.1

Setting LD_LIBRARY_PATH="/path/to/.venv/lib/python3.12/site-packages/nvidia/cudnn/lib" works around the problem.
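Note that LD_LIBRARY_PATH is only read by the dynamic loader at process startup, so assigning it via os.environ inside a running interpreter has no effect; to apply the workaround from within the script itself, the process has to re-exec itself once. A sketch of that idea (not part of ctranslate2; the _CUDNN_REEXEC guard variable is a name made up here):

```python
import os
import sys

def reexec_with_cudnn(lib_dir: str) -> None:
    """Re-exec the interpreter with lib_dir prepended to LD_LIBRARY_PATH.

    A one-shot guard environment variable prevents an infinite re-exec loop.
    """
    if os.environ.get("_CUDNN_REEXEC") == "1":
        return  # already running in the re-executed process
    env = dict(os.environ, _CUDNN_REEXEC="1")
    env["LD_LIBRARY_PATH"] = lib_dir + os.pathsep + env.get("LD_LIBRARY_PATH", "")
    os.execve(sys.executable, [sys.executable] + sys.argv, env)
```

Calling this at the very top of the script, before importing ctranslate2, should have the same effect as exporting the variable in the shell.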

The problem does not exist when torch is not loaded. However, if I remove the system libcudnn under /usr/lib/, then ctranslate2 will fail even WITHOUT torch:

Unable to load any of {libcudnn_cnn.so.9.1.0, libcudnn_cnn.so.9.1, libcudnn_cnn.so.9, libcudnn_cnn.so}

Weird problem. My hypothesis is that torch always loads cuDNN from /path/to/.venv/lib/python3.12/site-packages/nvidia/cudnn/lib, while the dynamic loader invoked by ctranslate2 tries to load /usr/lib/libcudnn_cnn.so. Each works fine on its own, but the latter fails if torch has already loaded its own cuDNN into the same process.
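One way to check this hypothesis is to list which libcudnn copies the current process has actually mapped. A Linux-only sketch that reads /proc/self/maps:

```python
def loaded_cudnn_paths():
    """Return the distinct libcudnn* file paths mapped into this process (Linux)."""
    paths = set()
    with open("/proc/self/maps") as maps:
        for line in maps:
            fields = line.split()
            # The backing file path, when present, is the last whitespace field.
            if len(fields) >= 6 and "libcudnn" in fields[-1]:
                paths.add(fields[-1])
    return sorted(paths)
```

Calling this after importing torch but before creating a ctranslate2 model would show whether the venv copy of cuDNN is already mapped when ctranslate2 starts resolving symbols.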

zhou13 avatar Dec 02 '24 07:12 zhou13

import ctypes
import glob
import os
import sys
from pathlib import Path


def load_cudnn():
    import torch

    if not torch.cuda.is_available():
        print("[INFO] CUDA is not available, skipping cuDNN setup.")
        return

    if sys.platform == "win32":
        torch_lib_dir = Path(torch.__file__).parent / "lib"
        if torch_lib_dir.exists():
            os.add_dll_directory(str(torch_lib_dir))
            print(f"[INFO] Added DLL directory: {torch_lib_dir}")
        else:
            print(f"[WARNING] Torch lib directory not found: {torch_lib_dir}")

    elif sys.platform == "linux":
        site_packages = Path(torch.__file__).resolve().parents[1]
        cudnn_dir = site_packages / "nvidia" / "cudnn" / "lib"

        if not cudnn_dir.exists():
            print(f"[ERROR] cudnn dir not found: {cudnn_dir}")
            return

        pattern = str(cudnn_dir / "libcudnn_cnn*.so*")
        matching_files = sorted(glob.glob(pattern))
        if not matching_files:
            print(f"[ERROR] No libcudnn_cnn*.so* found in {cudnn_dir}")
            return

        for so_path in matching_files:
            try:
                ctypes.CDLL(so_path, mode=ctypes.RTLD_GLOBAL)
                print(f"[INFO] Loaded: {so_path}")
            except OSError as e:
                print(f"[WARNING] Failed to load {so_path}: {e}")

This may work for you if you don't want to export LD_LIBRARY_PATH.

XXXXRT666 avatar May 09 '25 09:05 XXXXRT666