bitsandbytes
WSL: libcuda.so: cannot open shared object file
Hello, I'm on a completely fresh Ubuntu installation under Windows Subsystem for Linux with CUDA support. What I've done so far:
- Installed Miniforge/Conda
- Made a new env in a folder
- Ran conda install cudatoolkit
- Pip-installed pytorch, transformers, accelerate, and bitsandbytes
- Attempted to run the HF pipeline demo with 8-bit quantization enabled.
When running a model, I get the following error:
CUDA SETUP: CUDA runtime path found: /home/zaptrem/bigmodels/env/lib/libcudart.so
Traceback (most recent call last):
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 57, in get_cuda_lib_handle
cuda = ctypes.CDLL("libcuda.so")
File "/home/zaptrem/bigmodels/env/lib/python3.8/ctypes/__init__.py", line 373, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libcuda.so: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1030, in _get_module
return importlib.import_module("." + module_name, self.__name__)
File "/home/zaptrem/bigmodels/env/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 843, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/models/bloom/modeling_bloom.py", line 34, in <module>
from ...modeling_utils import PreTrainedModel
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/modeling_utils.py", line 88, in <module>
from .utils.bitsandbytes import get_key_to_not_convert, replace_8bit_linear, set_module_8bit_tensor_to_device
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/utils/bitsandbytes.py", line 10, in <module>
import bitsandbytes as bnb
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/__init__.py", line 6, in <module>
from .autograd._functions import (
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 4, in <module>
import bitsandbytes.functional as F
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/functional.py", line 14, in <module>
from .cextension import COMPILED_WITH_CUDA, lib
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 41, in <module>
lib = CUDALibrary_Singleton.get_instance().lib
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 37, in get_instance
cls._instance.initialize()
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 15, in initialize
binary_name = evaluate_cuda_setup()
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 130, in evaluate_cuda_setup
cuda = get_cuda_lib_handle()
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 60, in get_cuda_lib_handle
raise Exception('CUDA SETUP: ERROR! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!')
Exception: CUDA SETUP: ERROR! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "largemodels.py", line 25, in <module>
pipe = pipeline(model=name, model_kwargs= {"device_map": "auto", "load_in_8bit": True}, max_new_tokens=max_new_tokens)
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/pipelines/__init__.py", line 676, in pipeline
framework, model = infer_framework_load_model(
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/pipelines/base.py", line 229, in infer_framework_load_model
_class = getattr(transformers_module, architecture, None)
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1021, in __getattr__
value = getattr(module, name)
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1020, in __getattr__
module = self._get_module(self._class_to_module[name])
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1032, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.models.bloom.modeling_bloom because of the following error (look up to see its traceback):
CUDA SETUP: ERROR! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
As I said, I ruled out an environment issue by completely resetting the Linux environment. CUDA works fine otherwise.
Here is a workaround for the issue:
export LD_LIBRARY_PATH=/usr/lib/wsl/lib
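To make the workaround persist across shells, the export can be appended to ~/.bashrc. A minimal sketch, assuming the default WSL driver location (/usr/lib/wsl/lib; verify the path on your machine first):

```shell
# Only export if the WSL driver directory actually exists (WSL default path).
if [ -d /usr/lib/wsl/lib ]; then
    export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
    # Persist the setting for future shells:
    echo 'export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
fi
echo "LD_LIBRARY_PATH is now: ${LD_LIBRARY_PATH:-<unset>}"
```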
Great! Yes, you need to manually specify LD_LIBRARY_PATH on some operating systems.
I haven't found this to be necessary for PyTorch or any other libs that require CUDA. Do they use a superior lib detection mechanism bitsandbytes could adopt? Thanks!
I believe this issue is caused by a missing reference to the multi-arch path of the CUDA driver. You can likely solve the problem by running sudo ldconfig. See here for more information.
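For reference, the linker cache can be inspected without root, so it is easy to check whether the driver was registered after rebuilding it (a generic diagnostic, not specific to bitsandbytes):

```shell
# `sudo ldconfig` rebuilds the dynamic linker cache; `ldconfig -p` only reads it.
# Look for the CUDA driver library in the cache:
ldconfig -p | grep libcuda || echo "libcuda not in linker cache"
```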
Since we automatically detect the right CUDA library at runtime, we check your CUDA compute capability at runtime via the CUDA driver. The automatic detection lets you do a single install even if you have multiple GPUs with different compute capabilities. But for this to work, the CUDA driver needs to be discoverable on the system, which is not the case here. We are still figuring out how to smooth these things out, and errors like this will likely be fixed soon.
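The failing call in the traceback above is simply ctypes.CDLL("libcuda.so"), so reproducing it directly is a quick way to test whether the driver is discoverable after any fix (a diagnostic sketch, not part of bitsandbytes):

```shell
# Attempt the same dlopen that bitsandbytes performs, and report the result.
python3 -c 'import ctypes; ctypes.CDLL("libcuda.so")' 2>/dev/null \
    && echo "driver found" \
    || echo "driver NOT found: set LD_LIBRARY_PATH or run sudo ldconfig"
```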
I believe this is fixed in the latest version. It prints instructions on how to debug the situation and alternatively prints out compilation instructions which should fix the issue.
I had no luck exporting the variable, but I did manage to copy the file from my TensorFlow environment over to my PyTorch environment, which solved that problem. Now... on to other problems...
cp /home/me/miniconda3/envs/tf/lib/libcudart.so /home/me/miniconda3/envs/pytorch/lib/
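Rather than copying, a symlink keeps the two environments in sync if the source library is later upgraded. A sketch using throwaway directories in place of the poster's conda env paths (substitute your real env lib dirs):

```shell
# Stand-in directories for the two conda envs' lib folders:
src=$(mktemp -d)   # plays the role of .../envs/tf/lib
dst=$(mktemp -d)   # plays the role of .../envs/pytorch/lib
touch "$src/libcudart.so"

# Symlink instead of cp, so updates to the source propagate automatically:
ln -sf "$src/libcudart.so" "$dst/libcudart.so"
[ -L "$dst/libcudart.so" ] && echo "symlink created"
```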