bitsandbytes
WSL: libcuda.so: cannot open shared object file
Hello, I'm on a completely fresh Ubuntu installation under Windows Subsystem for Linux with CUDA support. What I've done so far:
- Installed Miniforge/Conda
- Made a new env in a folder
- Ran conda install cudatoolkit
- Pip-installed pytorch, transformers, accelerate, and bitsandbytes
- Attempted to run the HF pipeline demo with 8-bit quantization enabled.
When running a model, I get the following error:
CUDA SETUP: CUDA runtime path found: /home/zaptrem/bigmodels/env/lib/libcudart.so
Traceback (most recent call last):
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 57, in get_cuda_lib_handle
cuda = ctypes.CDLL("libcuda.so")
File "/home/zaptrem/bigmodels/env/lib/python3.8/ctypes/__init__.py", line 373, in __init__
self._handle = _dlopen(self._name, mode)
OSError: libcuda.so: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1030, in _get_module
return importlib.import_module("." + module_name, self.__name__)
File "/home/zaptrem/bigmodels/env/lib/python3.8/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 843, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/models/bloom/modeling_bloom.py", line 34, in <module>
from ...modeling_utils import PreTrainedModel
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/modeling_utils.py", line 88, in <module>
from .utils.bitsandbytes import get_key_to_not_convert, replace_8bit_linear, set_module_8bit_tensor_to_device
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/utils/bitsandbytes.py", line 10, in <module>
import bitsandbytes as bnb
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/__init__.py", line 6, in <module>
from .autograd._functions import (
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 4, in <module>
import bitsandbytes.functional as F
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/functional.py", line 14, in <module>
from .cextension import COMPILED_WITH_CUDA, lib
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 41, in <module>
lib = CUDALibrary_Singleton.get_instance().lib
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 37, in get_instance
cls._instance.initialize()
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/cextension.py", line 15, in initialize
binary_name = evaluate_cuda_setup()
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 130, in evaluate_cuda_setup
cuda = get_cuda_lib_handle()
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py", line 60, in get_cuda_lib_handle
raise Exception('CUDA SETUP: ERROR! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!')
Exception: CUDA SETUP: ERROR! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "largemodels.py", line 25, in <module>
pipe = pipeline(model=name, model_kwargs= {"device_map": "auto", "load_in_8bit": True}, max_new_tokens=max_new_tokens)
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/pipelines/__init__.py", line 676, in pipeline
framework, model = infer_framework_load_model(
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/pipelines/base.py", line 229, in infer_framework_load_model
_class = getattr(transformers_module, architecture, None)
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1021, in __getattr__
value = getattr(module, name)
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1020, in __getattr__
module = self._get_module(self._class_to_module[name])
File "/home/zaptrem/bigmodels/env/lib/python3.8/site-packages/transformers/utils/import_utils.py", line 1032, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.models.bloom.modeling_bloom because of the following error (look up to see its traceback):
CUDA SETUP: ERROR! libcuda.so not found! Do you have a CUDA driver installed? If you are on a cluster, make sure you are on a CUDA machine!
As I said, I ruled out an environment issue by completely resetting the Linux environment. CUDA works fine otherwise.
Here is a workaround for the issue:
export LD_LIBRARY_PATH=/usr/lib/wsl/lib
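To make the workaround persist across shells, the export can be appended to ~/.bashrc. A minimal sketch, assuming the default WSL driver location (/usr/lib/wsl/lib; verify the path on your machine first):

```shell
# Only export if the WSL driver directory actually exists (WSL default path).
if [ -d /usr/lib/wsl/lib ]; then
    export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
    # Persist the setting for future shells:
    echo 'export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH' >> ~/.bashrc
fi
echo "LD_LIBRARY_PATH is now: ${LD_LIBRARY_PATH:-<unset>}"
```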
Great! Yes, you need to manually specify LD_LIBRARY_PATH on some operating systems.
I haven't found this to be necessary for PyTorch or any other libs that require CUDA. Do they use a superior lib detection mechanism bitsandbytes could adopt? Thanks!
I believe this issue is caused by a missing reference to the multi-arch path of the CUDA driver. You can likely solve the problem by running sudo ldconfig. See here for more information.
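For reference, the linker cache can be inspected without root, so it is easy to check whether the driver was registered after rebuilding it (a generic diagnostic, not specific to bitsandbytes):

```shell
# `sudo ldconfig` rebuilds the dynamic linker cache; `ldconfig -p` only reads it.
# Look for the CUDA driver library in the cache:
ldconfig -p | grep libcuda || echo "libcuda not in linker cache"
```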
Since we automatically detect the right CUDA library at runtime, we check your CUDA compute capability at runtime via the CUDA driver. The automatic detection lets you do a single install even if you have multiple GPUs with different compute capabilities. But for this to work, the CUDA driver needs to be discoverable on the system, which is not the case here. We are still figuring out how to smooth these things out, and errors like this will likely be fixed soon.
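The failing call in the traceback above is simply ctypes.CDLL("libcuda.so"), so reproducing it directly is a quick way to test whether the driver is discoverable after any fix (a diagnostic sketch, not part of bitsandbytes):

```shell
# Attempt the same dlopen that bitsandbytes performs, and report the result.
python3 -c 'import ctypes; ctypes.CDLL("libcuda.so")' 2>/dev/null \
    && echo "driver found" \
    || echo "driver NOT found: set LD_LIBRARY_PATH or run sudo ldconfig"
```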
I believe this is fixed in the latest version. It prints instructions on how to debug the situation and alternatively prints out compilation instructions which should fix the issue.
I had no luck exporting the variable, but I did manage to copy the file from my TensorFlow environment over to my PyTorch environment, which solved that problem. Now... on to other problems...
cp /home/me/miniconda3/envs/tf/lib/libcudart.so /home/me/miniconda3/envs/pytorch/lib/
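Rather than copying, a symlink keeps the two environments in sync if the source library is later upgraded. A sketch using throwaway directories in place of the poster's conda env paths (substitute your real env lib dirs):

```shell
# Stand-in directories for the two conda envs' lib folders:
src=$(mktemp -d)   # plays the role of .../envs/tf/lib
dst=$(mktemp -d)   # plays the role of .../envs/pytorch/lib
touch "$src/libcudart.so"

# Symlink instead of cp, so updates to the source propagate automatically:
ln -sf "$src/libcudart.so" "$dst/libcudart.so"
[ -L "$dst/libcudart.so" ] && echo "symlink created"
```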