
[BUG] libcudart.so: cannot open shared object file: No such file or directory

Status: Open. zhimin-z opened this issue 1 year ago; 20 comments.

Describe the bug

I installed cuml and found that it throws an error at runtime:

Steps/Code to reproduce bug

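import os

import matplotlib.pyplot as plt
import pandas as pd
from sentence_transformers import SentenceTransformer

import cuml  # the OSError in the trace below is raised while importing cuml

# Load the dataset
path_dataset = 'Dataset'
df_all = pd.read_json(os.path.join(path_dataset, 'filtered.json'))

# Embed each challenge summary with a sentence-transformers model
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
docs = df_all['Challenge_summary'].tolist()
embeddings = embedding_model.encode(docs)

# Project the embeddings to 2D with cuML's GPU t-SNE
model = cuml.TSNE(n_neighbors=32)
embed2D = model.fit_transform(embeddings)
df_all['x'] = embed2D[:, 0]  # originally `train`, which is never defined
df_all['y'] = embed2D[:, 1]

# figsize is in inches; (1000, 1000) would exceed matplotlib's maximum image size
fig = plt.figure(figsize=(10, 10))
plt.scatter(df_all.x, df_all.y, color='blue', s=10, label='Clusters')
fig.savefig('test.png')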

Expected behavior

The script runs successfully.

Environment details:

  • Environment location: Bare-metal
  • Linux Distro/Architecture: Linux docjk-gpu-01 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • GPU Model/Driver: A100 and 525.85.12
  • CUDA: 12.0
  • Method of cuDF & cuML install:
pip install cudf-cu11 dask-cudf-cu11 --extra-index-url=https://pypi.nvidia.com
pip install cuml-cu11 --extra-index-url=https://pypi.nvidia.com
pip install cugraph-cu11 --extra-index-url=https://pypi.nvidia.com

according to https://docs.rapids.ai/install#pip

Additional context

Error trace:

(.venv) 21zz42@docjk-gpu-01:~/Asset-Management-Topic-Modeling$ python "Code/best_challenge copy.py"
Traceback (most recent call last):
  File "/home/21zz42/Asset-Management-Topic-Modeling/Code/best_challenge copy.py", line 52, in <module>
    import cuml
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/__init__.py", line 17, in <module>
    from cuml.internals.base import Base, UniversalBase
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/__init__.py", line 17, in <module>
    from cuml.internals.base_helpers import (
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/base_helpers.py", line 20, in <module>
    from cuml.internals.api_decorators import (
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/api_decorators.py", line 24, in <module>
    from cuml.internals import input_utils as iu
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/input_utils.py", line 19, in <module>
    from cuml.internals.array import CumlArray
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/array.py", line 22, in <module>
    from cuml.internals.global_settings import GlobalSettings
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/global_settings.py", line 20, in <module>
    from cuml.internals.device_type import DeviceType
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/device_type.py", line 19, in <module>
    from cuml.internals.mem_type import MemoryType
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/mem_type.py", line 25, in <module>
    cudf = gpu_only_import('cudf')
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cuml/internals/safe_imports.py", line 366, in gpu_only_import
    return importlib.import_module(module)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cudf/__init__.py", line 5, in <module>
    validate_setup()
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/cudf/utils/gpu_utils.py", line 95, in validate_setup
    cuda_runtime_version = runtimeGetVersion()
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/rmm/_cuda/gpu.py", line 87, in runtimeGetVersion
    major, minor = numba.cuda.runtime.get_version()
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py", line 111, in get_version
    self.cudaRuntimeGetVersion(ctypes.byref(rtver))
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py", line 65, in __getattr__
    self._initialize()
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/numba/cuda/cudadrv/runtime.py", line 51, in _initialize
    self.lib = open_cudalib('cudart')
  File "/home/21zz42/Asset-Management-Topic-Modeling/.venv/lib/python3.10/site-packages/numba/cuda/cudadrv/libs.py", line 60, in open_cudalib
    return ctypes.CDLL(path)
  File "/usr/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcudart.so: cannot open shared object file: No such file or directory

The answers at https://stackoverflow.com/questions/69934320/oserror-libcudart-so-10-2-cannot-open-shared-object-file-no-such-file-or-dire do not help in my case, since PyTorch runs successfully on this machine.

zhimin-z, Mar 25 '23
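The last frames of the trace show that the failure happens before any cuML code runs: cudf's validate_setup() calls into rmm, which calls numba.cuda.runtime.get_version(), and numba's ctypes.CDLL call cannot open libcudart.so. A minimal sketch that exercises only that numba step, so the problem can be checked without importing cudf or cuml (assumes numba is installed; the helper name probe_numba_runtime is hypothetical):

```python
def probe_numba_runtime():
    """Ask numba for the CUDA runtime version, the same call rmm makes in the trace.

    Returns (True, (major, minor)) on success, or (False, "<ErrorType>: <message>").
    """
    try:
        from numba import cuda  # imported here so a missing numba is also reported
        version = cuda.runtime.get_version()  # numba loads libcudart lazily via ctypes
    except Exception as exc:  # OSError when libcudart.so is missing; other errors possible
        return False, f"{type(exc).__name__}: {exc}"
    return True, version


if __name__ == "__main__":
    ok, detail = probe_numba_runtime()
    status = "numba found the CUDA runtime:" if ok else "numba could not load the CUDA runtime:"
    print(status, detail)
```

If this reports the same OSError, the problem lies in how the CUDA runtime library is found on the system rather than in cuML itself.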

Hi @zhimin-z thanks for the issue! Currently the RAPIDS pip packages only support CUDA 11.x, so that is very likely the issue you're facing.

dantegd, Mar 28 '23

> Hi @zhimin-z thanks for the issue! Currently the RAPIDS pip packages only support CUDA 11.x, so that is very likely the issue you're facing.

What can I do now? I don't have permission to downgrade the CUDA driver, since I am not the owner of the server.

zhimin-z, Mar 29 '23

I have a similar issue, but running nvidia-smi shows that my environment has CUDA 11.7.

The issue: after installing

!pip install cugraph-cu11 cudf-cu11 cuml-cu11 --extra-index-url=https://pypi.nvidia.com
!pip uninstall cupy-cuda115 -y
!pip uninstall cupy-cuda11x -y
!pip install cupy-cuda11x -f https://pip.cupy.dev/aarch64

and then trying to import: from cuml.cluster import HDBSCAN

I get: OSError: libcudart.so: cannot open shared object file: No such file or directory

noahberhe, May 19 '23

Adding another data point, and thanks to the developers for their work on this. The installation guide (https://docs.rapids.ai/install#pip) currently claims pip support for CUDA 12, and I am running CUDA 12.0. My cuML installation with pip succeeded (pip install cudf-cu12 cuml-cu12 --extra-index-url=https://pypi.nvidia.com), but I get the same libcudart.so error when I try to train a model.

mike@henry:~$ nvidia-smi
Thu Jul 27 17:42:47 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A300...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   56C    P8    17W / 115W |    865MiB /  6144MiB |     25%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2044      G   /usr/lib/xorg/Xorg                362MiB |
|    0   N/A  N/A      2527      G   /usr/bin/gnome-shell              142MiB |
|    0   N/A  N/A      3481      G   ...veSuggestionsOnlyOnDemand       82MiB |
|    0   N/A  N/A      8067      G   ...8/usr/lib/firefox/firefox      183MiB |
|    0   N/A  N/A     37940      G   ...RendererForSitePerProcess       35MiB |
+-----------------------------------------------------------------------------+

mfschmidt, Jul 27 '23

Can confirm. The pip installation succeeds with CUDA Version: 12.0, but when running import cudf I get the following error as well:

OSError: libcudart.so: cannot open shared object file: No such file or directory

brendanartley, Jul 28 '23

@mfschmidt @brendanartley Can you share more about your OS and version (e.g. Ubuntu 20.04, whether you're using containers or WSL), how you installed the CUDA Toolkit, and the outputs of ls -al /usr/local/cuda*?

bdice, Jul 28 '23

@bdice Thanks for your response and interest, and sorry for the slow reply. I'm running Ubuntu 22.04.3 on a Dell Precision Workstation with an NVIDIA RTX A3000 GPU and NVIDIA driver version 525.125.06. I'm using a Python virtual environment, but no Docker or WSL.

I had no /usr/local/cuda* paths and I had not installed CUDA Toolkit. After installing the CUDA Toolkit this morning, I imported cuml from within python and the error does not occur.

I think it was unclear to me (rapidly and mindlessly copy/pasting commands rather than actually reading the instructions) that the CUDA Toolkit is required in addition to the NVIDIA drivers. I assumed the drivers alone were sufficient.

Thank you for your help!! I believe my issue is now resolved by installing CUDA Toolkit, and I'll post back to this thread if I discover additional related problems.

mfschmidt, Aug 06 '23
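A rough sketch of the check bdice asked for (ls -al /usr/local/cuda*), narrowed to whether a system-wide CUDA runtime library is present. It assumes the default Linux toolkit layout /usr/local/cuda*/lib64; toolkits installed elsewhere (for example via conda) will not show up, and the helper name find_system_cudart is hypothetical:

```python
import glob
import os


def find_system_cudart(root="/usr/local"):
    """List libcudart.so* files under <root>/cuda*/lib64 (default toolkit layout assumed).

    An empty result suggests no system-wide CUDA Toolkit at the default location,
    which matches the situation described above.
    """
    pattern = os.path.join(root, "cuda*", "lib64", "libcudart.so*")
    # Paths are kept as-is (no realpath) so the unversioned symlink stays visible.
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    hits = find_system_cudart()
    if hits:
        for path in hits:
            print(path)
    else:
        print("No libcudart.so* found under /usr/local/cuda*/lib64")
```

In the trace at the top of this issue, numba asks for the unversioned name libcudart.so, which on Ubuntu typically comes from the toolkit's development files; seeing only libcudart.so.12 or libcudart.so.11.0 here would point at that gap.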

If possible, it would be ideal if the pip installer could install CUDA Toolkit as a dependency. If that's not possible, an informative warning or error that it's missing and must be installed separately would be very useful.

Thank you again for your help, and for making the world better with open source software!! :)

mfschmidt, Aug 06 '23

Hi @mfschmidt

> If possible, it would be ideal if the pip installer could install CUDA Toolkit as a dependency. If that's not possible, an informative warning or error that it's missing and must be installed separately would be very useful.

We do statically link libcudart in the RAPIDS wheels; however, some dependencies such as numba and cupy link to libcudart dynamically, and the stack trace shows that they are the ones failing to find it. We'll need to consider whether the warning should be added by us or by those upstream libraries. Thanks for your suggestion.

divyegala, Aug 08 '23
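Because numba and cupy resolve libcudart at run time through the dynamic loader, one way to see what they will find is to ask the loader directly. A minimal sketch using only the standard library; the candidate names are an assumption covering the unversioned libcudart.so that numba requests in the trace above plus the CUDA 12 and CUDA 11 runtime sonames, and probe_cudart is a hypothetical helper:

```python
import ctypes

# Unversioned name (what numba asks for in this thread) plus versioned sonames:
# libcudart.so.12 is the CUDA 12.x runtime, libcudart.so.11.0 the CUDA 11.x runtime.
CANDIDATES = ("libcudart.so", "libcudart.so.12", "libcudart.so.11.0")


def probe_cudart(names=CANDIDATES):
    """Try to dlopen each name through ctypes and map name -> True/False."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # same loading mechanism numba uses
            results[name] = True
        except OSError:
            results[name] = False
    return results


if __name__ == "__main__":
    for name, ok in probe_cudart().items():
        print(f"{name:20s} {'found' if ok else 'NOT found'}")
```

A pattern such as "libcudart.so.12 found, libcudart.so NOT found" would be consistent with a runtime that is present but lacks the unversioned name.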

I also get the same error with CUDA 11.4 (RTX 3090).

When I try to import: from cuml.manifold import UMAP

I get this error: OSError: libcudart.so: cannot open shared object file: No such file or directory

[Edited] Solved this issue by installing via conda:

conda create -n rapids -c rapidsai -c conda-forge -c nvidia \
    rapids=23.08 python=3.9 cuda-version=11.8

mdsatria, Aug 31 '23

@mdsatria it looks to me like you don't have the CUDA Toolkit installed on your system, which is a requirement for the cuML wheels.

divyegala, Aug 31 '23

I had a very similar issue, where the problem was mismatched versions of CUDA (as reported by the driver) and the CUDA Toolkit.

You can check the CUDA version supported by your driver with: nvidia-smi

You can check your CUDA Toolkit version with: nvcc --version

If you don't have CUDA toolkit installed, I find that the easiest way to install it is with Anaconda: conda install -c nvidia cuda-nvcc

I hope it helps! :)

MariyaSha, Oct 31 '23
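The two checks above can be scripted. A sketch that extracts the "CUDA Version" field from nvidia-smi (the newest CUDA version the driver supports, not necessarily what is installed) and the "release X.Y" field from nvcc --version, then flags a toolkit whose major version is newer than the driver supports. The regexes are assumptions based on the output formats quoted in this thread, the helper names are hypothetical, and nvcc may be installed but not on PATH:

```python
import re
import shutil
import subprocess


def driver_cuda_version(nvidia_smi_text):
    """Parse 'CUDA Version: X.Y' from nvidia-smi output into (X, Y), or None."""
    m = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def toolkit_cuda_version(nvcc_text):
    """Parse 'release X.Y' from `nvcc --version` output into (X, Y), or None."""
    m = re.search(r"release\s+(\d+)\.(\d+)", nvcc_text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def run(cmd):
    """Return a command's stdout, or '' if the executable is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return ""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


if __name__ == "__main__":
    driver = driver_cuda_version(run(["nvidia-smi"]))
    toolkit = toolkit_cuda_version(run(["nvcc", "--version"]))
    print("driver supports CUDA up to:", driver)
    print("toolkit (nvcc) version:    ", toolkit)
    if driver and toolkit and toolkit[0] > driver[0]:
        print("Toolkit major version is newer than the driver supports.")
```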

The same thing is also happening on Google Colab with a V100:

Wed Nov  8 07:01:04 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   34C    P0    24W / 300W |      2MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

Installed as suggested in the docs:

pip install \
    --extra-index-url=https://pypi.nvidia.com \
    cudf-cu12 dask-cudf-cu12 cuml-cu12 cugraph-cu12 cuspatial-cu12 cuproj-cu12 cuxfilter-cu12 cucim

It fails with:

/content# python
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cudf
/usr/local/lib/python3.10/dist-packages/cupy/_environment.py:447: UserWarning: 
--------------------------------------------------------------------------------

  CuPy may not function correctly because multiple CuPy packages are installed
  in your environment:

    cupy-cuda11x, cupy-cuda12x

  Follow these steps to resolve this issue:

    1. For all packages listed above, run the following command to remove all
       existing CuPy installations:

         $ pip uninstall <package_name>

      If you previously installed CuPy via conda, also run the following:

         $ conda uninstall cupy

    2. Install the appropriate CuPy package.
       Refer to the Installation Guide for detailed instructions.

         https://docs.cupy.dev/en/stable/install.html

--------------------------------------------------------------------------------

  warnings.warn(f'''
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/cupy/__init__.py", line 17, in <module>
    from cupy import _core  # NOQA
  File "/usr/local/lib/python3.10/dist-packages/cupy/_core/__init__.py", line 3, in <module>
    from cupy._core import core  # NOQA
ImportError: libcudart.so.12: cannot open shared object file: No such file or directory

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/cudf/__init__.py", line 12, in <module>
    import cupy
  File "/usr/local/lib/python3.10/dist-packages/cupy/__init__.py", line 19, in <module>
    raise ImportError(f'''
ImportError: 
================================================================
Failed to import CuPy.

If you installed CuPy via wheels (cupy-cudaXXX or cupy-rocm-X-X), make sure that the package matches with the version of CUDA or ROCm installed.

On Linux, you may need to set LD_LIBRARY_PATH environment variable depending on how you installed CUDA/ROCm.
On Windows, try setting CUDA_PATH environment variable.

Check the Installation Guide for details:
  https://docs.cupy.dev/en/latest/install.html

Original error:
  ImportError: libcudart.so.12: cannot open shared object file: No such file or directory
================================================================

Borda, Nov 08 '23

Is this a tracked issue? @dantegd

Borda, Nov 08 '23

@Borda for installations via the pip package manager, you need the CUDA Toolkit installed at the system level. This is because pip-managed cupy dynamically links to the system-level libcudart.

Also, it seems like your environment has multiple cupy installations.

divyegala, Nov 08 '23
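Regarding the duplicate CuPy installations: a small sketch using importlib.metadata to list every installed distribution whose name starts with "cupy" (cupy, cupy-cuda11x, cupy-cuda12x, ...). The prefix match is an assumption and cupy_distributions is a hypothetical helper; the remedy printed at the end simply restates CuPy's own warning quoted above:

```python
from importlib import metadata


def cupy_distributions(names=None):
    """Return sorted installed distribution names that start with 'cupy'.

    Pass `names` (e.g. copied from `pip list`) to check an explicit list
    instead of scanning the current environment.
    """
    if names is None:
        names = [dist.metadata["Name"] for dist in metadata.distributions()]
    found = {n for n in names if n and n.lower().replace("_", "-").startswith("cupy")}
    return sorted(found)


if __name__ == "__main__":
    found = cupy_distributions()
    if len(found) > 1:
        print("Multiple CuPy packages installed:", ", ".join(found))
        print("Uninstall all of them, then install only the one matching your CUDA major version.")
    else:
        print("CuPy packages:", found or "none")
```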

> for installations via the pip package manager, you need cudatoolkit installed at the system level. This is because pip managed cupy dynamically links to system level libcudart.

Interesting, so you are saying I need to install: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#upgrading-from-cudatoolkit-package

> Also, it seems like your environment has multiple cupy installations.

Yes, but it came with your installation command; it was not there before.

Borda, Nov 08 '23

@Borda could you share the output of !nvcc --version?

The nvidia-smi output indicates that your CUDA Driver version supports CUDA 12.0, but your CUDA runtime may be 11.x. At least some of Colab's GPU runtimes are using CUDA Toolkit 11.8, in which case when you start from a fresh runtime you should install the cu11 packages.

The rapids.ai quick start has a Colab launcher that includes a script that should hopefully get you up and running!

beckernick, Nov 10 '23

> could you share the output of !nvcc --version?

/content# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0

Borda, Nov 10 '23

@Borda Google Colab uses CUDA 11, but your installation command above uses CUDA 12. That is what is causing the failure to find the linked libcudart.so. If using pip packages, you must match the CUDA major versions by replacing cu12 with cu11 in the package names like this:

pip install \
    --extra-index-url=https://pypi.nvidia.com/ \
    cudf-cu11 dask-cudf-cu11 cuml-cu11 cugraph-cu11 cuspatial-cu11 cuproj-cu11 cuxfilter-cu11 cucim

edit: Sorry, I scrolled too fast and missed that @beckernick already gave this answer above. Apologies for the noise.

bdice, Nov 11 '23
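Picking the suffix can be scripted from the nvcc --version output quoted above. A minimal sketch, assuming the "release X.Y" line format shown in that output; cuda_suffix and pip_command are hypothetical helpers, and the package list is only the subset installed earlier in this thread:

```python
import re

# Subset of RAPIDS packages used in this thread; extend as needed.
RAPIDS_PACKAGES = ("cudf", "dask-cudf", "cuml", "cugraph")


def cuda_suffix(nvcc_text):
    """Map `nvcc --version` output to a pip suffix, e.g. 'release 11.8' -> 'cu11'."""
    m = re.search(r"release\s+(\d+)\.\d+", nvcc_text)
    if m is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return f"cu{m.group(1)}"


def pip_command(suffix, packages=RAPIDS_PACKAGES):
    """Build a pip install command whose package suffix matches the CUDA major version."""
    names = " ".join(f"{p}-{suffix}" for p in packages)
    return f"pip install --extra-index-url=https://pypi.nvidia.com {names}"


if __name__ == "__main__":
    sample = "Cuda compilation tools, release 11.8, V11.8.89"
    print(pip_command(cuda_suffix(sample)))
```

Parsing nvcc rather than nvidia-smi follows beckernick's point above: nvidia-smi reports what the driver supports (12.0 on Colab), while the installed runtime was 11.8.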

I had a similar problem on Ubuntu, but it came down to the naming of the .so file. I made a copy of the .so and renamed it to match the name the library is looking for, and voilà, everything works.

  1. Find the .so file's location:
find / -name libcudart.so.12
  2. cd into the folder containing libcudart.so.12 and make a copy without the .12 suffix:
cd .../anaconda3/envs/envname/lib/python3.11/site-packages/nvidia/cuda_runtime/lib/
cp libcudart.so.12 libcudart.so
  3. You might also have to add the folders to your paths. I had to do it for every single library:
export PATH=.../anaconda3/envs/envname/lib/python3.11/site-packages/nvidia/cublas/lib/${PATH:+:${PATH}}
export LD_LIBRARY_PATH=.../anaconda3/envs/envname/lib/python3.11/site-packages/nvidia/cublas/lib/${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

...

jwnz, Apr 30 '24
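A sketch of the LD_LIBRARY_PATH part of this workaround for pip-installed runtimes: it looks for the <site-packages>/nvidia/cuda_runtime/lib directory that appears in the paths above and prints a matching export line. The directory layout is taken from this comment; that it comes from the nvidia-cuda-runtime-cu12 wheel is an assumption, and both helper names are hypothetical. It does not create the unversioned libcudart.so name; the cp step above handles that.

```python
import os
import sysconfig


def pip_cuda_runtime_dirs():
    """Return existing <site-packages>/nvidia/cuda_runtime/lib dirs for this interpreter."""
    paths = sysconfig.get_paths()
    roots = sorted({paths["purelib"], paths["platlib"]})
    dirs = [os.path.join(r, "nvidia", "cuda_runtime", "lib") for r in roots]
    return [d for d in dirs if os.path.isdir(d)]


def ld_library_path_export(dirs):
    """Build a shell line that prepends `dirs` to LD_LIBRARY_PATH.

    Run it in the shell *before* starting Python: on glibc-based Linux the
    dynamic loader reads LD_LIBRARY_PATH at process start-up, so changing
    os.environ later does not affect dlopen in the running process.
    """
    joined = ":".join(dirs)
    return f"export LD_LIBRARY_PATH={joined}${{LD_LIBRARY_PATH:+:${{LD_LIBRARY_PATH}}}}"


if __name__ == "__main__":
    found = pip_cuda_runtime_dirs()
    if found:
        print(ld_library_path_export(found))
    else:
        print("No nvidia/cuda_runtime/lib directory found in this environment")
```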