thundergbm

I just installed from pip

Open renjiechao88 opened this issue 5 years ago • 11 comments

from thundergbm import TGBMClassifier
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dell/anaconda3/lib/python3.7/site-packages/thundergbm/__init__.py", line 10, in <module>
    from .thundergbm import *
  File "/home/dell/anaconda3/lib/python3.7/site-packages/thundergbm/thundergbm.py", line 32, in <module>
    thundergbm = CDLL(lib_path)
  File "/home/dell/anaconda3/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcusparse.so.10.0: cannot open shared object file: No such file or directory

renjiechao88 avatar Dec 17 '20 07:12 renjiechao88

I am getting the same error. Here is the cuda version on the machine.

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Nov_30_19:08:53_PST_2020
Cuda compilation tools, release 11.2, V11.2.67
Build cuda_11.2.r11.2/compiler.29373293_0

rivershah avatar Dec 20 '20 17:12 rivershah

Hi, can you show me the commands you used to install ThunderGBM? Note that you should use the .whl file here if you want to use ThunderGBM on a Windows system.

Kurt-Liuhf avatar Dec 21 '20 03:12 Kurt-Liuhf

I installed it on Ubuntu 16.04 using pip install thundergbm, but when I run from thundergbm import * it fails.

renjiechao88 avatar Dec 21 '20 08:12 renjiechao88

Hi @renjiechao88, thanks for your feedback. I tried the same command to install ThunderGBM on CentOS and it passed the test. Having read your error report, I recommend the following. First, cd to /usr/local/cuda (or your own CUDA installation path) and run:

$ find . -name "libcus*"

Check whether you can find libcusparse.so.10.0 or another version of libcusparse.so.* that matches your CUDA version. If you can find it, make sure your environment variables are configured correctly (check with echo $LD_LIBRARY_PATH). If you cannot find it, please try installing the corresponding CUDA Toolkit and configuring the environment variables. Hope it helps.
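The same check can be scripted. The following is a minimal sketch, not part of ThunderGBM: the helper name find_cuda_libs and its parameters are made up for illustration, and it assumes the relevant directories are the ones listed in LD_LIBRARY_PATH.

```python
import glob
import os


def find_cuda_libs(pattern="libcusparse.so*", search_path=None):
    """Return files matching `pattern` in each directory of a search path.

    `search_path` defaults to the LD_LIBRARY_PATH environment variable.
    This hypothetical helper only lists files; it does not load them.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    matches = []
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        matches.extend(sorted(glob.glob(os.path.join(directory, pattern))))
    return matches


if __name__ == "__main__":
    # Print every libcusparse variant visible through LD_LIBRARY_PATH.
    for path in find_cuda_libs():
        print(path)
```

If the printed list contains no libcusparse.so.10.0, the import is likely to fail with the OSError reported above, regardless of which other versions are present.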

Kurt-Liuhf avatar Dec 21 '20 08:12 Kurt-Liuhf

Using the find -name libcus* command, my result is:

./doc/man/man7/libcusolver.so.7
./doc/man/man7/libcusparse.7
./doc/man/man7/libcusparse.so.7
./doc/man/man7/libcusolver.7
./lib64/libcusparse.so.9.0
./lib64/libcusolver.so.9.0.176
./lib64/stubs/libcusparse.so
./lib64/stubs/libcusolver.so
./lib64/libcusolver.so.9.0
./lib64/libcusparse_static.a
./lib64/libcusparse.so
./lib64/libcusolver_static.a
./lib64/libcusparse.so.9.0.176
./lib64/libcusolver.so

My CUDA version is also 9.0 and I can use ThunderSVM normally, but when I use ThunderGBM I get this error:

OSError: libcusparse.so.10.0: cannot open shared object file: No such file or directory

renjiechao88 avatar Dec 21 '20 09:12 renjiechao88

The result of echo $LD_LIBRARY_PATH is /usr/local/cuda-9.0/lib64

Thank you for your response, @Kurt-Liuhf.

renjiechao88 avatar Dec 21 '20 09:12 renjiechao88

Hi @renjiechao88, can you try installing ThunderGBM by using this wheel file? I have rebuilt it by using CUDA 9.0. Thank you.

Kurt-Liuhf avatar Dec 21 '20 09:12 Kurt-Liuhf

Here is the sequence of commands I used. Please let me know if further information is needed. Is it a CUDA 11 issue? Thanks.

$ /apps/python/3.7.9/bin/python3.7 -m pip install --upgrade thundergbm

$ echo $LD_LIBRARY_PATH
/apps/python/3.7.9/lib/:/usr/local/cuda/lib64

/usr/local/cuda/lib64 $ ls
libaccinj64.so            libcufft_static_nocallback.a  libcusolver.so            libnppicc.so.11.2.1.68   libnppist.so.11.2.1.68  libnvjpeg_static.a
libaccinj64.so.11.2       libcufftw.so                  libcusolver.so.11         libnppicc_static.a       libnppist_static.a      libnvperf_host.so
libaccinj64.so.11.2.67    libcufftw.so.10               libcusolver.so.11.0.2.68  libnppidei.so            libnppisu.so            libnvperf_host_static.a
libcublasLt.so            libcufftw.so.10.4.0.72        libcusolver_static.a      libnppidei.so.11         libnppisu.so.11         libnvperf_target.so
libcublasLt.so.11         libcufftw_static.a            libcusparse.so            libnppidei.so.11.2.1.68  libnppisu.so.11.2.1.68  libnvptxcompiler_static.a
libcublasLt.so.11.3.1.68  libcuinj64.so                 libcusparse.so.11         libnppidei_static.a      libnppisu_static.a      libnvrtc-builtins.so
libcublasLt_static.a      libcuinj64.so.11.2            libcusparse.so.11.3.1.68  libnppif.so              libnppitc.so            libnvrtc-builtins.so.11.2
libcublas.so              libcuinj64.so.11.2.67         libcusparse_static.a      libnppif.so.11           libnppitc.so.11         libnvrtc-builtins.so.11.2.67
libcublas.so.11           libculibos.a                  liblapack_static.a        libnppif.so.11.2.1.68    libnppitc.so.11.2.1.68  libnvrtc.so
libcublas.so.11.3.1.68    libcupti.so                   libmetis_static.a         libnppif_static.a        libnppitc_static.a      libnvrtc.so.11.2
libcublas_static.a        libcupti.so.11.2              libnppc.so                libnppig.so              libnpps.so              libnvrtc.so.11.2.67
libcudadevrt.a            libcupti.so.2020.3.0          libnppc.so.11             libnppig.so.11           libnpps.so.11           libnvToolsExt.so
libcudart.so              libcupti_static.a             libnppc.so.11.2.1.68      libnppig.so.11.2.1.68    libnpps.so.11.2.1.68    libnvToolsExt.so.1
libcudart.so.11.0         libcurand.so                  libnppc_static.a          libnppig_static.a        libnpps_static.a        libnvToolsExt.so.1.0.0
libcudart.so.11.2.72      libcurand.so.10               libnppial.so              libnppim.so              libnvblas.so            libOpenCL.so
libcudart_static.a        libcurand.so.10.2.3.68        libnppial.so.11           libnppim.so.11           libnvblas.so.11         libOpenCL.so.1
libcufft.so               libcurand_static.a            libnppial.so.11.2.1.68    libnppim.so.11.2.1.68    libnvblas.so.11.3.1.68  libOpenCL.so.1.0
libcufft.so.10            libcusolverMg.so              libnppial_static.a        libnppim_static.a        libnvjpeg.so            libOpenCL.so.1.0.0
libcufft.so.10.4.0.72     libcusolverMg.so.11           libnppicc.so              libnppist.so             libnvjpeg.so.11         nvrtc-prev
libcufft_static.a         libcusolverMg.so.11.0.2.68    libnppicc.so.11           libnppist.so.11          libnvjpeg.so.11.3.1.68  stubs

Python 3.7.9 (default, Dec 15 2020, 09:47:30)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import thundergbm
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xxx/.local/lib/python3.7/site-packages/thundergbm/__init__.py", line 10, in <module>
    from .thundergbm import *
  File "/home/xxx/.local/lib/python3.7/site-packages/thundergbm/thundergbm.py", line 32, in <module>
    thundergbm = CDLL(lib_path)
  File "/apps/python/3.7.9/lib/python3.7/ctypes/__init__.py", line 364, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcusparse.so.10.0: cannot open shared object file: No such file or directory

rivershah avatar Dec 22 '20 09:12 rivershah

Hi @RiverShah, thanks for the information. I think the issue is caused by CUDA 11, which does not provide libcusparse.so.10.0. You can try building a suitable tgbm .whl for your machine from scratch by following the commands listed in How to build the Python wheel file for Linux. Then you can use pip to install the wheel file you built. Enjoy~
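To confirm which versioned cuSPARSE library the dynamic loader can actually open on a given machine, something like the sketch below may help. The helper name can_load is made up for illustration; it relies only on ctypes.CDLL, the same call that fails in the traceback above.

```python
import ctypes


def can_load(soname):
    """Return (True, None) if the dynamic loader can open `soname`,
    otherwise (False, error message). Hypothetical helper for illustration.
    """
    try:
        ctypes.CDLL(soname)
    except OSError as exc:
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    # libcusparse.so.10.0 is the name from the error message;
    # libcusparse.so.11 is the name seen in the CUDA 11.2 lib64 listing.
    for soname in ("libcusparse.so.10.0", "libcusparse.so.11"):
        ok, err = can_load(soname)
        print(soname, "OK" if ok else "missing: " + err)
```

If only libcusparse.so.11 loads, the prebuilt wheel was presumably linked against a different CUDA major version than the one installed, which matches the rebuild suggestion above.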

Kurt-Liuhf avatar Dec 22 '20 09:12 Kurt-Liuhf

@Kurt-Liuhf Thanks for looking at this. Unfortunately I work in a fairly closed down cluster environment and it'll be difficult to get these dependencies onto each machine without a simple pip install process. Any chance that the pip install thundergbm command can be compatible with cuda 11 out of the box please?

rivershah avatar Dec 23 '20 10:12 rivershah

Got this error trying to rebuild:

$ mkdir build && cd build && cmake .. && make -j
....
CMakeFiles/Makefile2:126: recipe for target 'src/thundergbm/CMakeFiles/thundergbm.dir/all' failed
make[1]: *** [src/thundergbm/CMakeFiles/thundergbm.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

I couldn't trace both errors back in the log, but here is one:

~/thundergbm/src/thundergbm/sparse_columns.cu(52): error: identifier "cusparseScsr2csc" is undefined

1 error detected in the compilation of "~/thundergbm/src/thundergbm/sparse_columns.cu".
CMake Error at thundergbm_generated_sparse_columns.cu.o.cmake:266 (message):
  Error generating file
  ~/thundergbm/build/src/thundergbm/CMakeFiles/thundergbm.dir//./thundergbm_generated_sparse_columns.cu.o

src/thundergbm/CMakeFiles/thundergbm.dir/build.make:154: recipe for target 'src/thundergbm/CMakeFiles/thundergbm.dir/thundergbm_generated_sparse_columns.cu.o' failed

CUDA version:

$ whereis cuda
cuda: /usr/local/cuda

$ ls -l /usr/local/ | grep cuda
lrwxrwxrwx  1 root root   22 nov 11 16:38 cuda -> /etc/alternatives/cuda
lrwxrwxrwx  1 root root   25 nov 11 16:38 cuda-11 -> /etc/alternatives/cuda-11
drwxr-xr-x 16 root root 4096 nov 11 16:38 cuda-11.5

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Sep_13_19:13:29_PDT_2021
Cuda compilation tools, release 11.5, V11.5.50
Build cuda_11.5.r11.5/compiler.30411180_0
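
The undefined identifier looks consistent with this toolkit version: as far as I know, the legacy cusparseScsr2csc routine is no longer part of cuSPARSE in CUDA 11, where cusparseCsr2cscEx2 takes its place, so sources written against the old name would need porting. Below is a rough sketch for detecting the toolkit release from the nvcc --version text before attempting a build. The function name cuda_release and the "major >= 11" cutoff are my own assumptions, not something taken from the ThunderGBM build scripts.

```python
import re


def cuda_release(nvcc_output):
    """Extract (major, minor) from the text printed by `nvcc --version`.

    Hypothetical helper: it looks for the 'release X.Y' field only.
    """
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("no 'release X.Y' field found in nvcc output")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # Line copied from the nvcc output in this thread.
    sample = "Cuda compilation tools, release 11.5, V11.5.50"
    major, minor = cuda_release(sample)
    if major >= 11:
        # Assumption: the legacy csr2csc API is unavailable from CUDA 11 on.
        print(f"CUDA {major}.{minor}: cusparseScsr2csc is assumed unavailable")
```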

arilwan avatar Nov 12 '21 11:11 arilwan