Linker problem with cuBLAS build in Linux

Open delorytheape opened this issue 6 years ago • 4 comments

I ran into problems building on a fresh install of Linux (Ubuntu 18.04) when trying to build with the cuBLAS library. Details are:

OS: Ubuntu 18.04
GCC: 7.4.0
cmake: 3.14.3
Boost: 1.65.1
CUDA: 10.1.105-1
Nvidia Driver: 418.39-1
GPU: GTX 1080 Ti
Gpufit: 12496a... Apr 12 16:11:54 2019

I used the following commands to build:

mkdir Gpufit-build
cd Gpufit-build
cmake -DCMAKE_BUILD_TYPE=RELEASE -DUSE_CUBLAS=TRUE ../Gpufit
make 

The build failed with the following statements...

../libGpufit.so: undefined reference to 'init_gemm_select'
../libGpufit.so: undefined reference to 'free_gemm_select'
../libGpufit.so: undefined reference to 'cublasLtGetVersion'
../libGpufit.so: undefined reference to 'cublasLtGetProperty'
../libGpufit.so: undefined reference to 'cublasLtCtxInit'
../libGpufit.so: undefined reference to 'cublasLtGetCudartVersion'
../libGpufit.so: undefined reference to 'cublasLtShutdownCtx'
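
Undefined references like these mean the linker was never handed a library that defines the symbols. A minimal sketch for checking which installed cuBLAS library, if any, defines a given symbol is shown below. The helper name `find_cublas_symbol` is hypothetical, the two directories are the ones mentioned in this thread, and `nm` from GNU binutils is assumed to be installed.

```shell
#!/bin/sh
# find_cublas_symbol SYMBOL DIR...   (hypothetical helper, not part of Gpufit)
# Prints every libcublas* library in the given directories that defines SYMBOL.
find_cublas_symbol() {
    sym=$1
    shift
    for dir in "$@"; do
        for lib in "$dir"/libcublas*; do
            [ -f "$lib" ] || continue              # glob matched nothing, or dangling link
            case $lib in
                *.a) flags="--defined-only" ;;     # static archive: regular symbol table
                *)   flags="-D --defined-only" ;;  # shared object: dynamic symbol table
            esac
            # $flags is deliberately unquoted so it splits into separate options
            if nm $flags "$lib" 2>/dev/null | grep -qw "$sym"; then
                echo "$lib"
            fi
        done
    done
}

# Directories taken from this thread; adjust for other installations.
find_cublas_symbol cublasLtGetVersion /usr/lib/x86_64-linux-gnu /usr/local/cuda/lib64
```

If the call prints nothing, the library is either not installed in those directories or `nm` could not read it.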

I was eventually able to get around this problem (and a subsequent one) as follows. Note: I do not claim that this is the best way to solve the problem. It is merely what worked for me, and it may help in addressing the actual problem.

  1. After installing the CUDA toolkit (from the .deb package), I had to create symbolic links under the /usr/local/cuda directory that point to the cuBLAS libraries (for some odd reason, with CUDA 10.1 NVIDIA has put these in a different location, /usr/lib/x86_64-linux-gnu). The commands I used were:
sudo ln -s /usr/lib/x86_64-linux-gnu/libcublasLt.so /usr/local/cuda/lib64/libcublasLt.so
sudo ln -s /usr/lib/x86_64-linux-gnu/libcublasLt.so.10 /usr/local/cuda/lib64/libcublasLt.so.10
sudo ln -s /usr/lib/x86_64-linux-gnu/libcublasLt.so.10.1.0.105 /usr/local/cuda/lib64/libcublasLt.so.10.1.0.105
sudo ln -s /usr/lib/x86_64-linux-gnu/libcublasLt_static.a /usr/local/cuda/lib64/libcublasLt_static.a
sudo ln -s /usr/lib/x86_64-linux-gnu/libcublas.so /usr/local/cuda/lib64/libcublas.so
sudo ln -s /usr/lib/x86_64-linux-gnu/libcublas.so.10 /usr/local/cuda/lib64/libcublas.so.10
sudo ln -s /usr/lib/x86_64-linux-gnu/libcublas.so.10.1.0.105 /usr/local/cuda/lib64/libcublas.so.10.1.0.105
sudo ln -s /usr/lib/x86_64-linux-gnu/libcublas_static.a /usr/local/cuda/lib64/libcublas_static.a
  2. To fix the build errors, it was necessary to tell the linker to include the static library libcublasLt_static.a. This was achieved by modifying the files
<Gpufit repository path>/Gpufit/CMakeLists.txt
<Gpufit repository path>/Gpufit/examples/CMakeLists.txt
<Gpufit repository path>/Gpufit/Gpufit/examples/CMakeLists.txt
In each file, every instance of the line
target_link_libraries(${target} ${modules})

was replaced with the following line:

target_link_libraries(${target} ${modules} /usr/local/cuda/lib64/libcublasLt_static.a)

The build directory was then erased

cd <Gpufit repository path>/Gpufit-build
rm -rf *

and the cmake command rerun

cmake -DCMAKE_BUILD_TYPE=RELEASE -DUSE_CUBLAS=TRUE ../Gpufit
make

The build then succeeded. As previously mentioned, I am not very familiar with the cmake machinery, so there may be a much better way to fix this problem than this workaround. Hope it helps anyway.
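
The symbolic links from step 1 could also be created in a loop instead of one command per file. The following is only a sketch: the function name `link_cublas_libs` is hypothetical, and the destination `/usr/local/cuda/lib64` is an assumption based on the path used in the modified `target_link_libraries` line. Note the argument order of `ln -s`: the existing file comes first, the name of the new link second.

```shell
#!/bin/sh
# link_cublas_libs SRC_DIR DEST_DIR   (hypothetical helper)
# For every libcublas* file in SRC_DIR, create DEST_DIR/<name> -> SRC_DIR/<name>.
# Entries that already exist in DEST_DIR are left untouched.
link_cublas_libs() {
    src=$1
    dest=$2
    for lib in "$src"/libcublas*; do
        [ -e "$lib" ] || continue              # glob matched nothing
        name=$(basename "$lib")
        if [ -e "$dest/$name" ] || [ -L "$dest/$name" ]; then
            echo "skip   $dest/$name (already exists)"
        else
            ln -s "$lib" "$dest/$name"         # ln -s TARGET LINK_NAME
            echo "linked $dest/$name -> $lib"
        fi
    done
}

# Assumed locations for a CUDA 10.1 .deb install; needs root privileges, e.g. in a root shell:
# link_cublas_libs /usr/lib/x86_64-linux-gnu /usr/local/cuda/lib64
```

Because existing entries are skipped, running the function a second time is harmless.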

delorytheape, May 15 '19 06:05

I had the same problem and solved it with the following changes to a single CMake file. The <repo path>/Gpufit/CMakeLists.txt file was edited as follows:

    else()
        set( CUDA_CUBLAS_LIBRARIES
            /usr/lib/x86_64-linux-gnu/libcublas_static.a
            /usr/lib/x86_64-linux-gnu/libcublasLt_static.a
            ${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcudart_static.a
            dl
            pthread
            rt
            #${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcublas_static.a
            ${CUDA_TOOLKIT_ROOT_DIR}/lib64/libculibos.a )
    endif()

I had two problems: the cuBLAS libraries weren't in CUDA_TOOLKIT_ROOT_DIR, and additional libraries were required. The above edit isn't the most robust, but it worked on my platform. While looking for a way to make it more robust, I found that the FindCUDA module has been deprecated. So someone should really rewrite the <repo path>/Gpufit/CMakeLists.txt file to make use of the native CUDA support in CMake >3.10.
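
Before hard-coding paths like the ones above, it may help to check where the static libraries actually live on a given machine. This is a small sketch with a hypothetical helper `find_lib_dir`. The two candidate directories are the ones that came up in this thread, with `/usr/local/cuda` assumed as the toolkit root.

```shell
#!/bin/sh
# find_lib_dir LIBNAME DIR...   (hypothetical helper)
# Prints the first directory that contains LIBNAME; returns 1 if none does.
find_lib_dir() {
    name=$1
    shift
    for dir in "$@"; do
        if [ -e "$dir/$name" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}

# Report the location of each static library used in the CMake edit.
for lib in libcublas_static.a libcublasLt_static.a libcudart_static.a libculibos.a; do
    if where=$(find_lib_dir "$lib" /usr/local/cuda/lib64 /usr/lib/x86_64-linux-gnu); then
        echo "$lib: $where"
    else
        echo "$lib: not found"
    fi
done
```

Directories are checked in the order given, so the toolkit directory wins when a library exists in both places.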

ironictoo, Jul 09 '19 19:07

After more research, it turns out that the native CMake CUDA support doesn't really have as many features as the deprecated FindCUDA module, which is probably why it isn't being used. I updated the above to the slightly more robust:

            find_cuda_helper_libs(cublas_static)
            find_cuda_helper_libs(cublasLt_static)
            find_cuda_helper_libs(culibos)

            set( CUDA_CUBLAS_LIBRARIES 
                ${CUDA_cublas_static_LIBRARY}
                ${CUDA_cublasLt_static_LIBRARY}
                ${CUDA_cudart_static_LIBRARY}
                ${CUDA_culibos_LIBRARY}
                dl
                pthread
                rt )

ironictoo, Jul 11 '19 19:07

I checked it and it may depend on the CUDA version. It may require some additional CMake code and testing. I'll come back to it later.

jkfindeisen, Jun 09 '21 08:06

In ed273cac245d0cfb64efe5f475942790adcda969 I improved the situation a bit, but I'll leave this issue open until PR #94 is decided.

jkfindeisen, Aug 31 '21 14:08