
Generating problem

Open Airplane5 opened this issue 2 years ago • 3 comments

While compiling MegBA, a problem occurred.

:~/MegBA/build$ make
[  7%] Built target cuManager
[  9%] Building NVCC (Device) object src/operator/CMakeFiles/math_function_Jet_Vector_CUDA.dir/math_function_Jet_Vector_CUDA_generated_math_function_Jet_Vector_CUDA.cu.o
CMake Error at math_function_Jet_Vector_CUDA_generated_math_function_Jet_Vector_CUDA.cu.o.cmake:219 (message):
  Error generating
  /home/wlh/MegBA/build/src/operator/CMakeFiles/math_function_Jet_Vector_CUDA.dir//./math_function_Jet_Vector_CUDA_generated_math_function_Jet_Vector_CUDA.cu.o


src/operator/CMakeFiles/math_function_Jet_Vector_CUDA.dir/build.make:598: recipe for target 'src/operator/CMakeFiles/math_function_Jet_Vector_CUDA.dir/math_function_Jet_Vector_CUDA_generated_math_function_Jet_Vector_CUDA.cu.o' failed
make[2]: *** [src/operator/CMakeFiles/math_function_Jet_Vector_CUDA.dir/math_function_Jet_Vector_CUDA_generated_math_function_Jet_Vector_CUDA.cu.o] Error 1
CMakeFiles/Makefile2:183: recipe for target 'src/operator/CMakeFiles/math_function_Jet_Vector_CUDA.dir/all' failed
make[1]: *** [src/operator/CMakeFiles/math_function_Jet_Vector_CUDA.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

I am using CUDA 11.6 with nvidia-driver-510.

Airplane5 avatar Jun 06 '22 08:06 Airplane5

BTW, I have installed libnccl2=2.12.10-1+cuda11.6 and libnccl-dev=2.12.10-1+cuda11.6.

Airplane5 avatar Jun 06 '22 09:06 Airplane5

After uninstalling and reinstalling CUDA and NCCL, the generation error above disappeared, but another error appeared, along with repeated warnings. There are mainly two warnings:

MegBA/thirdparty/Eigen/src/Core/util/DisableStupidWarnings.h(82): warning #20236-D: pragma "diag_suppress" is deprecated, use "nv_diag_suppress" instead

and

MegBA/thirdparty/Eigen/src/Core/util/Memory.h(291): warning #20014-D: calling a __host__ function from a __host__ __device__ function is not allowed

The new error was:

/usr/bin/ld: BAL_Float: hidden symbol `__cudaUnregisterFatBinary' in /usr/local/cuda-11.6/lib64/libcudart_static.a(cudart_static.o) is referenced by DSO
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
examples/CMakeFiles/BAL_Float.dir/build.make:138: recipe for target 'examples/BAL_Float' failed
make[2]: *** [examples/BAL_Float] Error 1
CMakeFiles/Makefile2:720: recipe for target 'examples/CMakeFiles/BAL_Float.dir/all' failed
make[1]: *** [examples/CMakeFiles/BAL_Float.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

I tried adding "set(CMAKE_CXX_FLAGS_RELEASE "{CMAKE_CXXFLAGS_RELEASE} -fPIC")" to CMakeLists.txt, as some blogs suggest, but nothing changed.
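
For reference, the quoted line has no `$` before the braces and spells the variable `CMAKE_CXXFLAGS_RELEASE`, so as written it would not append to the existing release flags. Below is a minimal sketch of the usual forms. It assumes the project uses the legacy FindCUDA module (the "Building NVCC (Device) object" lines suggest this), and whether any of these settings resolves this particular link error is untested here.

```cmake
# Append -fPIC to the Release C++ flags; note the ${...} reference
# and the variable name CMAKE_CXX_FLAGS_RELEASE.
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -fPIC")

# Target-level alternative: request position-independent code for
# targets created after this line.
set(CMAKE_POSITION_INDEPENDENT_CODE ON)

# Assumption: with FindCUDA, turning this off links the shared CUDA runtime
# instead of libcudart_static.a. Set it before find_package(CUDA).
set(CUDA_USE_STATIC_CUDA_RUNTIME OFF)
```

These flags only take effect for the build type in use, so CMAKE_CXX_FLAGS_RELEASE has no effect unless the build is configured with CMAKE_BUILD_TYPE=Release.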

Airplane5 avatar Jun 06 '22 10:06 Airplane5

Hi,

I guess this is because we only generate device code for SM60 (a.k.a. the Pascal architecture). I have fixed this in https://github.com/MegviiRobot/MegBA/pull/27, but I am not sure it will work for you. Please try the latest version. If you have any problems, feel free to discuss them.
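
As an illustration of generating device code for more than one architecture, here is a minimal sketch. It assumes a legacy FindCUDA build where nvcc flags are collected in CUDA_NVCC_FLAGS; the architecture list is only an example, and it does not necessarily reflect what the pull request above changed.

```cmake
# Assumption: legacy FindCUDA build; nvcc options live in CUDA_NVCC_FLAGS.
# Emit device code for several architectures instead of only sm_60.
# The list is illustrative; include the compute capability of the GPU in use.
foreach(arch 60 70 75 80 86)
  list(APPEND CUDA_NVCC_FLAGS "-gencode=arch=compute_${arch},code=sm_${arch}")
endforeach()

# With CMake >= 3.18 and first-class CUDA language support, the counterpart is:
# set(CMAKE_CUDA_ARCHITECTURES 60 70 75 80 86)
```

Each extra architecture increases compile time and binary size, so listing only the architectures that are actually needed is usually preferable.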

Best regards, Jie

JieRen98 avatar Jun 09 '22 06:06 JieRen98