FasterTransformer
Undefined reference to `MPI::Comm::Comm()'
Description
Tesla K80, CUDA 11.3, cuDNN 8.2.
[ 87%] Built target vit_int8_example
[ 87%] Built target GptJ
[ 88%] Built target BertINT8
[ 89%] Built target swin_example
[ 90%] Built target decoding_example
[ 91%] Built target T5Decoding
[ 92%] Built target SwinINT8
[ 93%] Built target GptJTritonBackend
[ 93%] Building CXX object examples/cpp/multi_gpu_gpt/CMakeFiles/multi_gpu_gpt_async_example.dir/multi_gpu_gpt_async_example.cc.o
[ 94%] Built target bert_int8_example
[ 94%] Building CXX object examples/cpp/multi_gpu_gpt/CMakeFiles/multi_gpu_gpt_example.dir/multi_gpu_gpt_example.cc.o
[ 94%] Linking CXX executable ../../../bin/gptj_example
[ 95%] Built target gpt_example
[ 96%] Built target ParallelGptTritonBackend
[ 96%] Building CXX object examples/cpp/gptj/CMakeFiles/gptj_triton_example.dir/gptj_triton_example.cc.o
[ 96%] Built target T5TritonBackend
[ 97%] Built target swin_int8_example
[ 97%] Building CXX object examples/cpp/multi_gpu_gpt/CMakeFiles/multi_gpu_gpt_triton_example.dir/multi_gpu_gpt_triton_example.cc.o
[ 97%] Built target transformer-shared
[ 98%] Built target transformer-static
CMakeFiles/gptj_example.dir/gptj_example.cc.o: In function `MPI::Op::Init(void (*)(void const*, void*, int, MPI::Datatype const&), bool)':
gptj_example.cc:(.text._ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb[_ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb]+0x15): undefined reference to `ompi_mpi_cxx_op_intercept'
CMakeFiles/gptj_example.dir/gptj_example.cc.o: In function `MPI::Intracomm::Clone() const':
gptj_example.cc:(.text._ZNK3MPI9Intracomm5CloneEv[_ZNK3MPI9Intracomm5CloneEv]+0x3c): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/gptj_example.dir/gptj_example.cc.o: In function `MPI::Graphcomm::Clone() const':
gptj_example.cc:(.text._ZNK3MPI9Graphcomm5CloneEv[_ZNK3MPI9Graphcomm5CloneEv]+0x35): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/gptj_example.dir/gptj_example.cc.o: In function `MPI::Cartcomm::Sub(bool const*) const':
gptj_example.cc:(.text._ZNK3MPI8Cartcomm3SubEPKb[_ZNK3MPI8Cartcomm3SubEPKb]+0x9e): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/gptj_example.dir/gptj_example.cc.o: In function `MPI::Intracomm::Create_graph(int, int const*, int const*, bool) const':
gptj_example.cc:(.text._ZNK3MPI9Intracomm12Create_graphEiPKiS2_b[_ZNK3MPI9Intracomm12Create_graphEiPKiS2_b]+0x39): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/gptj_example.dir/gptj_example.cc.o: In function `MPI::Cartcomm::Clone() const':
gptj_example.cc:(.text._ZNK3MPI8Cartcomm5CloneEv[_ZNK3MPI8Cartcomm5CloneEv]+0x35): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/gptj_example.dir/gptj_example.cc.o:gptj_example.cc:(.text._ZNK3MPI9Intracomm11Create_cartEiPKiPKbb[_ZNK3MPI9Intracomm11Create_cartEiPKiPKbb]+0xa8): more undefined references to `MPI::Comm::Comm()' follow
CMakeFiles/gptj_example.dir/gptj_example.cc.o:(.data.rel.ro._ZTVN3MPI8DatatypeE[_ZTVN3MPI8DatatypeE]+0x78): undefined reference to `MPI::Datatype::Free()'
CMakeFiles/gptj_example.dir/gptj_example.cc.o:(.data.rel.ro._ZTVN3MPI3WinE[_ZTVN3MPI3WinE]+0x48): undefined reference to `MPI::Win::Free()'
collect2: error: ld returned 1 exit status
examples/cpp/gptj/CMakeFiles/gptj_example.dir/build.make:140: recipe for target 'bin/gptj_example' failed
make[2]: *** [bin/gptj_example] Error 1
CMakeFiles/Makefile2:5678: recipe for target 'examples/cpp/gptj/CMakeFiles/gptj_example.dir/all' failed
make[1]: *** [examples/cpp/gptj/CMakeFiles/gptj_example.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
[ 99%] Linking CXX executable ../../../bin/gptj_triton_example
[ 99%] Linking CXX executable ../../../bin/multi_gpu_gpt_triton_example
CMakeFiles/gptj_triton_example.dir/gptj_triton_example.cc.o: In function `MPI::Op::Init(void (*)(void const*, void*, int, MPI::Datatype const&), bool)':
gptj_triton_example.cc:(.text._ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb[_ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb]+0x15): undefined reference to `ompi_mpi_cxx_op_intercept'
CMakeFiles/gptj_triton_example.dir/gptj_triton_example.cc.o: In function `MPI::Intracomm::Clone() const':
gptj_triton_example.cc:(.text._ZNK3MPI9Intracomm5CloneEv[_ZNK3MPI9Intracomm5CloneEv]+0x3c): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/gptj_triton_example.dir/gptj_triton_example.cc.o: In function `MPI::Graphcomm::Clone() const':
gptj_triton_example.cc:(.text._ZNK3MPI9Graphcomm5CloneEv[_ZNK3MPI9Graphcomm5CloneEv]+0x35): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/gptj_triton_example.dir/gptj_triton_example.cc.o: In function `MPI::Cartcomm::Sub(bool const*) const':
gptj_triton_example.cc:(.text._ZNK3MPI8Cartcomm3SubEPKb[_ZNK3MPI8Cartcomm3SubEPKb]+0x9e): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/gptj_triton_example.dir/gptj_triton_example.cc.o: In function `MPI::Intracomm::Create_graph(int, int const*, int const*, bool) const':
gptj_triton_example.cc:(.text._ZNK3MPI9Intracomm12Create_graphEiPKiS2_b[_ZNK3MPI9Intracomm12Create_graphEiPKiS2_b]+0x39): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/gptj_triton_example.dir/gptj_triton_example.cc.o: In function `MPI::Cartcomm::Clone() const':
gptj_triton_example.cc:(.text._ZNK3MPI8Cartcomm5CloneEv[_ZNK3MPI8Cartcomm5CloneEv]+0x35): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/gptj_triton_example.dir/gptj_triton_example.cc.o:gptj_triton_example.cc:(.text._ZNK3MPI9Intracomm11Create_cartEiPKiPKbb[_ZNK3MPI9Intracomm11Create_cartEiPKiPKbb]+0xa8): more undefined references to `MPI::Comm::Comm()' follow
CMakeFiles/gptj_triton_example.dir/gptj_triton_example.cc.o:(.data.rel.ro._ZTVN3MPI8DatatypeE[_ZTVN3MPI8DatatypeE]+0x78): undefined reference to `MPI::Datatype::Free()'
CMakeFiles/gptj_triton_example.dir/gptj_triton_example.cc.o:(.data.rel.ro._ZTVN3MPI3WinE[_ZTVN3MPI3WinE]+0x48): undefined reference to `MPI::Win::Free()'
collect2: error: ld returned 1 exit status
examples/cpp/gptj/CMakeFiles/gptj_triton_example.dir/build.make:102: recipe for target 'bin/gptj_triton_example' failed
make[2]: *** [bin/gptj_triton_example] Error 1
CMakeFiles/Makefile2:5709: recipe for target 'examples/cpp/gptj/CMakeFiles/gptj_triton_example.dir/all' failed
make[1]: *** [examples/cpp/gptj/CMakeFiles/gptj_triton_example.dir/all] Error 2
CMakeFiles/multi_gpu_gpt_triton_example.dir/multi_gpu_gpt_triton_example.cc.o: In function `MPI::Op::Init(void (*)(void const*, void*, int, MPI::Datatype const&), bool)':
multi_gpu_gpt_triton_example.cc:(.text._ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb[_ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb]+0x15): undefined reference to `ompi_mpi_cxx_op_intercept'
CMakeFiles/multi_gpu_gpt_triton_example.dir/multi_gpu_gpt_triton_example.cc.o: In function `MPI::Intracomm::Clone() const':
multi_gpu_gpt_triton_example.cc:(.text._ZNK3MPI9Intracomm5CloneEv[_ZNK3MPI9Intracomm5CloneEv]+0x3c): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/multi_gpu_gpt_triton_example.dir/multi_gpu_gpt_triton_example.cc.o: In function `MPI::Graphcomm::Clone() const':
multi_gpu_gpt_triton_example.cc:(.text._ZNK3MPI9Graphcomm5CloneEv[_ZNK3MPI9Graphcomm5CloneEv]+0x35): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/multi_gpu_gpt_triton_example.dir/multi_gpu_gpt_triton_example.cc.o: In function `MPI::Cartcomm::Sub(bool const*) const':
multi_gpu_gpt_triton_example.cc:(.text._ZNK3MPI8Cartcomm3SubEPKb[_ZNK3MPI8Cartcomm3SubEPKb]+0x9e): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/multi_gpu_gpt_triton_example.dir/multi_gpu_gpt_triton_example.cc.o: In function `MPI::Intracomm::Create_graph(int, int const*, int const*, bool) const':
multi_gpu_gpt_triton_example.cc:(.text._ZNK3MPI9Intracomm12Create_graphEiPKiS2_b[_ZNK3MPI9Intracomm12Create_graphEiPKiS2_b]+0x39): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/multi_gpu_gpt_triton_example.dir/multi_gpu_gpt_triton_example.cc.o: In function `MPI::Cartcomm::Clone() const':
multi_gpu_gpt_triton_example.cc:(.text._ZNK3MPI8Cartcomm5CloneEv[_ZNK3MPI8Cartcomm5CloneEv]+0x35): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/multi_gpu_gpt_triton_example.dir/multi_gpu_gpt_triton_example.cc.o:multi_gpu_gpt_triton_example.cc:(.text._ZNK3MPI9Intracomm11Create_cartEiPKiPKbb[_ZNK3MPI9Intracomm11Create_cartEiPKiPKbb]+0xa8): more undefined references to `MPI::Comm::Comm()' follow
CMakeFiles/multi_gpu_gpt_triton_example.dir/multi_gpu_gpt_triton_example.cc.o:(.data.rel.ro._ZTVN3MPI8DatatypeE[_ZTVN3MPI8DatatypeE]+0x78): undefined reference to `MPI::Datatype::Free()'
CMakeFiles/multi_gpu_gpt_triton_example.dir/multi_gpu_gpt_triton_example.cc.o:(.data.rel.ro._ZTVN3MPI3WinE[_ZTVN3MPI3WinE]+0x48): undefined reference to `MPI::Win::Free()'
collect2: error: ld returned 1 exit status
examples/cpp/multi_gpu_gpt/CMakeFiles/multi_gpu_gpt_triton_example.dir/build.make:102: recipe for target 'bin/multi_gpu_gpt_triton_example' failed
make[2]: *** [bin/multi_gpu_gpt_triton_example] Error 1
CMakeFiles/Makefile2:5907: recipe for target 'examples/cpp/multi_gpu_gpt/CMakeFiles/multi_gpu_gpt_triton_example.dir/all' failed
make[1]: *** [examples/cpp/multi_gpu_gpt/CMakeFiles/multi_gpu_gpt_triton_example.dir/all] Error 2
[ 99%] Linking CXX executable ../../../bin/multi_gpu_gpt_example
CMakeFiles/multi_gpu_gpt_example.dir/multi_gpu_gpt_example.cc.o: In function `MPI::Op::Init(void (*)(void const*, void*, int, MPI::Datatype const&), bool)':
multi_gpu_gpt_example.cc:(.text._ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb[_ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb]+0x15): undefined reference to `ompi_mpi_cxx_op_intercept'
CMakeFiles/multi_gpu_gpt_example.dir/multi_gpu_gpt_example.cc.o: In function `MPI::Intracomm::Clone() const':
multi_gpu_gpt_example.cc:(.text._ZNK3MPI9Intracomm5CloneEv[_ZNK3MPI9Intracomm5CloneEv]+0x3c): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/multi_gpu_gpt_example.dir/multi_gpu_gpt_example.cc.o: In function `MPI::Graphcomm::Clone() const':
multi_gpu_gpt_example.cc:(.text._ZNK3MPI9Graphcomm5CloneEv[_ZNK3MPI9Graphcomm5CloneEv]+0x35): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/multi_gpu_gpt_example.dir/multi_gpu_gpt_example.cc.o: In function `MPI::Cartcomm::Sub(bool const*) const':
multi_gpu_gpt_example.cc:(.text._ZNK3MPI8Cartcomm3SubEPKb[_ZNK3MPI8Cartcomm3SubEPKb]+0x9e): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/multi_gpu_gpt_example.dir/multi_gpu_gpt_example.cc.o: In function `MPI::Intracomm::Create_graph(int, int const*, int const*, bool) const':
multi_gpu_gpt_example.cc:(.text._ZNK3MPI9Intracomm12Create_graphEiPKiS2_b[_ZNK3MPI9Intracomm12Create_graphEiPKiS2_b]+0x39): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/multi_gpu_gpt_example.dir/multi_gpu_gpt_example.cc.o: In function `MPI::Cartcomm::Clone() const':
multi_gpu_gpt_example.cc:(.text._ZNK3MPI8Cartcomm5CloneEv[_ZNK3MPI8Cartcomm5CloneEv]+0x35): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/multi_gpu_gpt_example.dir/multi_gpu_gpt_example.cc.o:multi_gpu_gpt_example.cc:(.text._ZNK3MPI9Intracomm11Create_cartEiPKiPKbb[_ZNK3MPI9Intracomm11Create_cartEiPKiPKbb]+0xa8): more undefined references to `MPI::Comm::Comm()' follow
CMakeFiles/multi_gpu_gpt_example.dir/multi_gpu_gpt_example.cc.o:(.data.rel.ro._ZTVN3MPI8DatatypeE[_ZTVN3MPI8DatatypeE]+0x78): undefined reference to `MPI::Datatype::Free()'
CMakeFiles/multi_gpu_gpt_example.dir/multi_gpu_gpt_example.cc.o:(.data.rel.ro._ZTVN3MPI3WinE[_ZTVN3MPI3WinE]+0x48): undefined reference to `MPI::Win::Free()'
collect2: error: ld returned 1 exit status
examples/cpp/multi_gpu_gpt/CMakeFiles/multi_gpu_gpt_example.dir/build.make:142: recipe for target 'bin/multi_gpu_gpt_example' failed
make[2]: *** [bin/multi_gpu_gpt_example] Error 1
CMakeFiles/Makefile2:5806: recipe for target 'examples/cpp/multi_gpu_gpt/CMakeFiles/multi_gpu_gpt_example.dir/all' failed
make[1]: *** [examples/cpp/multi_gpu_gpt/CMakeFiles/multi_gpu_gpt_example.dir/all] Error 2
[100%] Linking CXX executable ../../../bin/multi_gpu_gpt_async_example
CMakeFiles/multi_gpu_gpt_async_example.dir/multi_gpu_gpt_async_example.cc.o: In function `MPI::Op::Init(void (*)(void const*, void*, int, MPI::Datatype const&), bool)':
multi_gpu_gpt_async_example.cc:(.text._ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb[_ZN3MPI2Op4InitEPFvPKvPviRKNS_8DatatypeEEb]+0x15): undefined reference to `ompi_mpi_cxx_op_intercept'
CMakeFiles/multi_gpu_gpt_async_example.dir/multi_gpu_gpt_async_example.cc.o: In function `MPI::Intracomm::Clone() const':
multi_gpu_gpt_async_example.cc:(.text._ZNK3MPI9Intracomm5CloneEv[_ZNK3MPI9Intracomm5CloneEv]+0x3c): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/multi_gpu_gpt_async_example.dir/multi_gpu_gpt_async_example.cc.o: In function `MPI::Graphcomm::Clone() const':
multi_gpu_gpt_async_example.cc:(.text._ZNK3MPI9Graphcomm5CloneEv[_ZNK3MPI9Graphcomm5CloneEv]+0x35): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/multi_gpu_gpt_async_example.dir/multi_gpu_gpt_async_example.cc.o: In function `MPI::Cartcomm::Sub(bool const*) const':
multi_gpu_gpt_async_example.cc:(.text._ZNK3MPI8Cartcomm3SubEPKb[_ZNK3MPI8Cartcomm3SubEPKb]+0x9e): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/multi_gpu_gpt_async_example.dir/multi_gpu_gpt_async_example.cc.o: In function `MPI::Intracomm::Create_graph(int, int const*, int const*, bool) const':
multi_gpu_gpt_async_example.cc:(.text._ZNK3MPI9Intracomm12Create_graphEiPKiS2_b[_ZNK3MPI9Intracomm12Create_graphEiPKiS2_b]+0x39): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/multi_gpu_gpt_async_example.dir/multi_gpu_gpt_async_example.cc.o: In function `MPI::Cartcomm::Clone() const':
multi_gpu_gpt_async_example.cc:(.text._ZNK3MPI8Cartcomm5CloneEv[_ZNK3MPI8Cartcomm5CloneEv]+0x35): undefined reference to `MPI::Comm::Comm()'
CMakeFiles/multi_gpu_gpt_async_example.dir/multi_gpu_gpt_async_example.cc.o:multi_gpu_gpt_async_example.cc:(.text._ZNK3MPI9Intracomm11Create_cartEiPKiPKbb[_ZNK3MPI9Intracomm11Create_cartEiPKiPKbb]+0xa8): more undefined references to `MPI::Comm::Comm()' follow
CMakeFiles/multi_gpu_gpt_async_example.dir/multi_gpu_gpt_async_example.cc.o:(.data.rel.ro._ZTVN3MPI8DatatypeE[_ZTVN3MPI8DatatypeE]+0x78): undefined reference to `MPI::Datatype::Free()'
CMakeFiles/multi_gpu_gpt_async_example.dir/multi_gpu_gpt_async_example.cc.o:(.data.rel.ro._ZTVN3MPI3WinE[_ZTVN3MPI3WinE]+0x48): undefined reference to `MPI::Win::Free()'
collect2: error: ld returned 1 exit status
examples/cpp/multi_gpu_gpt/CMakeFiles/multi_gpu_gpt_async_example.dir/build.make:142: recipe for target 'bin/multi_gpu_gpt_async_example' failed
make[2]: *** [bin/multi_gpu_gpt_async_example] Error 1
CMakeFiles/Makefile2:5877: recipe for target 'examples/cpp/multi_gpu_gpt/CMakeFiles/multi_gpu_gpt_async_example.dir/all' failed
make[1]: *** [examples/cpp/multi_gpu_gpt/CMakeFiles/multi_gpu_gpt_async_example.dir/all] Error 2
Makefile:135: recipe for target 'all' failed
make: *** [all] Error 2
Reproduced Steps
1. git clone https://github.com/NVIDIA/FasterTransformer.git
2. mkdir -p FasterTransformer/build
3. cd FasterTransformer/build
4. git submodule init && git submodule update
5. pip3 install fire jax jaxlib
6. cmake -DSM=37 -DCMAKE_BUILD_TYPE=Release -DBUILD_MULTI_GPU=ON ..
7. make -j
What docker image do you use?
I don't use Docker. If it is mandatory, please recommend an image :)
Docker is not necessary, but it helps with setting up the environment. I guess the problem you are hitting is that MPI is not installed, or that the makefile cannot find it.
We recommend some Docker images in the guides for the different models. You can choose one according to your requirements.
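One way to check the first half of that guess outside Docker is to look for an MPI C++ compiler wrapper on PATH. The following is a minimal sketch, not part of FasterTransformer: the function name find_mpi_wrapper is hypothetical, and the candidate names are simply the wrapper names that Open MPI and MPICH commonly install. A missing wrapper does not prove MPI is absent (it may live outside PATH), so treat the result as a hint only.

```shell
#!/bin/sh
# Hypothetical helper: print the first MPI C++ compiler wrapper found on PATH.
# Returns non-zero when none of the usual wrapper names can be found.
find_mpi_wrapper() {
  for name in mpicxx mpic++ mpiCC; do
    if command -v "$name" >/dev/null 2>&1; then
      command -v "$name"
      return 0
    fi
  done
  return 1
}

if wrapper=$(find_mpi_wrapper); then
  echo "MPI wrapper found: $wrapper"
else
  echo "No MPI compiler wrapper on PATH; MPI may not be installed"
fi
```

If a wrapper is found but the build still fails at link time, as in the log above, the problem is more likely in which MPI libraries are linked than in whether MPI is installed.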
Thank you. I'll try to run it in the docker.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
ERROR: Detected NVIDIA NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: Detected NVIDIA NVIDIA Tesla K80 GPU, which is not supported by this container
ERROR: No supported GPU(s) detected to run this container
ERROR: This container was built for NVIDIA Driver Release 470.42 or later, but
version 465.19.01 was detected and compatibility mode is UNAVAILABLE.
I got this when I installed the docker from the tutorial https://github.com/NVIDIA/FasterTransformer/blob/main/docs/gptj_guide.md
OK, your driver and GPU are both too old to use the Docker image. Try building the repo in your previous environment, but add -DBUILD_MULTI_GPU=OFF.
@byshiue I need GPT-J inference on two Tesla K80s. Is this even possible? I have tried so many things. Nothing works because the Tesla K80 is too old.
You can try the suggestion in this issue https://github.com/NVIDIA/FasterTransformer/issues/69.
Besides, you can try adding -lmpi to the CMakeLists.txt of gptj_example.
The code compiled in Docker. I will try running GPT.
Closing this issue because it is inactive. Feel free to re-open it if you still have any problems.
I had the same problem in a non-Docker environment too. Adding mpi_cxx to the link dependencies of mpi_utils solved it.
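For context, the undefined symbols in the log (`MPI::Comm::Comm()`, `ompi_mpi_cxx_op_intercept`) belong to Open MPI's C++ bindings, which are shipped in a separate library, libmpi_cxx, in Open MPI builds that include those bindings. That is consistent with mpi_cxx being the missing link dependency. The sketch below checks whether the Open MPI wrapper itself lists that library. The function name has_mpi_cxx is hypothetical; `--showme:link` is an Open MPI wrapper option, so with another MPI implementation the check simply reports that the library is not listed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a link line (one string) names the mpi_cxx library.
has_mpi_cxx() {
  case " $1 " in
    *" -lmpi_cxx "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Query the wrapper only when one is installed; an empty result means
# "not listed" (or that the wrapper does not understand --showme:link).
if command -v mpicxx >/dev/null 2>&1; then
  link_line=$(mpicxx --showme:link 2>/dev/null) || link_line=""
  if has_mpi_cxx "$link_line"; then
    echo "mpicxx links mpi_cxx: the C++ bindings library is available"
  else
    echo "mpicxx does not list mpi_cxx on its link line"
  fi
fi
```

If the wrapper lists `-lmpi_cxx` but the project links only `-lmpi`, the link errors shown above are the expected result.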
I got the same problem. I installed openmpi==4.0.2 from Anaconda.
Could you please tell me how to add the mpi_cxx link dependency? Thanks. @lzhangzz
Edit: Oh, I see it in https://github.com/NVIDIA/FasterTransformer/pull/616. Thanks.