Is MPI required even when multi-device is disabled?
System Info
- CPU x86_64
Reproduction
I'm trying to build the wheel as follows:
python3 ../tensorrt_llm/scripts/build_wheel.py --trt_root ${TRT_ROOT} -D "CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.3/" -D "ENABLE_MULTI_DEVICE=0"
I end up with a linking error because MPI is missing.
[100%] Building CXX object tensorrt_llm/executor_worker/CMakeFiles/executorWorker.dir/executorWorker.cpp.o
[100%] Linking CXX executable executorWorker
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `ompi_mpi_char'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `MPI_Wait'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `MPI_Mrecv'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `ompi_mpi_uint64_t'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `MPI_Comm_spawn'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `MPI_Get_count'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `tensorrt_llm::mpi::MpiComm::MpiComm(ompi_communicator_t*, bool)'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `ompi_mpi_comm_self'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `MPI_Info_set'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `ompi_mpi_comm_world'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `tensorrt_llm::mpi::MpiComm::mprobe(int, int, ompi_message_t**, ompi_status_public_t*) const'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `MPI_Info_create'
/usr/lib/gcc/x86_64-pc-linux-gnu/12.4.0/../../../../x86_64-pc-linux-gnu/bin/ld: ../libtensorrt_llm.so: undefined reference to `MPI_Barrier'
collect2: error: ld returned 1 exit status
make[3]: *** [tensorrt_llm/executor_worker/CMakeFiles/executorWorker.dir/build.make:112: tensorrt_llm/executor_worker/executorWorker] Error 1
make[2]: *** [CMakeFiles/Makefile2:1192: tensorrt_llm/executor_worker/CMakeFiles/executorWorker.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:1199: tensorrt_llm/executor_worker/CMakeFiles/executorWorker.dir/rule] Error 2
make: *** [Makefile:335: executorWorker] Error 2
Traceback (most recent call last):
  File "/home/build/backend/build/../tensorrt_llm/scripts/build_wheel.py", line 352, in <module>
    main(**vars(args))
  File "/home/build/backend/build/../tensorrt_llm/scripts/build_wheel.py", line 166, in main
    build_run(
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
I don't have MPI installed, which is why I disabled multi-device support.
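The errors are reported against ../libtensorrt_llm.so, and the library itself references symbols such as tensorrt_llm::mpi::MpiComm::MpiComm(ompi_communicator_t*, bool) that it does not define. That suggests, though it does not prove, that some code compiled into the library still uses the MPI wrappers with ENABLE_MULTI_DEVICE=0. A minimal sketch for listing those symbols, assuming GNU binutils nm is on PATH; the function names are hypothetical and the filter simply matches the MPI_, ompi_ and tensorrt_llm::mpi:: prefixes seen in the log:

```python
import re
import subprocess
from typing import List

# Name patterns taken from the undefined references in the linker output.
MPI_PATTERNS = (
    re.compile(r"^MPI_"),
    re.compile(r"^ompi_"),
    re.compile(r"tensorrt_llm::mpi::"),
)


def undefined_mpi_symbols(nm_output: str) -> List[str]:
    """Return undefined ('U') symbols from nm output that look MPI-related."""
    symbols = []
    for line in nm_output.splitlines():
        # Undefined entries have no address: whitespace, 'U', then the name.
        match = re.match(r"^\s*U\s+(.+)$", line)
        if match is None:
            continue
        name = match.group(1).strip()
        if any(pattern.search(name) for pattern in MPI_PATTERNS):
            symbols.append(name)
    return symbols


def check_library(path: str) -> List[str]:
    """Run nm on a shared library and list its undefined MPI-related symbols."""
    result = subprocess.run(
        ["nm", "--dynamic", "--undefined-only", "--demangle", path],
        capture_output=True,
        text=True,
        check=True,  # raise if nm itself fails (e.g. file not found)
    )
    return undefined_mpi_symbols(result.stdout)
```

Calling check_library with the path to libtensorrt_llm.so in the build tree should print an empty list if the library is truly MPI-free. A non-empty list means any executable linked against it, including executorWorker, also needs MPI at link time.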
Expected behavior
I expected this to compile without MPI. My assumption was that MPI is only required for multi-device builds, though that assumption may be incorrect. I was hoping to compile for a single device without needing MPI. Is MPI needed even for a single device?
Actual behavior
I got the linking error shown above because MPI is missing.
Additional notes
I also had to remove mpi4py from requirements.txt to try to get the build working without multi-device support:
# N.B. Hack: we remove mpi4py from the requirements because we don't have the MPI libraries.
# Hopefully it is only needed for multi-device support.
sed '/mpi4py/d' -i ../tensorrt_llm/requirements.txt
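If the build is driven from a Python script rather than the shell, the same edit can be sketched as below. Like the sed command, it drops every line containing the string, comments included; the function names are hypothetical. This only keeps pip from installing the mpi4py binding and does not affect the C++ link step.

```python
from pathlib import Path


def drop_requirement_lines(text: str, package: str) -> str:
    """Remove every line that mentions `package`, mirroring sed '/mpi4py/d'."""
    kept = [line for line in text.splitlines(keepends=True) if package not in line]
    return "".join(kept)


def drop_requirement_in_file(path: str, package: str = "mpi4py") -> None:
    """Rewrite a requirements file in place without lines mentioning `package`."""
    req = Path(path)
    req.write_text(drop_requirement_lines(req.read_text(), package))
```

For example, drop_requirement_in_file("../tensorrt_llm/requirements.txt") performs the same edit as the sed line.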
@Funatiq Could you please have a look? Thanks.
Could you try passing the following option to build_wheel.py:
--extra-cmake-vars ENABLE_MULTI_DEVICE=0
I'm now trying to build it with OpenMPI. The build takes so long that, if it succeeds with OpenMPI, I may not bother rerunning the experiments.
Fair enough. If you are building for a specific target architecture, -a native can reduce build time significantly.
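Combining the original command with the two suggestions from this thread gives an invocation like the one sketched below. It assumes --extra-cmake-vars can be repeated the way -D is in the report, and that build_wheel.py is launched with python3 as above; the helper names are hypothetical. On the revision in the report this still failed to link, and per the last comment the error no longer occurs on the latest main branch.

```python
import subprocess
from typing import List, Optional


def build_wheel_command(script: str, trt_root: str, cuda_root: str,
                        arch: Optional[str] = "native") -> List[str]:
    """Assemble the build_wheel.py call from the report plus the suggested flags."""
    cmd = [
        "python3", script,
        "--trt_root", trt_root,
        # Assumption: --extra-cmake-vars may be repeated, like -D in the report.
        "--extra-cmake-vars", f"CUDA_TOOLKIT_ROOT_DIR={cuda_root}",
        "--extra-cmake-vars", "ENABLE_MULTI_DEVICE=0",
    ]
    if arch is not None:
        # Restrict the build to one target architecture to cut build time.
        cmd += ["-a", arch]
    return cmd


def run_build(cmd: List[str]) -> None:
    """Run the build; raises subprocess.CalledProcessError on failure."""
    subprocess.run(cmd, check=True)
```

For example, run_build(build_wheel_command("../tensorrt_llm/scripts/build_wheel.py", trt_root, "/usr/local/cuda-12.3/")) mirrors the original command, with trt_root holding the value of ${TRT_ROOT}.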
It's a bug; I also have this issue.
The issue no longer occurs on the latest main branch.