TensorRT-LLM
Is a GPU required to build TensorRT-LLM Backend for Triton Server?
I'm trying to build the TensorRT-LLM backend, and I'm running the following command to build the wheel:

```
python3 ../tensorrt_llm/scripts/build_wheel.py --trt_root ${TRT_ROOT} -D "CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.3/" -D "ENABLE_MULTI_DEVICE=1" -D "BUILD_TESTS:BOOL=OFF" -D "BUILD_BENCHMARKS:BOOL=OFF"
```
This fails with the following error:

```
2024/07/19 21:28:25 INFO Requirement already satisfied: pybind11-stubgen in /usr/lib/python3.11/site-packages (2.5.1)
2024/07/19 21:28:25 WARN Traceback (most recent call last):
2024/07/19 21:28:25 WARN File "<frozen runpy>", line 198, in _run_module_as_main
2024/07/19 21:28:25 WARN File "<frozen runpy>", line 88, in _run_code
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/pybind11_stubgen/__main__.py", line 4, in <module>
2024/07/19 21:28:25 WARN main()
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/pybind11_stubgen/__init__.py", line 319, in main
2024/07/19 21:28:25 WARN run(
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/pybind11_stubgen/__init__.py", line 358, in run
2024/07/19 21:28:25 WARN QualifiedName.from_str(module_name), importlib.import_module(module_name)
2024/07/19 21:28:25 WARN ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/importlib/__init__.py", line 126, in import_module
2024/07/19 21:28:25 WARN return _bootstrap._gcd_import(name[level:], package, level)
2024/07/19 21:28:25 WARN ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024/07/19 21:28:25 WARN File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
2024/07/19 21:28:25 WARN File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
2024/07/19 21:28:25 WARN File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
2024/07/19 21:28:25 WARN File "<frozen importlib._bootstrap>", line 676, in _load_unlocked
2024/07/19 21:28:25 WARN File "<frozen importlib._bootstrap>", line 573, in module_from_spec
2024/07/19 21:28:25 WARN File "<frozen importlib._bootstrap_external>", line 1233, in create_module
2024/07/19 21:28:25 WARN File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
2024/07/19 21:28:25 WARN ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
2024/07/19 21:28:25 WARN Failed to build pybind11 stubgen: Command '"/usr/bin/python3" -m pybind11_stubgen -o . bindings' returned non-zero exit status 1.
2024/07/19 21:28:25 INFO * Building wheel...
2024/07/19 21:28:25 WARN Traceback (most recent call last):
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 373, in <module>
2024/07/19 21:28:25 WARN main()
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 357, in main
2024/07/19 21:28:25 WARN json_out["return_val"] = hook(**hook_input["kwargs"])
2024/07/19 21:28:25 WARN ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 271, in build_wheel
2024/07/19 21:28:25 WARN return _build_backend().build_wheel(
2024/07/19 21:28:25 WARN ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/setuptools/build_meta.py", line 415, in build_wheel
2024/07/19 21:28:25 WARN return self._build_with_temp_dir(
2024/07/19 21:28:25 WARN ^^^^^^^^^^^^^^^^^^^^^^^^^^
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/setuptools/build_meta.py", line 397, in _build_with_temp_dir
2024/07/19 21:28:25 WARN self.run_setup()
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/setuptools/build_meta.py", line 497, in run_setup
2024/07/19 21:28:25 WARN super().run_setup(setup_script=setup_script)
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/setuptools/build_meta.py", line 313, in run_setup
2024/07/19 21:28:25 WARN exec(code, locals())
2024/07/19 21:28:25 WARN File "<string>", line 84, in <module>
2024/07/19 21:28:25 WARN File "<string>", line 48, in sanity_check
2024/07/19 21:28:25 WARN ImportError: The `bindings` module does not exist. Please check the package integrity. If you are attempting to use the pip development mode (editable installation), please execute `build_wheels.py` first, and then run `pip install -e .`.
2024/07/19 21:28:25 INFO
2024/07/19 21:28:25 INFO ERROR Backend subprocess exited when trying to invoke build_wheel
2024/07/19 21:28:25 WARN Traceback (most recent call last):
2024/07/19 21:28:25 WARN File "/home/build/backend/build/../tensorrt_llm/scripts/build_wheel.py", line 352, in <module>
2024/07/19 21:28:25 WARN main(**vars(args))
2024/07/19 21:28:25 WARN File "/home/build/backend/build/../tensorrt_llm/scripts/build_wheel.py", line 280, in main
2024/07/19 21:28:25 WARN build_run(
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/subprocess.py", line 571, in run
2024/07/19 21:28:25 WARN raise CalledProcessError(retcode, process.args,
2024/07/19 21:28:25 WARN subprocess.CalledProcessError: Command '"/usr/bin/python3" -m build /home/build/backend/tensorrt_llm --skip-dependency-check --no-isolation --wheel --outdir "/home/build/backend/tensorrt_llm/build"' returned non-zero
```

If I understand correctly, libcuda.so.1 is provided by the GPU driver (reference). I don't have a GPU driver installed because I'm not running on a machine with a GPU.

I was hoping I wouldn't need a GPU just to build the TensorRT-LLM backend. Are there any options that would change this?
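For what it's worth, the CUDA toolkit itself ships a build-time stub of the driver library (it shows up later in my listing as `.../lib/stubs/libcuda.so`). A minimal sketch of the usual GPU-less workaround, assuming a standard toolkit layout under /usr/local/cuda; the stub satisfies symbol resolution but is not a working driver, so anything that actually calls into CUDA will still fail:

```
# The toolkit's linker stub for libcuda: it resolves symbols at link/load
# time, but every driver call through it fails, so this only helps steps
# that merely need to load the library, not use a GPU.
ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1
export LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:${LD_LIBRARY_PATH}
```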
I see an option BUILD_PYBIND. Can I disable the Python bindings for the Runtime and Batch Manager if this is just for serving with Triton? https://github.com/NVIDIA/TensorRT-LLM/blob/5ddb6bf218ed16a2dcf0058f20c59a247e180fd2/cpp/CMakeLists.txt#L31
I see the docs describe the cpp_only option in more detail.
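If skipping the bindings is viable, the build script appears to expose this directly. A sketch of what I'd try, assuming the --cpp_only flag the docs mention and carrying over the flags from my command above:

```
# C++-only build: skips the Python bindings (and hence the pybind11-stubgen
# step that needs libcuda.so.1), producing just the C++ runtime libraries.
python3 ../tensorrt_llm/scripts/build_wheel.py --cpp_only \
    --trt_root ${TRT_ROOT} \
    -D "CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.3/" \
    -D "ENABLE_MULTI_DEVICE=1"
```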
However, I'm still confused about:
- When are the Python bindings required? Are they only needed when building LLM engines, or also when deploying them?
- Per my original question, are the drivers required to build the bindings?
I was able to build the tensorrtllm_backend by following the directions for building it in a container. I believe that invokes the same build_wheel.py I was running above, which would indicate it can be built successfully without a GPU.
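For reference, the container build I followed is roughly the one documented in the tensorrtllm_backend repo; the image tag and Dockerfile path below are from my notes and may differ between versions:

```
# Build the Triton + TensorRT-LLM backend image from the repo root.
DOCKER_BUILDKIT=1 docker build -t triton_trt_llm \
    -f dockerfile/Dockerfile.trt_llm_backend .
```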
I checked the resulting image and found the following CUDA-related libraries inside it:

```
/usr/local/lib/python3.10/dist-packages/nvidia/cuda_runtime/lib/libcudart.so.12
/usr/local/cuda-12.4/compat/lib.real/libcuda.so.550.54.14
/usr/local/cuda-12.4/compat/lib.real/libcudadebugger.so.1
/usr/local/cuda-12.4/compat/lib.real/libcuda.so.1
/usr/local/cuda-12.4/compat/lib.real/libcudadebugger.so.550.54.14
/usr/local/cuda-12.4/compat/lib.real/libcuda.so
/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudart.so
/usr/local/cuda-12.4/targets/x86_64-linux/lib/cmake/libcudacxx
/usr/local/cuda-12.4/targets/x86_64-linux/lib/cmake/libcudacxx/libcudacxx-config-version.cmake
/usr/local/cuda-12.4/targets/x86_64-linux/lib/cmake/libcudacxx/libcudacxx-config.cmake
/usr/local/cuda-12.4/targets/x86_64-linux/lib/cmake/libcudacxx/libcudacxx-header-search.cmake
/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudadevrt.a
/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudart.so.12.4.99
/usr/local/cuda-12.4/targets/x86_64-linux/lib/stubs/libcuda.so
/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudart.so.12
/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudart_static.a
/usr/local/cuda-12.4/extras/Debugger/include/libcudacore.h
/usr/local/cuda-12.4/extras/Debugger/lib64/libcudacore.a
```

So I think the problem is that I'm missing libcuda.so.1. Does anyone know where that library comes from? Is it the user-space driver?
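A quick way to confirm this is the missing piece is to ask the dynamic linker directly; just a standard check, nothing TensorRT-specific:

```
# List every libcuda the dynamic linker can find; an empty result matches
# the ImportError above (no driver, no compat libs on the search path).
ldconfig -p | grep libcuda
```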
This dated post implies the CUDA compatibility libraries are shipped as a separate package.
I resolved this by obtaining the CUDA compatibility libs from the driver redistributable: https://developer.download.nvidia.com/compute/cuda/redist/nvidia_driver/linux-x86_64/nvidia_driver-linux-x86_64-545.23.08-archive.tar.xz
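Roughly what that looked like; a sketch under the assumption that the archive keeps the libraries under a lib/ subdirectory (the layout may differ between driver versions) and that the loader is pointed at /usr/local/cuda/compat, mirroring the NGC CUDA images:

```
# Fetch and unpack the driver redistributable that carries the compat libs.
wget https://developer.download.nvidia.com/compute/cuda/redist/nvidia_driver/linux-x86_64/nvidia_driver-linux-x86_64-545.23.08-archive.tar.xz
tar -xf nvidia_driver-linux-x86_64-545.23.08-archive.tar.xz

# Put libcuda where the dynamic linker will find it.
cp nvidia_driver-linux-x86_64-545.23.08-archive/lib/libcuda.so* /usr/local/cuda/compat/
export LD_LIBRARY_PATH=/usr/local/cuda/compat:${LD_LIBRARY_PATH}
```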