TensorRT-LLM
Is a GPU required to build TensorRT-LLM Backend for Triton Server?
I'm trying to build the TensorRT-LLM backend, and I'm running the following command to build the wheel:

```
python3 ../tensorrt_llm/scripts/build_wheel.py --trt_root ${TRT_ROOT} -D "CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.3/" -D "ENABLE_MULTI_DEVICE=1" -D "BUILD_TESTS:BOOL=OFF" -D "BUILD_BENCHMARKS:BOOL=OFF"
```
This fails with the following error:

```
2024/07/19 21:28:25 INFO Requirement already satisfied: pybind11-stubgen in /usr/lib/python3.11/site-packages (2.5.1)
2024/07/19 21:28:25 WARN Traceback (most recent call last):
2024/07/19 21:28:25 WARN File "<frozen runpy>", line 198, in _run_module_as_main
2024/07/19 21:28:25 WARN File "<frozen runpy>", line 88, in _run_code
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/pybind11_stubgen/__main__.py", line 4, in <module>
2024/07/19 21:28:25 WARN main()
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/pybind11_stubgen/__init__.py", line 319, in main
2024/07/19 21:28:25 WARN run(
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/pybind11_stubgen/__init__.py", line 358, in run
2024/07/19 21:28:25 WARN QualifiedName.from_str(module_name), importlib.import_module(module_name)
2024/07/19 21:28:25 WARN ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/importlib/__init__.py", line 126, in import_module
2024/07/19 21:28:25 WARN return _bootstrap._gcd_import(name[level:], package, level)
2024/07/19 21:28:25 WARN ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024/07/19 21:28:25 WARN File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
2024/07/19 21:28:25 WARN File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
2024/07/19 21:28:25 WARN File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
2024/07/19 21:28:25 WARN File "<frozen importlib._bootstrap>", line 676, in _load_unlocked
2024/07/19 21:28:25 WARN File "<frozen importlib._bootstrap>", line 573, in module_from_spec
2024/07/19 21:28:25 WARN File "<frozen importlib._bootstrap_external>", line 1233, in create_module
2024/07/19 21:28:25 WARN File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
2024/07/19 21:28:25 WARN ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
2024/07/19 21:28:25 WARN Failed to build pybind11 stubgen: Command '"/usr/bin/python3" -m pybind11_stubgen -o . bindings' returned non-zero exit status 1.
2024/07/19 21:28:25 INFO * Building wheel...
2024/07/19 21:28:25 WARN Traceback (most recent call last):
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 373, in <module>
2024/07/19 21:28:25 WARN main()
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 357, in main
2024/07/19 21:28:25 WARN json_out["return_val"] = hook(**hook_input["kwargs"])
2024/07/19 21:28:25 WARN ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/pyproject_hooks/_in_process/_in_process.py", line 271, in build_wheel
2024/07/19 21:28:25 WARN return _build_backend().build_wheel(
2024/07/19 21:28:25 WARN ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/setuptools/build_meta.py", line 415, in build_wheel
2024/07/19 21:28:25 WARN return self._build_with_temp_dir(
2024/07/19 21:28:25 WARN ^^^^^^^^^^^^^^^^^^^^^^^^^^
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/setuptools/build_meta.py", line 397, in _build_with_temp_dir
2024/07/19 21:28:25 WARN self.run_setup()
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/setuptools/build_meta.py", line 497, in run_setup
2024/07/19 21:28:25 WARN super().run_setup(setup_script=setup_script)
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/site-packages/setuptools/build_meta.py", line 313, in run_setup
2024/07/19 21:28:25 WARN exec(code, locals())
2024/07/19 21:28:25 WARN File "<string>", line 84, in <module>
2024/07/19 21:28:25 WARN File "<string>", line 48, in sanity_check
2024/07/19 21:28:25 WARN ImportError: The `bindings` module does not exist. Please check the package integrity. If you are attempting to use the pip development mode (editable installation), please execute `build_wheels.py` first, and then run `pip install -e .`.
2024/07/19 21:28:25 INFO
2024/07/19 21:28:25 INFO ERROR Backend subprocess exited when trying to invoke build_wheel
2024/07/19 21:28:25 WARN Traceback (most recent call last):
2024/07/19 21:28:25 WARN File "/home/build/backend/build/../tensorrt_llm/scripts/build_wheel.py", line 352, in <module>
2024/07/19 21:28:25 WARN main(**vars(args))
2024/07/19 21:28:25 WARN File "/home/build/backend/build/../tensorrt_llm/scripts/build_wheel.py", line 280, in main
2024/07/19 21:28:25 WARN build_run(
2024/07/19 21:28:25 WARN File "/usr/lib/python3.11/subprocess.py", line 571, in run
2024/07/19 21:28:25 WARN raise CalledProcessError(retcode, process.args,
2024/07/19 21:28:25 WARN subprocess.CalledProcessError: Command '"/usr/bin/python3" -m build /home/build/backend/tensorrt_llm --skip-dependency-check --no-isolation --wheel --outdir "/home/build/backend/tensorrt_llm/build"' returned non-zero
```

If I understand correctly, libcuda.so.1 is provided by the GPU driver (reference). I don't have a GPU driver installed because I'm not running on a machine with a GPU.

I was hoping I wouldn't need a GPU just to build the TensorRT-LLM backend. Are there any options that would change this?
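For what it's worth, the CUDA toolkit itself ships a build-time stub of the driver library (it shows up later in my listing as `.../lib/stubs/libcuda.so`). A minimal sketch of the usual GPU-less workaround, assuming a standard toolkit layout under /usr/local/cuda; the stub satisfies symbol resolution but is not a working driver, so anything that actually calls into CUDA will still fail:

```
# The toolkit's linker stub for libcuda: it resolves symbols at link/load
# time, but every driver call through it fails, so this only helps steps
# that merely need to load the library, not use a GPU.
ln -s /usr/local/cuda/lib64/stubs/libcuda.so /usr/local/cuda/lib64/stubs/libcuda.so.1
export LD_LIBRARY_PATH=/usr/local/cuda/lib64/stubs:${LD_LIBRARY_PATH}
```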
I see an option BUILD_PYBIND. Can I disable the Python bindings for the Runtime and Batch Manager if this is just for serving with Triton? https://github.com/NVIDIA/TensorRT-LLM/blob/5ddb6bf218ed16a2dcf0058f20c59a247e180fd2/cpp/CMakeLists.txt#L31
I see the docs describe the cpp_only option in more detail.
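If skipping the bindings is viable, the build script appears to expose this directly. A sketch of what I'd try, assuming the --cpp_only flag the docs mention and carrying over the flags from my command above:

```
# C++-only build: skips the Python bindings (and hence the pybind11-stubgen
# step that needs libcuda.so.1), producing just the C++ runtime libraries.
python3 ../tensorrt_llm/scripts/build_wheel.py --cpp_only \
    --trt_root ${TRT_ROOT} \
    -D "CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-12.3/" \
    -D "ENABLE_MULTI_DEVICE=1"
```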
However, I'm still confused about:
- When are the Python bindings required? Are they only needed when building LLM engines, or also when deploying them?
- Per my original question, are the drivers required to build the bindings?
I was able to build the tensorrtllm_backend by following the directions for building it in a container. I believe that invokes the same build_wheel.py I was running above, which would indicate it can be built successfully without a GPU.
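For reference, the container build I followed is roughly the one documented in the tensorrtllm_backend repo; the image tag and Dockerfile path below are from my notes and may differ between versions:

```
# Build the Triton + TensorRT-LLM backend image from the repo root.
DOCKER_BUILDKIT=1 docker build -t triton_trt_llm \
    -f dockerfile/Dockerfile.trt_llm_backend .
```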
I checked the resulting image and found the following CUDA-related libraries inside it:

```
/usr/local/lib/python3.10/dist-packages/nvidia/cuda_runtime/lib/libcudart.so.12
/usr/local/cuda-12.4/compat/lib.real/libcuda.so.550.54.14
/usr/local/cuda-12.4/compat/lib.real/libcudadebugger.so.1
/usr/local/cuda-12.4/compat/lib.real/libcuda.so.1
/usr/local/cuda-12.4/compat/lib.real/libcudadebugger.so.550.54.14
/usr/local/cuda-12.4/compat/lib.real/libcuda.so
/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudart.so
/usr/local/cuda-12.4/targets/x86_64-linux/lib/cmake/libcudacxx
/usr/local/cuda-12.4/targets/x86_64-linux/lib/cmake/libcudacxx/libcudacxx-config-version.cmake
/usr/local/cuda-12.4/targets/x86_64-linux/lib/cmake/libcudacxx/libcudacxx-config.cmake
/usr/local/cuda-12.4/targets/x86_64-linux/lib/cmake/libcudacxx/libcudacxx-header-search.cmake
/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudadevrt.a
/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudart.so.12.4.99
/usr/local/cuda-12.4/targets/x86_64-linux/lib/stubs/libcuda.so
/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudart.so.12
/usr/local/cuda-12.4/targets/x86_64-linux/lib/libcudart_static.a
/usr/local/cuda-12.4/extras/Debugger/include/libcudacore.h
/usr/local/cuda-12.4/extras/Debugger/lib64/libcudacore.a
```

So I think the problem is that I'm missing libcuda.so.1. Does anyone know where that library comes from? Is it the user-space driver?
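A quick way to confirm this is the missing piece is to ask the dynamic linker directly; just a standard check, nothing TensorRT-specific:

```
# List every libcuda the dynamic linker can find; an empty result matches
# the ImportError above (no driver, no compat libs on the search path).
ldconfig -p | grep libcuda
```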
This dated post implies the CUDA compatibility libraries are shipped as a separate package.
I resolved this by obtaining the CUDA compatibility libs from the driver redistributable: https://developer.download.nvidia.com/compute/cuda/redist/nvidia_driver/linux-x86_64/nvidia_driver-linux-x86_64-545.23.08-archive.tar.xz
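Roughly what that looked like; a sketch under the assumption that the archive keeps the libraries under a lib/ subdirectory (the layout may differ between driver versions) and that the loader is pointed at /usr/local/cuda/compat, mirroring the NGC CUDA images:

```
# Fetch and unpack the driver redistributable that carries the compat libs.
wget https://developer.download.nvidia.com/compute/cuda/redist/nvidia_driver/linux-x86_64/nvidia_driver-linux-x86_64-545.23.08-archive.tar.xz
tar -xf nvidia_driver-linux-x86_64-545.23.08-archive.tar.xz

# Put libcuda where the dynamic linker will find it.
cp nvidia_driver-linux-x86_64-545.23.08-archive/lib/libcuda.so* /usr/local/cuda/compat/
export LD_LIBRARY_PATH=/usr/local/cuda/compat:${LD_LIBRARY_PATH}
```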