unable to load shared library: libnvinfer_plugin_tensorrt_llm.so.9 using nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3
System Info
- CPU architecture: x86
- GPU: A100 40GB
Who can help?
No response
Information
- [X] The official example scripts
- [ ] My own modified scripts
Tasks
- [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)
Reproduction
- Follow https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/docs/llama.md
- Use the image nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3
- Try to start the backend. You will get the following error:
+ '[' 1 -eq 0 ']'
+ command=serve
+ export DATADIR=/data
+ DATADIR=/data
+ export TRTDIR=/data/git_TensorRT-LLM
+ TRTDIR=/data/git_TensorRT-LLM
+ export MIXTRALDIR=/data/git_mixtral-8x7B-v0.1
+ MIXTRALDIR=/data/git_mixtral-8x7B-v0.1
+ export OUTPUTDIR=/data/tllm_checkpoint_mixtral_2gpu
+ OUTPUTDIR=/data/tllm_checkpoint_mixtral_2gpu
+ LLAMA_UNIFIED_CKPT_PATH=/data/ckpt/llama/7b/
+ LLAMA_ENGINE_PATH=/data/engines/llama/7b/
+ HF_LLAMA_MODEL=/data/git_Llama-2-7b-hf
+ case $command in
+ echo 'Starting Triton server...'
Starting Triton server...
+ export TRTLLM_ORCHESTRATOR=1
+ TRTLLM_ORCHESTRATOR=1
+ export LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs
+ LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs
+ tritonserver --model-repository=/data/models/llama_ifb
W0723 23:46:59.440299 2084 pinned_memory_manager.cc:271] "Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version"
I0723 23:46:59.440364 2084 cuda_memory_manager.cc:117] "CUDA memory pool disabled"
E0723 23:46:59.440454 2084 server.cc:243] "CudaDriverHelper has not been initialized."
I0723 23:46:59.445778 2084 model_lifecycle.cc:472] "loading: postprocessing:1"
I0723 23:46:59.445846 2084 model_lifecycle.cc:472] "loading: preprocessing:1"
I0723 23:46:59.445986 2084 model_lifecycle.cc:472] "loading: tensorrt_llm:1"
I0723 23:46:59.446031 2084 model_lifecycle.cc:472] "loading: tensorrt_llm_bls:1"
E0723 23:46:59.448720 2084 model_lifecycle.cc:641] "failed to load 'tensorrt_llm' version 1: Not found: unable to load shared library: libnvinfer_plugin_tensorrt_llm.so.9: cannot open shared object file: No such file or directory"
I0723 23:46:59.448763 2084 model_lifecycle.cc:776] "failed to load 'tensorrt_llm'"
I0723 23:47:01.013657 2084 python_be.cc:2404] "TRITONBACKEND_ModelInstanceInitialize: tensorrt_llm_bls_0_0 (CPU device 0)"
I0723 23:47:01.228214 2084 model_lifecycle.cc:838] "successfully loaded 'tensorrt_llm_bls'"
I0723 23:47:02.833369 2084 python_be.cc:2404] "TRITONBACKEND_ModelInstanceInitialize: preprocessing_0_0 (CPU device 0)"
I0723 23:47:02.834550 2084 python_be.cc:2404] "TRITONBACKEND_ModelInstanceInitialize: postprocessing_0_0 (CPU device 0)"
[TensorRT-LLM][WARNING] Don't setup 'skip_special_tokens' correctly (set value is ${skip_special_tokens}). Set it as True by default.
[TensorRT-LLM][WARNING] Don't setup 'add_special_tokens' correctly (set value is ${add_special_tokens}). Set it as True by default.
I0723 23:47:04.829608 2084 model_lifecycle.cc:838] "successfully loaded 'postprocessing'"
I0723 23:47:04.845120 2084 model_lifecycle.cc:838] "successfully loaded 'preprocessing'"
E0723 23:47:04.845236 2084 model_repository_manager.cc:614] "Invalid argument: ensemble 'ensemble' depends on 'tensorrt_llm' which has no loaded version. Model 'tensorrt_llm' loading failed with error: version 1 is at UNAVAILABLE state: Not found: unable to load shared library: libnvinfer_plugin_tensorrt_llm.so.9: cannot open shared object file: No such file or directory;"
I0723 23:47:04.845354 2084 server.cc:606]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+
I0723 23:47:04.845398 2084 server.cc:633]
+---------+-------------------------------------------------------+--------------------------------------------------------------------------------------------------+
| Backend | Path | Config |
+---------+-------------------------------------------------------+--------------------------------------------------------------------------------------------------+
| python | /opt/tritonserver/backends/python/libtriton_python.so | {"cmdline":{"auto-complete-config":"true","backend-directory":"/opt/tritonserver/backends","min- |
| | | compute-capability":"6.000000","default-max-batch-size":"4"}} |
+---------+-------------------------------------------------------+--------------------------------------------------------------------------------------------------+
I0723 23:47:04.845501 2084 server.cc:676]
+------------------+---------+---------------------------------------------------------------------------------------------------------------------------------------+
| Model | Version | Status |
+------------------+---------+---------------------------------------------------------------------------------------------------------------------------------------+
| postprocessing | 1 | READY |
| preprocessing | 1 | READY |
| tensorrt_llm | 1 | UNAVAILABLE: Not found: unable to load shared library: libnvinfer_plugin_tensorrt_llm.so.9: cannot open shared object file: No such f |
| | | ile or directory |
| tensorrt_llm_bls | 1 | READY |
+------------------+---------+---------------------------------------------------------------------------------------------------------------------------------------+
Error: Failed to initialize NVML
W0723 23:47:04.846718 2084 metrics.cc:798] "DCGM unable to start: DCGM initialization error"
I0723 23:47:04.846881 2084 metrics.cc:770] "Collecting CPU metrics"
I0723 23:47:04.847003 2084 tritonserver.cc:2557]
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.46.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_m |
| | emory cuda_shared_memory binary_tensor_data parameters statistics trace logging |
| model_repository_path[0] | /data/models/llama_ifb |
| model_control_mode | MODE_NONE |
| strict_model_config | 0 |
| model_config_name | |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------+
I0723 23:47:04.847113 2084 server.cc:307] "Waiting for in-flight requests to complete."
I0723 23:47:04.847134 2084 server.cc:323] "Timeout 30: Found 0 model versions that have in-flight inferences"
I0723 23:47:04.847833 2084 server.cc:338] "All models are stopped, unloading models"
I0723 23:47:04.847854 2084 server.cc:347] "Timeout 30: Found 3 live models and 0 in-flight non-inference requests"
I0723 23:47:05.847986 2084 server.cc:347] "Timeout 29: Found 3 live models and 0 in-flight non-inference requests"
Cleaning up...
Cleaning up...
Cleaning up...
I0723 23:47:05.877044 2084 model_lifecycle.cc:623] "successfully unloaded 'tensorrt_llm_bls' version 1"
I0723 23:47:06.213953 2084 model_lifecycle.cc:623] "successfully unloaded 'postprocessing' version 1"
I0723 23:47:06.215902 2084 model_lifecycle.cc:623] "successfully unloaded 'preprocessing' version 1"
I0723 23:47:06.848121 2084 server.cc:347] "Timeout 28: Found 0 live models and 0 in-flight non-inference requests"
error: creating server: Internal - failed to load all models
command terminated with exit code 1
Expected behavior
I expected the server to start.
Actual behavior
I get the following error:
UNAVAILABLE: Not found: unable to load shared library: libnvinfer_plugin_tensorrt_llm.so.9: cannot open shared object file
Additional notes
The unversioned library libnvinfer_plugin_tensorrt_llm.so is available, but not the versioned libnvinfer_plugin_tensorrt_llm.so.9 that the backend asks for:
find / -name "libnvinfer_plugin_tensorrt_llm*"
find: ‘/proc/580/task/580/net’: Invalid argument
find: ‘/proc/580/net’: Invalid argument
find: ‘/proc/688/task/688/net’: Invalid argument
/usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so
find: ‘/proc/688/net’: Invalid argument
command terminated with exit code 1
I have set
export LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs
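One way to see the mismatch directly is to ask the dynamic loader. A minimal sketch; the backend library path below is an assumption based on the /opt/tritonserver/backends/tensorrtllm entry used in the workaround further down:

```
# The pip-installed package ships only the unversioned file; nothing
# provides the ".so.9" soname the backend was linked against.
ls -l /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm*

# Print unresolved dependencies of the TRT-LLM backend library
# (path is an assumption); the missing soname shows up as "not found".
ldd /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so | grep "not found"
```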
I was able to work around this by doing the following:
- Create a symbolic link:
  ln -s /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so /usr/lib/libnvinfer_plugin_tensorrt_llm.so.9
- Set LD_LIBRARY_PATH as follows:
  export LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs:/usr/local/nvidia/lib64:/opt/tritonserver/backends/tensorrtllm:/opt/tritonserver/lib
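With both steps applied, a quick sanity check (same assumed backend library path as above) should show the soname resolving before retrying tritonserver:

```
# The symlink now satisfies the ".so.9" soname the backend asks for.
ls -l /usr/lib/libnvinfer_plugin_tensorrt_llm.so.9

# Should now print a resolved path instead of "not found".
ldd /opt/tritonserver/backends/tensorrtllm/libtriton_tensorrtllm.so \
  | grep libnvinfer_plugin_tensorrt_llm
```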
If I didn't set LD_LIBRARY_PATH, I got errors about several other libraries that could not be found:
- libcuda.so.1
- libtriton_tensorrtllm_common.so
- libtritonserver.so
I'm running on GKE. I believe libcuda.so.1 is provided by the driver and gets installed on the host, which might explain why it ends up in a location that the Triton server image doesn't know about and requires explicit configuration. I'm not sure about the others.
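For what it's worth, on GKE the NVIDIA device plugin mounts the host driver's user-space libraries into the container (the /usr/local/nvidia/lib64 entry above matches that convention). A sketch for confirming where they landed; the exact mount point is an assumption that depends on the node image:

```
# Driver libraries are injected from the host at runtime,
# not baked into the Triton image.
ls -l /usr/local/nvidia/lib64/libcuda.so.1

# Fallback: search the filesystem if the mount point differs.
find / -name "libcuda.so.1" 2>/dev/null
```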
This seems to be solved by adding
RUN ldconfig
to the Dockerfile, presumably because ldconfig rebuilds the loader cache and creates soname links (such as libnvinfer_plugin_tensorrt_llm.so.9) for the libraries it scans.
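For a reproducible fix, the same thing can be baked into a derived image. A minimal sketch, assuming the NGC image above as the base; the ld.so.conf.d file name is made up for illustration:

```
FROM nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3

# Make the pip-installed TensorRT-LLM libs visible to the dynamic loader,
# then rebuild the cache. ldconfig also creates soname links (e.g.
# libnvinfer_plugin_tensorrt_llm.so.9) for the libraries it scans.
RUN echo "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs" \
      > /etc/ld.so.conf.d/tensorrt_llm.conf \
 && ldconfig
```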