Can't build python+onnx+tensorrtllm backends with r24.04
I'm following https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/compose.md to build an image with the onnxruntime, python, and tensorrtllm backends.
As described in the doc, I run:
git clone --single-branch --depth=1 -b r24.04 https://github.com/triton-inference-server/server.git
python3 compose.py --backend onnxruntime --backend python --repoagent checksum --image min,nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3 --image full,nvcr.io/nvidia/tritonserver:24.04-py3
The image builds, but when I start Triton server I get:
E0517 12:18:34.314931 164 model_lifecycle.cc:638] failed to load 'llama3_tensorrt_llm' version 1: Invalid argument: unable to find backend library for backend 'tensorrtllm', try specifying runtime on the mode configuration.
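The error suggests setting the runtime in the model configuration. As a sketch only (the model name comes from the log above; the library name is an assumption based on Triton's usual libtriton_<backend>.so convention), that would look like this in the model's config.pbtxt — though it cannot help here if the backend library was never copied into the image:

```
# config.pbtxt for 'llama3_tensorrt_llm' (sketch; field values are assumptions)
backend: "tensorrtllm"
runtime: "libtriton_tensorrtllm.so"  # explicit backend shared library, per the error hint
```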
Models using the python and onnxruntime backends load correctly.
How can I compose a Docker image that includes all three backends?
I also tried:
python3 compose.py --backend tensorrtllm --backend python --backend onnxruntime --repoagent checksum --container-version 24.04
but it failed because the tensorrtllm backend is not present in the full image:
=> CACHED [stage-1 16/23] COPY --chown=1000:1000 --from=full /opt/tritonserver/include include/ 0.0s
=> CACHED [stage-1 17/23] COPY --chown=1000:1000 --from=full /opt/tritonserver/backends/python /opt/tritonserver/backends/python 0.0s
=> CACHED [stage-1 18/23] COPY --chown=1000:1000 --from=full /opt/tritonserver/backends/onnxruntime /opt/tritonserver/backends/onnxruntime 0.0s
=> ERROR [stage-1 19/23] COPY --chown=1000:1000 --from=full /opt/tritonserver/backends/tensorrtllm /opt/tritonserver/backends/tensorrtllm
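The COPY step above fails because /opt/tritonserver/backends/tensorrtllm does not exist in the 24.04-py3 full image. A small pre-flight check like the following (a hypothetical helper, not part of Triton) can confirm which backend libraries are actually present in a composed image before starting tritonserver, assuming the usual backends/<name>/libtriton_<name>.so layout:

```python
# Pre-flight check: report which required Triton backends are missing
# from a backends directory (e.g. /opt/tritonserver/backends in the image).
from pathlib import Path


def missing_backends(backends_dir, required):
    """Return the names in `required` whose libtriton_<name>.so is absent."""
    missing = []
    for name in required:
        lib = Path(backends_dir) / name / f"libtriton_{name}.so"
        if not lib.is_file():
            missing.append(name)
    return missing
```

Run inside the container against /opt/tritonserver/backends; for the image composed above it would report 'tensorrtllm' as missing while 'python' and 'onnxruntime' are found.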
Tracking ticket: [DLIS-6397]
Hi @gulldan, compose.py doesn't currently support the TensorRT-LLM backend (DLIS-6397).
You should be able to achieve something similar by using build.py with:
--backend tensorrtllm:r24.04
--backend python:r24.04
--backend onnxruntime:r24.04
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/customization_guide/build.html#building-with-docker
Let us know if this helps for your use case.
Thank you. I tried:
./build.py --backend tensorrtllm:r24.04 --backend python:r24.04 --backend onnxruntime:r24.04 --enable-gpu --build-type Release --target-platform linux --endpoint grpc --endpoint http
but the build failed (log attached: build_log.txt).
Host info:
Linux 6.5.0-35-generic #35-Ubuntu SMP PREEMPT_DYNAMIC Fri Apr 26 11:23:57 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Docker version 26.1.2, build 211e74b
cmake version 3.28.4
Python 3.11.6
GeForce RTX 4090, Driver Version: 550.54.15