TensorRT 8.6.3.1 Python package on PyPI for Triton NVIDIA Inference Server version > 24.01
Description
For NVIDIA Triton Inference Server versions newer than 24.01 (starting with 24.02), the supported TensorRT version is 8.6.3.1. I am using the tensorrt Python package in a script that converts ONNX weights to a TRT engine, but the latest version available on PyPI is 8.6.1.6. Because of this I can't use tensorrt_backend in Triton and get this error:
The engine plan file is not compatible with this version of TensorRT,
expecting library version 8.6.3.1 got 8.6.1.6, please rebuild.
Is it possible to upload this package version (8.6.3.1) to PyPI? Or how can I rewrite this script using other tools?
import tensorrt as trt

# Build the network with an explicit batch dimension (required by the ONNX parser).
explicit_batch_flag = 1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
max_batch_size: int = 32
max_workspace_size: int = 1 << 30  # 1 GiB workspace limit

logger = trt.Logger(trt.Logger.INFO)
with (
    trt.Builder(logger) as builder,
    builder.create_network(explicit_batch_flag) as network,
    trt.OnnxParser(network, logger) as parser,
):
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, max_workspace_size)
    ...
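For context, the elided part of the script finishes roughly like this (the model.onnx and model.plan paths are placeholders):

    # Parse the ONNX weights into the TensorRT network (path is a placeholder).
    if not parser.parse_from_file("model.onnx"):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse the ONNX model")

    # Build and serialize the engine, then write the plan file that
    # Triton's TensorRT backend loads.
    serialized_engine = builder.build_serialized_network(network, config)
    with open("model.plan", "wb") as f:
        f.write(serialized_engine)

One alternative I am considering (not tested yet) is to run trtexec --onnx=model.onnx --saveEngine=model.plan inside the matching nvcr.io/nvidia/tensorrt:24.02-py3 container, which should ship the same TensorRT 8.6.3 build as the 24.02 Triton image.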
Triton Information
24.02-24.04, Docker image (used as-is, without additional building)
To Reproduce
- pip install tensorrt==8.6.1.6
- run the script that compiles the ONNX model to a TensorRT engine
- run Triton with a version > 24.01 (works on 24.01, but not on 24.02), as shown below
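For reference, I launch Triton from the stock image roughly like this (model repository path is a placeholder):

docker run --rm --gpus all -v $(pwd)/model_repository:/models \
    nvcr.io/nvidia/tritonserver:24.02-py3 \
    tritonserver --model-repository=/models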
Linked issue
Expected behavior
I can use the TensorRT backend with an engine built by the matching TensorRT version.