trtexec command gives batch_size 1 and loss of precision
The trtexec command currently shown in the repo builds the engine with batch_size=1 even though the ONNX model has a dynamic batch dimension. I found that the following command exports the engine with a dynamic batch size:
trtexec --onnx=dfine_x_obj2coco.onnx \
--saveEngine=dfine_x_obj2coco.trt \
--minShapes=images:1x3x640x640,orig_target_sizes:1x2 \
--optShapes=images:8x3x640x640,orig_target_sizes:8x2 \
--maxShapes=images:16x3x640x640,orig_target_sizes:16x2 \
--fp16
Let me know if something is wrong with the above command; otherwise, please update the README.
I reduced the ONNX export dummy input size to 16 because of RAM constraints on my machine; you can raise optShapes and maxShapes to 32 if you have the memory.
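For anyone consuming the dynamic engine from Python: with an optimization profile like the one above, the batch size is chosen per inference by setting the input shapes on the execution context. A minimal sketch, assuming the engine file built by the trtexec command above and a TensorRT 8.5+/10.x Python install (this only demonstrates shape selection; real inference would additionally need CUDA buffers and an enqueue call):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine produced by the trtexec command above
with open("dfine_x_obj2coco.trt", "rb") as f:
    engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Any batch size within [minShapes, maxShapes] can be selected per inference
for batch in (1, 8, 16):
    context.set_input_shape("images", (batch, 3, 640, 640))
    context.set_input_shape("orig_target_sizes", (batch, 2))
    # Output shapes now reflect the chosen batch size
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        if engine.get_tensor_mode(name) == trt.TensorIOMode.OUTPUT:
            print(batch, name, context.get_tensor_shape(name))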
I have a quick question: when I deploy the ONNX model to TensorRT, the trtexec command raises a 'segmentation fault' error. Could you tell me which version of TensorRT you used?
@Wooho-Moon Try a docker image like nvcr.io/nvidia/tensorrt:24.09-py3; the repo mentions they used TensorRT 10.4.0, which ships in that image. There might be an issue with your current setup, perhaps a mismatch between your CUDA and TensorRT versions.
Thanks for the reply. The reason I ask is that I managed to convert ONNX to TensorRT for dfine-n, but the conversion fails for dfine-s.
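@Wooho-Moon A segfault during conversion can also come from a corrupted or invalid ONNX export rather than from TensorRT itself, so it's worth ruling that out (and rerunning trtexec with --verbose for more context). A quick sanity check, assuming the onnx package is installed; the filename below is a hypothetical path to your dfine-s export:

import onnx

# Load and structurally validate the export before handing it to trtexec
model = onnx.load("dfine_s_obj2coco.onnx")  # hypothetical path to the dfine-s export
onnx.checker.check_model(model)

# Confirm the graph really has a dynamic batch dimension on its inputs
for inp in model.graph.input:
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)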
I was also seeing the loss of precision. I fixed it with this script, based on ideas from other issues:
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def convert_onnx_to_trt(onnx_path, engine_path, batch_size=1, precision="fp16"):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )

    # Parse the ONNX model and bail out on failure instead of
    # silently building from a partial network
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as model:
        if not parser.parse(model.read()):
            for idx in range(parser.num_errors):
                print(parser.get_error(idx))
            raise RuntimeError(f"Failed to parse ONNX model: {onnx_path}")

    # Optimization profile for the dynamic batch dimension
    profile = builder.create_optimization_profile()
    profile.set_shape(
        "images",
        min=(1, 3, 640, 640),
        opt=(batch_size, 3, 640, 640),
        max=(batch_size, 3, 640, 640),
    )
    profile.set_shape(
        "orig_target_sizes",
        min=(1, 2),
        opt=(batch_size, 2),
        max=(batch_size, 2),
    )

    # Configure the builder (TensorRT 10.x API; on TensorRT 8.x use
    # config.max_workspace_size = 2 << 30 instead)
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)  # 2 GB
    config.add_optimization_profile(profile)

    if precision == "fp16":
        config.set_flag(trt.BuilderFlag.FP16)
        # Let the builder honor the per-layer precision requests below
        config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)
        # Keep normalization layers in FP32; running them in FP16 is a
        # common cause of the accuracy drop
        for layer_idx in range(network.num_layers):
            layer = network.get_layer(layer_idx)
            if layer.type == trt.LayerType.NORMALIZATION:
                layer.precision = trt.float32
                layer.set_output_type(0, trt.float32)

    # Build and save the engine (builder.build_engine on TensorRT 8.x)
    serialized_engine = builder.build_serialized_network(network, config)
    if serialized_engine is None:
        raise RuntimeError("Engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized_engine)
    print(f"Successfully converted model to TensorRT engine: {engine_path}")
if __name__ == "__main__":
    # Example usage
    convert_onnx_to_trt(
        onnx_path="dfine_x_obj2coco.onnx",
        engine_path="dfine_x_obj2coco.engine",
        batch_size=1,
        precision="fp16",
    )
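To confirm the per-layer precision constraints actually made it into the built engine, TensorRT's engine inspector can dump layer information. A sketch, assuming TensorRT 8.4+/10.x; note that per-layer detail only appears when the engine was built with config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED added to the script above:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine written by the script above
with open("dfine_x_obj2coco.engine", "rb") as f:
    engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())

# Dump per-layer information; precisions show up here when the engine
# was built with detailed profiling verbosity
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))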