trtexec command gives batch_size 1 and loss of precision
The trtexec command currently shown in the repo builds the engine with batch_size=1 even though the ONNX model has a dynamic batch dimension. I found that the following command exports the engine with a dynamic batch size:
trtexec --onnx=dfine_x_obj2coco.onnx \
--saveEngine=dfine_x_obj2coco.trt \
--minShapes=images:1x3x640x640,orig_target_sizes:1x2 \
--optShapes=images:8x3x640x640,orig_target_sizes:8x2 \
--maxShapes=images:16x3x640x640,orig_target_sizes:16x2 \
--fp16
Let me know if something is wrong with the above command; otherwise, please update the README.
I reduced the ONNX export dummy input size to 16 because of RAM constraints on my machine; you can raise optShapes and maxShapes to 32 if you have the memory.
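For anyone consuming the dynamic engine from Python: with an optimization profile like the one above, the batch size is chosen per inference by setting the input shapes on the execution context. A minimal sketch, assuming the engine file built by the trtexec command above and a TensorRT 8.5+/10.x Python install (this only demonstrates shape selection; real inference would additionally need CUDA buffers and an enqueue call):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine produced by the trtexec command above
with open("dfine_x_obj2coco.trt", "rb") as f:
    engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# Any batch size within [minShapes, maxShapes] can be selected per inference
for batch in (1, 8, 16):
    context.set_input_shape("images", (batch, 3, 640, 640))
    context.set_input_shape("orig_target_sizes", (batch, 2))
    # Output shapes now reflect the chosen batch size
    for i in range(engine.num_io_tensors):
        name = engine.get_tensor_name(i)
        if engine.get_tensor_mode(name) == trt.TensorIOMode.OUTPUT:
            print(batch, name, context.get_tensor_shape(name))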
I have a quick question: when I deploy the ONNX model to TensorRT, the trtexec command raises a 'segmentation fault' error. Could you tell me which version of TensorRT you used?
@Wooho-Moon Try a docker image like nvcr.io/nvidia/tensorrt:24.09-py3; the repo mentions they used TensorRT 10.4.0, which ships in that image. There might be an issue with your current setup, perhaps a mismatch between your CUDA and TensorRT versions.
Thanks for the reply. The reason I ask is that I managed to convert ONNX to TensorRT for dfine-n, but the conversion fails for dfine-s.
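@Wooho-Moon A segfault during conversion can also come from a corrupted or invalid ONNX export rather than from TensorRT itself, so it's worth ruling that out (and rerunning trtexec with --verbose for more context). A quick sanity check, assuming the onnx package is installed; the filename below is a hypothetical path to your dfine-s export:

import onnx

# Load and structurally validate the export before handing it to trtexec
model = onnx.load("dfine_s_obj2coco.onnx")  # hypothetical path to the dfine-s export
onnx.checker.check_model(model)

# Confirm the graph really has a dynamic batch dimension on its inputs
for inp in model.graph.input:
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)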
I was also seeing the loss of precision. I fixed it with this script, based on ideas from other issues:
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def convert_onnx_to_trt(onnx_path, engine_path, batch_size=1, precision="fp16"):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )

    # Parse the ONNX model and bail out on failure instead of
    # silently building from a partial network
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as model:
        if not parser.parse(model.read()):
            for idx in range(parser.num_errors):
                print(parser.get_error(idx))
            raise RuntimeError(f"Failed to parse ONNX model: {onnx_path}")

    # Optimization profile for the dynamic batch dimension
    profile = builder.create_optimization_profile()
    profile.set_shape(
        "images",
        min=(1, 3, 640, 640),
        opt=(batch_size, 3, 640, 640),
        max=(batch_size, 3, 640, 640),
    )
    profile.set_shape(
        "orig_target_sizes",
        min=(1, 2),
        opt=(batch_size, 2),
        max=(batch_size, 2),
    )

    # Configure the builder (TensorRT 10.x API; on TensorRT 8.x use
    # config.max_workspace_size = 2 << 30 instead)
    config = builder.create_builder_config()
    config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)  # 2 GB
    config.add_optimization_profile(profile)

    if precision == "fp16":
        config.set_flag(trt.BuilderFlag.FP16)
        # Let the builder honor the per-layer precision requests below
        config.set_flag(trt.BuilderFlag.PREFER_PRECISION_CONSTRAINTS)
        # Keep normalization layers in FP32; running them in FP16 is a
        # common cause of the accuracy drop
        for layer_idx in range(network.num_layers):
            layer = network.get_layer(layer_idx)
            if layer.type == trt.LayerType.NORMALIZATION:
                layer.precision = trt.float32
                layer.set_output_type(0, trt.float32)

    # Build and save the engine (builder.build_engine on TensorRT 8.x)
    serialized_engine = builder.build_serialized_network(network, config)
    if serialized_engine is None:
        raise RuntimeError("Engine build failed")
    with open(engine_path, "wb") as f:
        f.write(serialized_engine)
    print(f"Successfully converted model to TensorRT engine: {engine_path}")
if __name__ == "__main__":
    # Example usage
    convert_onnx_to_trt(
        onnx_path="dfine_x_obj2coco.onnx",
        engine_path="dfine_x_obj2coco.engine",
        batch_size=1,
        precision="fp16",
    )
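To confirm the per-layer precision constraints actually made it into the built engine, TensorRT's engine inspector can dump layer information. A sketch, assuming TensorRT 8.4+/10.x; note that per-layer detail only appears when the engine was built with config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED added to the script above:

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

# Deserialize the engine written by the script above
with open("dfine_x_obj2coco.engine", "rb") as f:
    engine = trt.Runtime(TRT_LOGGER).deserialize_cuda_engine(f.read())

# Dump per-layer information; precisions show up here when the engine
# was built with detailed profiling verbosity
inspector = engine.create_engine_inspector()
print(inspector.get_engine_information(trt.LayerInformationFormat.JSON))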