onnxruntime
[TensorRT EP] Support engine hardware compatibility
Description
- Introduces the option `trt_engine_hw_compatible` to support engine hardware compatibility for Ampere+ GPUs.
- When enabled, the `nvinfer1::HardwareCompatibilityLevel::kAMPERE_PLUS` flag is set when generating engines.
- This option has been validated on sm80/sm86 GPUs, as an engine can be reused across different Ampere+ architectures:
  - The client side needs to enable this option as well to leverage existing sm80+ engines.
  - If a user enables this option with TRT < 8.6 or sm < 80, a warning is shown that the option is not supported.
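As a usage sketch, the option can be passed through TensorRT EP provider options in the Python API (paths and model name below are placeholders; assumes onnxruntime-gpu built against TensorRT >= 8.6 on an Ampere+ GPU):

```python
# Enable hardware-compatible engine generation together with engine caching,
# so the cached sm80+ engine can be reused on other Ampere+ GPUs.
provider_options = {
    "trt_engine_cache_enable": "True",      # cache generated engines on disk
    "trt_engine_cache_path": "./trt_cache", # placeholder cache directory
    "trt_engine_hw_compatible": "True",     # option introduced by this PR
}
providers = [("TensorrtExecutionProvider", provider_options)]

# import onnxruntime as ort
# sess = ort.InferenceSession("model.onnx", providers=providers)
```

On a machine without TRT >= 8.6 or an sm80+ GPU, the EP logs a warning and ignores the option rather than failing session creation.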
Engine naming:

| GPU | trt_engine_hw_compat=false | trt_engine_hw_compat=true |
|---|---|---|
| A100 (sm80) | TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm80.engine | TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm80+.engine |
| RTX3080 (sm86) | TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm86.engine | TensorrtExecutionProvider_TRTKernel_graph_torch-jit-export_9454133937466702238_0_0_sm80+.engine |
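The naming behavior in the table can be summarized as: compatible engines collapse the per-architecture suffix into a shared `sm80+` suffix. A hypothetical sketch of that suffix logic (the function name is illustrative, not the actual implementation):

```python
def engine_suffix(sm: int, hw_compatible: bool) -> str:
    """Illustrative: hardware-compatible engines share one 'sm80+' suffix,
    so a cache entry built on sm80 also matches lookups on sm86."""
    return "sm80+" if hw_compatible else f"sm{sm}"
```

This is why both A100 and RTX3080 resolve to the same cached engine file when the option is on.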
Motivation and Context
Reference: https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#hardware-compat