TensorRT
Inquiry about Layer Performance of FP16
Description
Hi, I'm new to TensorRT and I'm trying to understand layer performance. I read the doc section Optimizing for Tensor Cores, which says that with FP16 precision the tensor dimensions should be multiples of 8 or 16. So I converted an ONNX model to a TensorRT engine and printed the layer information. Here is part of it:
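As background for the question below, the "multiples of 8 or 16" guidance can be illustrated with a quick back-of-the-envelope calculation (plain Python, not the TensorRT API): a channel count that is not aligned gets padded up internally, and the padding is wasted work.

```python
def pad_to_multiple(c: int, m: int) -> int:
    """Round a channel count c up to the next multiple of m."""
    return ((c + m - 1) // m) * m

# A 25-channel FP16 tensor would be padded to 32 channels in an
# alignment-8 format, so 7/32 of the channel dimension is zero-fill.
padded = pad_to_multiple(25, 8)
waste = (padded - 25) / padded
print(padded, round(waste, 3))  # 32 0.219
```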
...
{
"Name": "/model.2/cv1/conv/Conv + /model.2/cv1/act/Relu",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "/model.1/act/Relu_output_0",
"Location": "Device",
"Dimensions": [1,50,160,160],
"Format/Datatype": "Channel major FP16 format where channel % 8 == 0"
}],
"Outputs": [
{
"Name": "Reformatted Output Tensor 0 to /model.2/cv1/conv/Conv + /model.2/cv1/act/Relu",
"Location": "Device",
"Dimensions": [1,25,160,160],
"Format/Datatype": "Channel major FP16 format where channel % 8 == 0"
}],
"ParameterType": "Convolution",
"Kernel": [1,1],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [0,0],
"PostPadding": [0,0],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 25,
"Groups": 1,
"Weights": {"Type": "Half", "Count": 1250},
"Bias": {"Type": "Half", "Count": 25},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x32x32_stage4_warpsize4x1x1_g1_tensor16x8x16_t1r1s1",
"TacticValue": "0xb4ed47991b2d81ae",
"StreamId": 0,
"Metadata": "[ONNX Layer: /model.2/cv1/conv/Conv]\u001e[ONNX Layer: /model.2/cv1/act/Relu]"
},{
"Name": "Reformatting CopyNode for Output Tensor 0 to /model.2/cv1/conv/Conv + /model.2/cv1/act/Relu",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "Reformatted Output Tensor 0 to /model.2/cv1/conv/Conv + /model.2/cv1/act/Relu",
"Location": "Device",
"Dimensions": [1,25,160,160],
"Format/Datatype": "Channel major FP16 format where channel % 8 == 0"
}],
"Outputs": [
{
"Name": "/model.2/cv1/act/Relu_output_0",
"Location": "Device",
"Dimensions": [1,25,160,160],
"Format/Datatype": "Channel major FP16 format where channel % 2 == 0"
}],
"ParameterType": "Reformat",
"Origin": "REFORMAT",
"TacticValue": "0x00000000000003ea",
"StreamId": 0,
"Metadata": ""
},{
"Name": "/model.2/m.0/cv1/conv/Conv + /model.2/m.0/cv1/act/Relu",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "/model.2/cv1/act/Relu_output_0",
"Location": "Device",
"Dimensions": [1,25,160,160],
"Format/Datatype": "Channel major FP16 format where channel % 2 == 0"
}],
"Outputs": [
{
"Name": "/model.2/m.0/cv1/act/Relu_output_0",
"Location": "Device",
"Dimensions": [1,25,160,160],
"Format/Datatype": "Channel major FP16 format where channel % 2 == 0"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 25,
"Groups": 1,
"Weights": {"Type": "Half", "Count": 5625},
"Bias": {"Type": "Half", "Count": 25},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_indexed_wo_smem_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x32x64_stage1_warpsize4x1x1_g1_tensor16x8x16_aligna4_alignc4",
"TacticValue": "0xa1c540a5038e4190",
"StreamId": 0,
"Metadata": "[ONNX Layer: /model.2/m.0/cv1/conv/Conv]\u001e[ONNX Layer: /model.2/m.0/cv1/act/Relu]"
}
...
I see the descriptions "Format/Datatype": "Channel major FP16 format where channel % 8 == 0" and "Format/Datatype": "Channel major FP16 format where channel % 2 == 0". I don't understand what these mean, because my channel count is not divisible by 8 ("Dimensions": [1,25,160,160]). Is my model optimized?
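For anyone scanning a dump like the one above, tensors whose channel dimension is not tensor-core aligned can be flagged mechanically. A small sketch using only the standard library; the field names ("Inputs", "Outputs", "Dimensions" with NCHW layout) mirror the JSON shown earlier, and the one-layer excerpt here is a hypothetical stand-in:

```python
import json

# Minimal excerpt mirroring the trtexec layer-information dump above.
layers = json.loads("""[
  {"Name": "conv1",
   "Inputs": [{"Dimensions": [1, 50, 160, 160]}],
   "Outputs": [{"Dimensions": [1, 25, 160, 160]}]}
]""")

def misaligned_tensors(layers, multiple=8):
    """Yield (layer name, io, dims) for every tensor whose channel
    dimension (NCHW index 1) is not a multiple of `multiple`."""
    for layer in layers:
        for io in ("Inputs", "Outputs"):
            for t in layer.get(io, []):
                dims = t.get("Dimensions", [])
                if len(dims) >= 2 and dims[1] % multiple != 0:
                    yield layer["Name"], io, dims

# Both the 50-channel input and the 25-channel output are flagged,
# since neither 50 nor 25 is a multiple of 8.
for name, io, dims in misaligned_tensors(layers):
    print(name, io, dims)
```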
Sorry for my bad English.
Environment
TensorRT Version:
NVIDIA GPU:
NVIDIA Driver Version:
CUDA Version:
CUDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):