TensorRT
Inquiry about Layer Performance of FP16
Description
Hi, I'm new to TensorRT and I'm trying to understand layer performance. I read the doc section Optimizing for Tensor Cores, which says that with FP16 precision the tensor dimensions should be multiples of 8 or 16. So I converted an ONNX model to a TensorRT engine and printed the layer information. Here is part of it:
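As background for the question below, the "multiples of 8 or 16" guidance can be illustrated with a quick back-of-the-envelope calculation (plain Python, not the TensorRT API): a channel count that is not aligned gets padded up internally, and the padding is wasted work.

```python
def pad_to_multiple(c: int, m: int) -> int:
    """Round a channel count c up to the next multiple of m."""
    return ((c + m - 1) // m) * m

# A 25-channel FP16 tensor would be padded to 32 channels in an
# alignment-8 format, so 7/32 of the channel dimension is zero-fill.
padded = pad_to_multiple(25, 8)
waste = (padded - 25) / padded
print(padded, round(waste, 3))  # 32 0.219
```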
...
{
"Name": "/model.2/cv1/conv/Conv + /model.2/cv1/act/Relu",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "/model.1/act/Relu_output_0",
"Location": "Device",
"Dimensions": [1,50,160,160],
"Format/Datatype": "Channel major FP16 format where channel % 8 == 0"
}],
"Outputs": [
{
"Name": "Reformatted Output Tensor 0 to /model.2/cv1/conv/Conv + /model.2/cv1/act/Relu",
"Location": "Device",
"Dimensions": [1,25,160,160],
"Format/Datatype": "Channel major FP16 format where channel % 8 == 0"
}],
"ParameterType": "Convolution",
"Kernel": [1,1],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [0,0],
"PostPadding": [0,0],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 25,
"Groups": 1,
"Weights": {"Type": "Half", "Count": 1250},
"Bias": {"Type": "Half", "Count": 25},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x32x32_stage4_warpsize4x1x1_g1_tensor16x8x16_t1r1s1",
"TacticValue": "0xb4ed47991b2d81ae",
"StreamId": 0,
"Metadata": "[ONNX Layer: /model.2/cv1/conv/Conv]\u001e[ONNX Layer: /model.2/cv1/act/Relu]"
},{
"Name": "Reformatting CopyNode for Output Tensor 0 to /model.2/cv1/conv/Conv + /model.2/cv1/act/Relu",
"LayerType": "Reformat",
"Inputs": [
{
"Name": "Reformatted Output Tensor 0 to /model.2/cv1/conv/Conv + /model.2/cv1/act/Relu",
"Location": "Device",
"Dimensions": [1,25,160,160],
"Format/Datatype": "Channel major FP16 format where channel % 8 == 0"
}],
"Outputs": [
{
"Name": "/model.2/cv1/act/Relu_output_0",
"Location": "Device",
"Dimensions": [1,25,160,160],
"Format/Datatype": "Channel major FP16 format where channel % 2 == 0"
}],
"ParameterType": "Reformat",
"Origin": "REFORMAT",
"TacticValue": "0x00000000000003ea",
"StreamId": 0,
"Metadata": ""
},{
"Name": "/model.2/m.0/cv1/conv/Conv + /model.2/m.0/cv1/act/Relu",
"LayerType": "CaskConvolution",
"Inputs": [
{
"Name": "/model.2/cv1/act/Relu_output_0",
"Location": "Device",
"Dimensions": [1,25,160,160],
"Format/Datatype": "Channel major FP16 format where channel % 2 == 0"
}],
"Outputs": [
{
"Name": "/model.2/m.0/cv1/act/Relu_output_0",
"Location": "Device",
"Dimensions": [1,25,160,160],
"Format/Datatype": "Channel major FP16 format where channel % 2 == 0"
}],
"ParameterType": "Convolution",
"Kernel": [3,3],
"PaddingMode": "kEXPLICIT_ROUND_DOWN",
"PrePadding": [1,1],
"PostPadding": [1,1],
"Stride": [1,1],
"Dilation": [1,1],
"OutMaps": 25,
"Groups": 1,
"Weights": {"Type": "Half", "Count": 5625},
"Bias": {"Type": "Half", "Count": 25},
"HasSparseWeights": 0,
"HasDynamicFilter": 0,
"HasDynamicBias": 0,
"HasResidual": 0,
"ConvXAsActInputIdx": -1,
"BiasAsActInputIdx": -1,
"ResAsActInputIdx": -1,
"Activation": "RELU",
"HasBias": 1,
"HasReLU": 1,
"TacticName": "sm80_xmma_fprop_implicit_gemm_indexed_wo_smem_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x32x64_stage1_warpsize4x1x1_g1_tensor16x8x16_aligna4_alignc4",
"TacticValue": "0xa1c540a5038e4190",
"StreamId": 0,
"Metadata": "[ONNX Layer: /model.2/m.0/cv1/conv/Conv]\u001e[ONNX Layer: /model.2/m.0/cv1/act/Relu]"
}
...
I see the descriptions "Format/Datatype": "Channel major FP16 format where channel % 8 == 0" and "Format/Datatype": "Channel major FP16 format where channel % 2 == 0". I don't understand what these mean, because my channel count is not divisible by 8 ("Dimensions": [1,25,160,160]). Is my model optimized?
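For anyone scanning a dump like the one above, tensors whose channel dimension is not tensor-core aligned can be flagged mechanically. A small sketch using only the standard library; the field names ("Inputs", "Outputs", "Dimensions" with NCHW layout) mirror the JSON shown earlier, and the one-layer excerpt here is a hypothetical stand-in:

```python
import json

# Minimal excerpt mirroring the trtexec layer-information dump above.
layers = json.loads("""[
  {"Name": "conv1",
   "Inputs": [{"Dimensions": [1, 50, 160, 160]}],
   "Outputs": [{"Dimensions": [1, 25, 160, 160]}]}
]""")

def misaligned_tensors(layers, multiple=8):
    """Yield (layer name, io, dims) for every tensor whose channel
    dimension (NCHW index 1) is not a multiple of `multiple`."""
    for layer in layers:
        for io in ("Inputs", "Outputs"):
            for t in layer.get(io, []):
                dims = t.get("Dimensions", [])
                if len(dims) >= 2 and dims[1] % multiple != 0:
                    yield layer["Name"], io, dims

# Both the 50-channel input and the 25-channel output are flagged,
# since neither 50 nor 25 is a multiple of 8.
for name, io, dims in misaligned_tensors(layers):
    print(name, io, dims)
```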
Sorry for my bad English.
Environment
TensorRT Version:
NVIDIA GPU:
NVIDIA Driver Version:
CUDA Version:
CUDNN Version:
Operating System:
Python Version (if applicable):
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):