TensorRT
BF16 is slower than FP16 with TensorRT 9.1 when running my R50 model on an A800 GPU
Description
Environment
TensorRT Version: TensorRT-9.1.0.4
NVIDIA GPU: A800, 3080
NVIDIA Driver Version:
CUDA Version:
CUDNN Version:
Operating System: Ubuntu 18.04.5 LTS
Python Version (if applicable): python3.8
Tensorflow Version (if applicable):
PyTorch Version (if applicable):
Baremetal or Container (if so, version):
Relevant Files
import datetime
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import numpy as np
import pycuda.autoinit
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

model_path = "./prune_model.onnx"
precision = "fp16"  # "fp32", "fp16", or "bf16"
success = parser.parse_from_file(model_path)

config = builder.create_builder_config()
if precision == "fp16":
    config.set_flag(trt.BuilderFlag.FP16)
elif precision == "bf16":
    config.set_flag(trt.BuilderFlag.BF16)
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 35)

profile = builder.create_optimization_profile()
input_shape = [6, 465, 720, 3]
profile.set_shape("stack_0.tmp_0", input_shape, input_shape, input_shape)
config.add_optimization_profile(profile)
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED

engine_file_path = "engine_file_path_" + precision
if os.path.exists(engine_file_path):
    with open(engine_file_path, "rb") as f, trt.Runtime(logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
else:
    serialized_engine = builder.build_serialized_network(network, config)
    runtime = trt.Runtime(logger)
    engine = runtime.deserialize_cuda_engine(serialized_engine)
    print("save engine for later use.")
    with open(engine_file_path, "wb") as f:
        f.write(engine.serialize())

context = engine.create_execution_context()
context.set_binding_shape(0, input_shape)

# Page-locked host buffers and device allocations.
h_input0 = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=np.float32)
h_input0[:] = 0.0  # fill the pinned buffer in place rather than replacing it with a pageable array
h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)
d_input0 = cuda.mem_alloc(h_input0.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)
stream = cuda.Stream()

# Warm-up run.
cuda.memcpy_htod_async(d_input0, h_input0, stream)
context.execute_async(bindings=[int(d_input0), int(d_output)], stream_handle=stream.handle)
cuda.memcpy_dtoh_async(h_output, d_output, stream)
stream.synchronize()

# Timed runs.
starttime = datetime.datetime.now()
for i in range(10):
    cuda.memcpy_htod_async(d_input0, h_input0, stream)
    context.execute_async(bindings=[int(d_input0), int(d_output)], stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(h_output, d_output, stream)
stream.synchronize()
endtime = datetime.datetime.now()
duringtime = endtime - starttime
print(duringtime.seconds * 1000 + duringtime.microseconds / 1000.0)  # unit: milliseconds

# std / mean of the output, then total latency for the 10 runs:
# fp32 is : 41.957417 -35.9053, 81.156 ms
# bf16 is : 41.957417 -35.9053, 83.132 ms
# fp16 is : 41.98418 -35.900158, 53.892 ms
print(np.std(h_output), np.mean(h_output))
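Side note on why the BF16 output statistics match FP32 exactly while FP16 differs slightly: BF16 keeps FP32's 8-bit exponent but has only 8 significand bits (vs. FP16's 11), so it trades precision for FP32-like range. A minimal NumPy sketch, simulating BF16 by truncating the low 16 bits of a float32 (a simplification; real hardware typically rounds to nearest-even):

```python
import numpy as np

def to_bf16(x: np.ndarray) -> np.ndarray:
    """Simulate BF16 by zeroing the low 16 bits of the float32 bit pattern."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    return (bits & np.uint32(0xFFFF0000)).view(np.float32)

x = np.float32(1.0 + 1.0 / 512.0)  # 1 + 2^-9 = 1.001953125
print(to_bf16(np.array([x]))[0])   # BF16 cannot hold the 2^-9 bit near 1.0 -> 1.0
print(np.float16(x))               # FP16 represents 1 + 2^-9 exactly
```

This is only meant to illustrate the precision gap; it says nothing about kernel selection, which is the actual issue below.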
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
@nvpohanh I guess it's expected, since we have more optimized kernels for FP16. Am I right?
Yes, our current BF16 optimizations focus more on Transformers (like LLMs) rather than ConvNets. However, I think this is still something we want to improve in the future. @zerollzeng Could you repro and file an internal tracker? Thanks
Hi @zerollzeng @nvpohanh, our customers are trying to use BF16 precision to reduce the accuracy drop, but they are hitting this perf gap. The following logs show the trtexec layer information for the model above with the BF16 and FP16 precisions. I found that even when the BF16 flag is set, the kernels chosen for the convolutions are still FP32. Also, during the build, no BF16 convolution kernel is even tried as a candidate (the builder optimization level is set to 5). Could you please look into this issue? Thanks!
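For reference, a sketch of the kind of trtexec invocation that produces per-layer logs like the ones below (the model path and input shape are taken from the script above; the exact flags used for these logs were not posted, so treat this as an assumed reproduction command, swapping `--bf16` for `--fp16` for the FP16 run):

```shell
trtexec --onnx=prune_model.onnx \
        --bf16 \
        --shapes=stack_0.tmp_0:6x465x720x3 \
        --builderOptimizationLevel=5 \
        --profilingVerbosity=detailed \
        --dumpLayerInfo
```

In the resulting layer dump, the `TacticName` fields (e.g. `sm80_xmma_..._f32f32_tf32f32_f32_...`) reveal the actual datatype of each chosen kernel.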
BF16:
[01/09/2024-07:11:19] [I] Layers:
Name: assign_0.tmp_0, LayerType: Constant, Inputs: [], Outputs: [ { Name: (Unnamed Layer* 1) [Constant]_output, Location: Device, Dimensions: [1,3,1,1], Format/Datatype: Row major linear FP32 }], ParameterType: Constant, weights: {"Type": "Float", "Count": 3}, dimensions: [1,3,1,1], TacticValue: 0x0000000000000000, StreamId: 0, Metadata:
Name: assign_1.tmp_0, LayerType: Constant, Inputs: [], Outputs: [ { Name: (Unnamed Layer* 3) [Constant]_output, Location: Device, Dimensions: [1,3,1,1], Format/Datatype: Row major linear FP32 }], ParameterType: Constant, weights: {"Type": "Float", "Count": 3}, dimensions: [1,3,1,1], TacticValue: 0x0000000000000000, StreamId: 0, Metadata:
Name: p2o.Transpose.0, LayerType: Shuffle, Inputs: [ { Name: stack_0.tmp_0, Location: Device, Dimensions: [6,465,720,3], Format/Datatype: Row major linear FP32 }], Outputs: [ { Name: transpose_0.tmp_0, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Row major linear FP32 }], ParameterType: Shuffle, FirstTranspose: [0,3,1,2], Reshape: "nbDims=-1", SecondTranspose: [0,1,2,3], ZeroIsPlaceholder: 1, TacticValue: 0x0000000000000000, StreamId: 0, Metadata: [ONNX Layer: p2o.Transpose.0]
Name: PWN(PWN(p2o.Sub.0), PWN(p2o.Div.0)), LayerType: PointWiseV2, Inputs: [ { Name: transpose_0.tmp_0, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Row major linear FP32 }, { Name: (Unnamed Layer* 1) [Constant]_output, Location: Device, Dimensions: [1,3,1,1], Format/Datatype: Row major linear FP32 }, { Name: (Unnamed Layer* 3) [Constant]_output, Location: Device, Dimensions: [1,3,1,1], Format/Datatype: Row major linear FP32 }], Outputs: [ { Name: p2o.Div.1, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Row major linear FP32 }], ParameterType: PointWise, ParameterSubType: PointWiseExpression, NbInputArgs: 3, InputArgs: ["arg0", "arg1", "arg2"], NbOutputVars: 1, OutputVars: ["var1"], NbParams: 0, Params: [], NbLiterals: 0, Literals: [], NbOperations: 2, Operations: ["auto const var0 = pwgen::iMinus(arg0, arg1);", "auto const var1 = pwgen::iDiv(var0, arg2);"], TacticValue: 0x0000000000000009, StreamId: 0, Metadata: [ONNX Layer: p2o.Sub.0][ONNX Layer: p2o.Div.0]
Name: Reformatting CopyNode for Input Tensor 0 to p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0, LayerType: Reformat, Inputs: [ { Name: p2o.Div.1, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Row major linear FP32 }], Outputs: [ { Name: Reformatted Input Tensor 0 to p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Reformat, Origin: REFORMAT, TacticValue: 0x00000000000003ea, StreamId: 0, Metadata:
Name: p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0, LayerType: CaskConvolution, Inputs: [ { Name: Reformatted Input Tensor 0 to p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_0.tmp_0, Location: Device, Dimensions: [6,64,240,368], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [7,7], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [3,3], PostPadding: [18,19], Stride: [2,2], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Float", "Count": 9408}, Bias: {"Type": "Float", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_indexed_wo_smem_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x32x16_stage1_warpsize4x1x1_g1_tensor16x8x8, TacticValue: 0x9cb304e2edbc1221, StreamId: 0, Metadata: [ONNX Layer: p2o.Pad.0][ONNX Layer: p2o.Conv.0][ONNX Layer: p2o.BatchNormalization.0][ONNX Layer: p2o.Relu.0]
Name: p2o.MaxPool.0, LayerType: CaskPooling, Inputs: [ { Name: relu_0.tmp_0, Location: Device, Dimensions: [6,64,240,368], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: p2o.MaxPool.1, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Pooling, PoolingType: MAX, WindowSize: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], BlendFactor: 0, AverageCountExcludesPadding: 1, TacticName: sm50_xmma_pooling_max_nhwc_FP32FP32_WINDOWSIZE_3_NOT_PROPAGATE_NAN_2D, TacticValue: 0x789b2859f2e03e79, StreamId: 0, Metadata: [ONNX Layer: p2o.MaxPool.0]
Name: p2o.Conv.1 + p2o.BatchNormalization.1 + p2o.Relu.1, LayerType: CaskGemmConvolution, Inputs: [ { Name: p2o.MaxPool.1, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_1.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Float", "Count": 4096}, Bias: {"Type": "Float", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x64x32_stage5_warpsize2x2x1_tensor16x8x8, TacticValue: 0x000000000002058d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.1][ONNX Layer: p2o.BatchNormalization.1][ONNX Layer: p2o.Relu.1]
Name: p2o.Conv.2 + p2o.BatchNormalization.2 + p2o.Relu.2, LayerType: CaskConvolution, Inputs: [ { Name: relu_1.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_2.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Float", "Count": 36864}, Bias: {"Type": "Float", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize256x64x32_stage3_warpsize4x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xa9a06d0633580c0c, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.2][ONNX Layer: p2o.BatchNormalization.2][ONNX Layer: p2o.Relu.2]
Name: p2o.Conv.3 + p2o.BatchNormalization.3, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_2.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: batch_norm_3.tmp_2, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 16384}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_nn_n_tilesize128x64x16_stage6_warpsize2x2x1_tensor16x8x8, TacticValue: 0x0000000000020741, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.3][ONNX Layer: p2o.BatchNormalization.3]
Name: p2o.Conv.4 + p2o.BatchNormalization.4 + p2o.Add.0 + p2o.Relu.3, LayerType: CaskConvolution, Inputs: [ { Name: p2o.MaxPool.1, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: batch_norm_3.tmp_2, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_3.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 16384}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_simple_t1r1s1, TacticValue: 0x9dece0dc37e90462, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.4][ONNX Layer: p2o.BatchNormalization.4][ONNX Layer: p2o.Add.0][ONNX Layer: p2o.Relu.3]
Name: p2o.Conv.5 + p2o.BatchNormalization.5 + p2o.Relu.4, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_3.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_4.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Float", "Count": 16384}, Bias: {"Type": "Float", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x64x32_stage5_warpsize2x2x1_tensor16x8x8, TacticValue: 0x000000000002058d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.5][ONNX Layer: p2o.BatchNormalization.5][ONNX Layer: p2o.Relu.4]
Name: p2o.Conv.6 + p2o.BatchNormalization.6 + p2o.Relu.5, LayerType: CaskConvolution, Inputs: [ { Name: relu_4.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_5.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Float", "Count": 36864}, Bias: {"Type": "Float", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize256x64x32_stage3_warpsize4x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xa9a06d0633580c0c, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.6][ONNX Layer: p2o.BatchNormalization.6][ONNX Layer: p2o.Relu.5]
Name: p2o.Conv.7 + p2o.BatchNormalization.7 + p2o.Add.2 + p2o.Relu.6, LayerType: CaskConvolution, Inputs: [ { Name: relu_5.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_3.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_6.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 16384}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_simple_t1r1s1, TacticValue: 0x9dece0dc37e90462, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.7][ONNX Layer: p2o.BatchNormalization.7][ONNX Layer: p2o.Add.2][ONNX Layer: p2o.Relu.6]
Name: p2o.Conv.8 + p2o.BatchNormalization.8 + p2o.Relu.7, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_6.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_7.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Float", "Count": 16384}, Bias: {"Type": "Float", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x64x32_stage5_warpsize2x2x1_tensor16x8x8, TacticValue: 0x000000000002058d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.8][ONNX Layer: p2o.BatchNormalization.8][ONNX Layer: p2o.Relu.7]
Name: p2o.Conv.9 + p2o.BatchNormalization.9 + p2o.Relu.8, LayerType: CaskConvolution, Inputs: [ { Name: relu_7.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_8.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Float", "Count": 36864}, Bias: {"Type": "Float", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize256x64x32_stage3_warpsize4x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xa9a06d0633580c0c, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.9][ONNX Layer: p2o.BatchNormalization.9][ONNX Layer: p2o.Relu.8]
Name: p2o.Conv.10 + p2o.BatchNormalization.10 + p2o.Add.4 + p2o.Relu.9, LayerType: CaskConvolution, Inputs: [ { Name: relu_8.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_6.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_9.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 16384}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_simple_t1r1s1, TacticValue: 0x9dece0dc37e90462, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.10][ONNX Layer: p2o.BatchNormalization.10][ONNX Layer: p2o.Add.4][ONNX Layer: p2o.Relu.9]
Name: p2o.Conv.11 + p2o.BatchNormalization.11 + p2o.Relu.10, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_9.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_10.tmp_0, Location: Device, Dimensions: [6,128,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Float", "Count": 32768}, Bias: {"Type": "Float", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_nn_n_tilesize128x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x0000000000020764, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.11][ONNX Layer: p2o.BatchNormalization.11][ONNX Layer: p2o.Relu.10]
Name: p2o.Conv.14 + p2o.BatchNormalization.14, LayerType: CaskConvolution, Inputs: [ { Name: relu_9.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: batch_norm_14.tmp_2, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [2,2], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 131072}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x256x32_stage3_warpsize2x4x1_g1_tensor16x8x8_t1r1s1, TacticValue: 0xebdd7d350fbaa00e, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.14][ONNX Layer: p2o.BatchNormalization.14]
Name: p2o.Conv.12 + p2o.BatchNormalization.12 + p2o.Relu.11, LayerType: CaskConvolution, Inputs: [ { Name: relu_10.tmp_0, Location: Device, Dimensions: [6,128,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_11.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Float", "Count": 147456}, Bias: {"Type": "Float", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xd920b33c9bd27143, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.12][ONNX Layer: p2o.BatchNormalization.12][ONNX Layer: p2o.Relu.11]
Name: p2o.Conv.13 + p2o.BatchNormalization.13 + p2o.Add.6 + p2o.Relu.12, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_11.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: batch_norm_14.tmp_2, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_12.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 65536}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_nn_n_tilesize64x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000202f7, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.13][ONNX Layer: p2o.BatchNormalization.13][ONNX Layer: p2o.Add.6][ONNX Layer: p2o.Relu.12]
Name: p2o.Conv.15 + p2o.BatchNormalization.15 + p2o.Relu.13, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_12.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_13.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Float", "Count": 65536}, Bias: {"Type": "Float", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize64x128x32_stage5_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207dd, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.15][ONNX Layer: p2o.BatchNormalization.15][ONNX Layer: p2o.Relu.13]
Name: p2o.Conv.16 + p2o.BatchNormalization.16 + p2o.Relu.14, LayerType: CaskConvolution, Inputs: [ { Name: relu_13.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_14.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Float", "Count": 147456}, Bias: {"Type": "Float", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xd920b33c9bd27143, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.16][ONNX Layer: p2o.BatchNormalization.16][ONNX Layer: p2o.Relu.14]
Name: p2o.Conv.17 + p2o.BatchNormalization.17 + p2o.Add.8 + p2o.Relu.15, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_14.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_12.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_15.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 65536}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_nn_n_tilesize64x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000202f7, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.17][ONNX Layer: p2o.BatchNormalization.17][ONNX Layer: p2o.Add.8][ONNX Layer: p2o.Relu.15]
Name: p2o.Conv.18 + p2o.BatchNormalization.18 + p2o.Relu.16, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_15.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_16.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Float", "Count": 65536}, Bias: {"Type": "Float", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize64x128x32_stage5_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207dd, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.18][ONNX Layer: p2o.BatchNormalization.18][ONNX Layer: p2o.Relu.16]
Name: p2o.Conv.19 + p2o.BatchNormalization.19 + p2o.Relu.17, LayerType: CaskConvolution, Inputs: [ { Name: relu_16.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_17.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Float", "Count": 147456}, Bias: {"Type": "Float", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xd920b33c9bd27143, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.19][ONNX Layer: p2o.BatchNormalization.19][ONNX Layer: p2o.Relu.17]
Name: p2o.Conv.20 + p2o.BatchNormalization.20 + p2o.Add.10 + p2o.Relu.18, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_17.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_15.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_18.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 65536}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_nn_n_tilesize64x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000202f7, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.20][ONNX Layer: p2o.BatchNormalization.20][ONNX Layer: p2o.Add.10][ONNX Layer: p2o.Relu.18]
Name: p2o.Conv.21 + p2o.BatchNormalization.21 + p2o.Relu.19, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_18.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_19.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Float", "Count": 65536}, Bias: {"Type": "Float", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize64x128x32_stage5_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207dd, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.21][ONNX Layer: p2o.BatchNormalization.21][ONNX Layer: p2o.Relu.19]
Name: p2o.Conv.22 + p2o.BatchNormalization.22 + p2o.Relu.20, LayerType: CaskConvolution, Inputs: [ { Name: relu_19.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_20.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Float", "Count": 147456}, Bias: {"Type": "Float", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xd920b33c9bd27143, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.22][ONNX Layer: p2o.BatchNormalization.22][ONNX Layer: p2o.Relu.20]
Name: p2o.Conv.23 + p2o.BatchNormalization.23 + p2o.Add.12 + p2o.Relu.21, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_20.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_18.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_21.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 65536}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_nn_n_tilesize64x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000202f7, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.23][ONNX Layer: p2o.BatchNormalization.23][ONNX Layer: p2o.Add.12][ONNX Layer: p2o.Relu.21]
Name: p2o.Conv.24 + p2o.BatchNormalization.24 + p2o.Relu.22, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_21.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_22.tmp_0, Location: Device, Dimensions: [6,256,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 131072}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_nn_n_tilesize128x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x0000000000020764, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.24][ONNX Layer: p2o.BatchNormalization.24][ONNX Layer: p2o.Relu.22]
Name: p2o.Conv.27 + p2o.BatchNormalization.27, LayerType: CaskConvolution, Inputs: [ { Name: relu_21.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: batch_norm_27.tmp_2, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [2,2], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Float", "Count": 524288}, Bias: {"Type": "Float", "Count": 1024}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r1s1, TacticValue: 0x130df49cb195156b, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.27][ONNX Layer: p2o.BatchNormalization.27]
Name: p2o.Conv.25 + p2o.BatchNormalization.25 + p2o.Relu.23, LayerType: CaskConvolution, Inputs: [ { Name: relu_22.tmp_0, Location: Device, Dimensions: [6,256,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_23.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 589824}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x256x32_stage3_warpsize2x4x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0x614e89f7852edbc3, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.25][ONNX Layer: p2o.BatchNormalization.25][ONNX Layer: p2o.Relu.23]
Name: p2o.Conv.26 + p2o.BatchNormalization.26 + p2o.Add.14 + p2o.Relu.24, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_23.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: batch_norm_27.tmp_2, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_24.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207cb, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.26][ONNX Layer: p2o.BatchNormalization.26][ONNX Layer: p2o.Add.14][ONNX Layer: p2o.Relu.24]
Name: p2o.Conv.28 + p2o.BatchNormalization.28 + p2o.Relu.25, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_24.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_25.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x256x32_stage3_warpsize2x4x1_tensor16x8x8, TacticValue: 0x00000000000201d8, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.28][ONNX Layer: p2o.BatchNormalization.28][ONNX Layer: p2o.Relu.25]
Name: p2o.Conv.29 + p2o.BatchNormalization.29 + p2o.Relu.26, LayerType: CaskConvolution, Inputs: [ { Name: relu_25.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_26.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 589824}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x256x32_stage3_warpsize2x4x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0x614e89f7852edbc3, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.29][ONNX Layer: p2o.BatchNormalization.29][ONNX Layer: p2o.Relu.26]
Name: p2o.Conv.30 + p2o.BatchNormalization.30 + p2o.Add.16 + p2o.Relu.27, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_26.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_24.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_27.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207cb, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.30][ONNX Layer: p2o.BatchNormalization.30][ONNX Layer: p2o.Add.16][ONNX Layer: p2o.Relu.27]
Name: p2o.Conv.31 + p2o.BatchNormalization.31 + p2o.Relu.28, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_27.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_28.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x256x32_stage3_warpsize2x4x1_tensor16x8x8, TacticValue: 0x00000000000201d8, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.31][ONNX Layer: p2o.BatchNormalization.31][ONNX Layer: p2o.Relu.28]
Name: p2o.Conv.32 + p2o.BatchNormalization.32 + p2o.Relu.29, LayerType: CaskConvolution, Inputs: [ { Name: relu_28.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_29.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 589824}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x256x32_stage3_warpsize2x4x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0x614e89f7852edbc3, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.32][ONNX Layer: p2o.BatchNormalization.32][ONNX Layer: p2o.Relu.29]
Name: p2o.Conv.33 + p2o.BatchNormalization.33 + p2o.Add.18 + p2o.Relu.30, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_29.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_27.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_30.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207cb, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.33][ONNX Layer: p2o.BatchNormalization.33][ONNX Layer: p2o.Add.18][ONNX Layer: p2o.Relu.30]
Name: p2o.Conv.34 + p2o.BatchNormalization.34 + p2o.Relu.31, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_30.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_31.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x256x32_stage3_warpsize2x4x1_tensor16x8x8, TacticValue: 0x00000000000201d8, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.34][ONNX Layer: p2o.BatchNormalization.34][ONNX Layer: p2o.Relu.31]
Name: p2o.Conv.35 + p2o.BatchNormalization.35 + p2o.Relu.32, LayerType: CaskConvolution, Inputs: [ { Name: relu_31.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_32.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 589824}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x256x32_stage3_warpsize2x4x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0x614e89f7852edbc3, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.35][ONNX Layer: p2o.BatchNormalization.35][ONNX Layer: p2o.Relu.32]
Name: p2o.Conv.36 + p2o.BatchNormalization.36 + p2o.Add.20 + p2o.Relu.33, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_32.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_30.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_33.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207cb, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.36][ONNX Layer: p2o.BatchNormalization.36][ONNX Layer: p2o.Add.20][ONNX Layer: p2o.Relu.33]
Name: p2o.Conv.37 + p2o.BatchNormalization.37 + p2o.Relu.34, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_33.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_34.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x256x32_stage3_warpsize2x4x1_tensor16x8x8, TacticValue: 0x00000000000201d8, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.37][ONNX Layer: p2o.BatchNormalization.37][ONNX Layer: p2o.Relu.34]
Name: p2o.Conv.38 + p2o.BatchNormalization.38 + p2o.Relu.35, LayerType: CaskConvolution, Inputs: [ { Name: relu_34.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_35.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 589824}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x256x32_stage3_warpsize2x4x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0x614e89f7852edbc3, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.38][ONNX Layer: p2o.BatchNormalization.38][ONNX Layer: p2o.Relu.35]
Name: p2o.Conv.39 + p2o.BatchNormalization.39 + p2o.Add.22 + p2o.Relu.36, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_35.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_33.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_36.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207cb, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.39][ONNX Layer: p2o.BatchNormalization.39][ONNX Layer: p2o.Add.22][ONNX Layer: p2o.Relu.36]
Name: p2o.Conv.40 + p2o.BatchNormalization.40 + p2o.Relu.37, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_36.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_37.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x256x32_stage3_warpsize2x4x1_tensor16x8x8, TacticValue: 0x00000000000201d8, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.40][ONNX Layer: p2o.BatchNormalization.40][ONNX Layer: p2o.Relu.37]
Name: p2o.Conv.41 + p2o.BatchNormalization.41 + p2o.Relu.38, LayerType: CaskConvolution, Inputs: [ { Name: relu_37.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_38.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 589824}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x256x32_stage3_warpsize2x4x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0x614e89f7852edbc3, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.41][ONNX Layer: p2o.BatchNormalization.41][ONNX Layer: p2o.Relu.38]
Name: p2o.Conv.42 + p2o.BatchNormalization.42 + p2o.Add.24 + p2o.Relu.39, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_38.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_36.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_39.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207cb, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.42][ONNX Layer: p2o.BatchNormalization.42][ONNX Layer: p2o.Add.24][ONNX Layer: p2o.Relu.39]
Name: p2o.Conv.43 + p2o.BatchNormalization.43 + p2o.Relu.40, LayerType: CaskConvolution, Inputs: [ { Name: relu_39.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_40.tmp_0, Location: Device, Dimensions: [6,512,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 524288}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x8, TacticValue: 0x25b2b9d5c9d5ca0d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.43][ONNX Layer: p2o.BatchNormalization.43][ONNX Layer: p2o.Relu.40]
Name: p2o.Conv.46 + p2o.BatchNormalization.46, LayerType: CaskConvolution, Inputs: [ { Name: relu_39.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: batch_norm_46.tmp_2, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [2,2], Dilation: [1,1], OutMaps: 2048, Groups: 1, Weights: {"Type": "Float", "Count": 2097152}, Bias: {"Type": "Float", "Count": 2048}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x8, TacticValue: 0x25b2b9d5c9d5ca0d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.46][ONNX Layer: p2o.BatchNormalization.46]
Name: p2o.Conv.44 + p2o.BatchNormalization.44 + p2o.Relu.41, LayerType: CaskConvolution, Inputs: [ { Name: relu_40.tmp_0, Location: Device, Dimensions: [6,512,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_41.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 2359296}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xd920b33c9bd27143, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.44][ONNX Layer: p2o.BatchNormalization.44][ONNX Layer: p2o.Relu.41]
Name: p2o.Conv.45 + p2o.BatchNormalization.45 + p2o.Add.26 + p2o.Relu.42, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_41.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: batch_norm_46.tmp_2, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_42.tmp_0, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 2048, Groups: 1, Weights: {"Type": "Float", "Count": 1048576}, Bias: {"Type": "Float", "Count": 2048}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize64x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x0000000000020413, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.45][ONNX Layer: p2o.BatchNormalization.45][ONNX Layer: p2o.Add.26][ONNX Layer: p2o.Relu.42]
Name: p2o.Conv.47 + p2o.BatchNormalization.47 + p2o.Relu.43, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_42.tmp_0, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_43.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 1048576}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x256x32_stage3_warpsize2x4x1_tensor16x8x8, TacticValue: 0x00000000040601d8, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.47][ONNX Layer: p2o.BatchNormalization.47][ONNX Layer: p2o.Relu.43]
Name: p2o.Conv.48 + p2o.BatchNormalization.48 + p2o.Relu.44, LayerType: CaskConvolution, Inputs: [ { Name: relu_43.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_44.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 2359296}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xd920b33c9bd27143, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.48][ONNX Layer: p2o.BatchNormalization.48][ONNX Layer: p2o.Relu.44]
Name: p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_44.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_42.tmp_0, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: Reformatted Output Tensor 0 to p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 2048, Groups: 1, Weights: {"Type": "Float", "Count": 1048576}, Bias: {"Type": "Float", "Count": 2048}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize64x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x0000000000020413, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.49][ONNX Layer: p2o.BatchNormalization.49][ONNX Layer: p2o.Add.28]
Name: Reformatting CopyNode for Output Tensor 0 to p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28, LayerType: Reformat, Inputs: [ { Name: Reformatted Output Tensor 0 to p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: tmp_16, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Row major linear FP32 }], ParameterType: Reformat, Origin: REFORMAT, TacticValue: 0x0000000000000000, StreamId: 0, Metadata:
Bindings:
stack_0.tmp_0
tmp_16
[01/09/2024-07:11:19] [I] Starting inference
[01/09/2024-07:11:22] [I] The e2e network timing is not reported since it is inaccurate due to the extra synchronizations when the profiler is enabled.
[01/09/2024-07:11:22] [I] To show e2e network timing report, add --separateProfileRun to profile layer timing in a separate run or remove --dumpProfile to disable the profiler.
[01/09/2024-07:11:22] [I]
[01/09/2024-07:11:22] [I] === Profile (379 iterations ) ===
[01/09/2024-07:11:22] [I] Time(ms) Avg.(ms) Median(ms) Time(%) Layer
[01/09/2024-07:11:22] [I] 29.82 0.0787 0.0594 1.2 p2o.Transpose.0
[01/09/2024-07:11:22] [I] 14.51 0.0383 0.0379 0.6 PWN(PWN(p2o.Sub.0), PWN(p2o.Div.0))
[01/09/2024-07:11:22] [I] 33.62 0.0887 0.0532 1.4 Reformatting CopyNode for Input Tensor 0 to p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0
[01/09/2024-07:11:22] [I] 145.53 0.3840 0.3768 6.0 p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0
[01/09/2024-07:11:22] [I] 51.87 0.1369 0.1362 2.2 p2o.MaxPool.0
[01/09/2024-07:11:22] [I] 22.14 0.0584 0.0573 0.9 p2o.Conv.1 + p2o.BatchNormalization.1 + p2o.Relu.1
[01/09/2024-07:11:22] [I] 52.72 0.1391 0.1372 2.2 p2o.Conv.2 + p2o.BatchNormalization.2 + p2o.Relu.2
[01/09/2024-07:11:22] [I] 52.90 0.1396 0.1382 2.2 p2o.Conv.3 + p2o.BatchNormalization.3
[01/09/2024-07:11:22] [I] 93.40 0.2464 0.2458 3.9 p2o.Conv.4 + p2o.BatchNormalization.4 + p2o.Add.0 + p2o.Relu.3
[01/09/2024-07:11:22] [I] 53.38 0.1409 0.1403 2.2 p2o.Conv.5 + p2o.BatchNormalization.5 + p2o.Relu.4
[01/09/2024-07:11:22] [I] 50.95 0.1344 0.1321 2.1 p2o.Conv.6 + p2o.BatchNormalization.6 + p2o.Relu.5
[01/09/2024-07:11:22] [I] 92.66 0.2445 0.2437 3.8 p2o.Conv.7 + p2o.BatchNormalization.7 + p2o.Add.2 + p2o.Relu.6
[01/09/2024-07:11:22] [I] 53.67 0.1416 0.1413 2.2 p2o.Conv.8 + p2o.BatchNormalization.8 + p2o.Relu.7
[01/09/2024-07:11:22] [I] 51.75 0.1366 0.1341 2.1 p2o.Conv.9 + p2o.BatchNormalization.9 + p2o.Relu.8
[01/09/2024-07:11:22] [I] 92.73 0.2447 0.2437 3.8 p2o.Conv.10 + p2o.BatchNormalization.10 + p2o.Add.4 + p2o.Relu.9
[01/09/2024-07:11:22] [I] 62.96 0.1661 0.1659 2.6 p2o.Conv.11 + p2o.BatchNormalization.11 + p2o.Relu.10
[01/09/2024-07:11:22] [I] 46.53 0.1228 0.1208 1.9 p2o.Conv.14 + p2o.BatchNormalization.14
[01/09/2024-07:11:22] [I] 49.00 0.1293 0.1280 2.0 p2o.Conv.12 + p2o.BatchNormalization.12 + p2o.Relu.11
[01/09/2024-07:11:22] [I] 46.42 0.1225 0.1219 1.9 p2o.Conv.13 + p2o.BatchNormalization.13 + p2o.Add.6 + p2o.Relu.12
[01/09/2024-07:11:22] [I] 33.97 0.0896 0.0891 1.4 p2o.Conv.15 + p2o.BatchNormalization.15 + p2o.Relu.13
[01/09/2024-07:11:22] [I] 43.85 0.1157 0.1137 1.8 p2o.Conv.16 + p2o.BatchNormalization.16 + p2o.Relu.14
[01/09/2024-07:11:22] [I] 46.76 0.1234 0.1229 1.9 p2o.Conv.17 + p2o.BatchNormalization.17 + p2o.Add.8 + p2o.Relu.15
[01/09/2024-07:11:22] [I] 33.86 0.0893 0.0891 1.4 p2o.Conv.18 + p2o.BatchNormalization.18 + p2o.Relu.16
[01/09/2024-07:11:22] [I] 43.69 0.1153 0.1137 1.8 p2o.Conv.19 + p2o.BatchNormalization.19 + p2o.Relu.17
[01/09/2024-07:11:22] [I] 47.06 0.1242 0.1239 2.0 p2o.Conv.20 + p2o.BatchNormalization.20 + p2o.Add.10 + p2o.Relu.18
[01/09/2024-07:11:22] [I] 33.97 0.0896 0.0891 1.4 p2o.Conv.21 + p2o.BatchNormalization.21 + p2o.Relu.19
[01/09/2024-07:11:22] [I] 43.81 0.1156 0.1137 1.8 p2o.Conv.22 + p2o.BatchNormalization.22 + p2o.Relu.20
[01/09/2024-07:11:22] [I] 46.72 0.1233 0.1229 1.9 p2o.Conv.23 + p2o.BatchNormalization.23 + p2o.Add.12 + p2o.Relu.21
[01/09/2024-07:11:22] [I] 42.52 0.1122 0.1106 1.8 p2o.Conv.24 + p2o.BatchNormalization.24 + p2o.Relu.22
[01/09/2024-07:11:22] [I] 39.25 0.1036 0.1024 1.6 p2o.Conv.27 + p2o.BatchNormalization.27
[01/09/2024-07:11:22] [I] 48.27 0.1274 0.1249 2.0 p2o.Conv.25 + p2o.BatchNormalization.25 + p2o.Relu.23
[01/09/2024-07:11:22] [I] 30.11 0.0794 0.0788 1.2 p2o.Conv.26 + p2o.BatchNormalization.26 + p2o.Add.14 + p2o.Relu.24
[01/09/2024-07:11:22] [I] 24.57 0.0648 0.0635 1.0 p2o.Conv.28 + p2o.BatchNormalization.28 + p2o.Relu.25
[01/09/2024-07:11:22] [I] 48.16 0.1271 0.1249 2.0 p2o.Conv.29 + p2o.BatchNormalization.29 + p2o.Relu.26
[01/09/2024-07:11:22] [I] 30.39 0.0802 0.0788 1.3 p2o.Conv.30 + p2o.BatchNormalization.30 + p2o.Add.16 + p2o.Relu.27
[01/09/2024-07:11:22] [I] 24.61 0.0649 0.0635 1.0 p2o.Conv.31 + p2o.BatchNormalization.31 + p2o.Relu.28
[01/09/2024-07:11:22] [I] 48.12 0.1270 0.1249 2.0 p2o.Conv.32 + p2o.BatchNormalization.32 + p2o.Relu.29
[01/09/2024-07:11:22] [I] 30.40 0.0802 0.0799 1.3 p2o.Conv.33 + p2o.BatchNormalization.33 + p2o.Add.18 + p2o.Relu.30
[01/09/2024-07:11:22] [I] 24.59 0.0649 0.0635 1.0 p2o.Conv.34 + p2o.BatchNormalization.34 + p2o.Relu.31
[01/09/2024-07:11:22] [I] 48.12 0.1270 0.1249 2.0 p2o.Conv.35 + p2o.BatchNormalization.35 + p2o.Relu.32
[01/09/2024-07:11:22] [I] 30.34 0.0800 0.0788 1.3 p2o.Conv.36 + p2o.BatchNormalization.36 + p2o.Add.20 + p2o.Relu.33
[01/09/2024-07:11:22] [I] 24.57 0.0648 0.0635 1.0 p2o.Conv.37 + p2o.BatchNormalization.37 + p2o.Relu.34
[01/09/2024-07:11:22] [I] 48.11 0.1270 0.1249 2.0 p2o.Conv.38 + p2o.BatchNormalization.38 + p2o.Relu.35
[01/09/2024-07:11:22] [I] 30.51 0.0805 0.0799 1.3 p2o.Conv.39 + p2o.BatchNormalization.39 + p2o.Add.22 + p2o.Relu.36
[01/09/2024-07:11:22] [I] 24.56 0.0648 0.0635 1.0 p2o.Conv.40 + p2o.BatchNormalization.40 + p2o.Relu.37
[01/09/2024-07:11:22] [I] 48.12 0.1270 0.1249 2.0 p2o.Conv.41 + p2o.BatchNormalization.41 + p2o.Relu.38
[01/09/2024-07:11:22] [I] 30.36 0.0801 0.0788 1.3 p2o.Conv.42 + p2o.BatchNormalization.42 + p2o.Add.24 + p2o.Relu.39
[01/09/2024-07:11:22] [I] 40.96 0.1081 0.1065 1.7 p2o.Conv.43 + p2o.BatchNormalization.43 + p2o.Relu.40
[01/09/2024-07:11:22] [I] 40.40 0.1066 0.1044 1.7 p2o.Conv.46 + p2o.BatchNormalization.46
[01/09/2024-07:11:22] [I] 48.99 0.1292 0.1270 2.0 p2o.Conv.44 + p2o.BatchNormalization.44 + p2o.Relu.41
[01/09/2024-07:11:22] [I] 25.46 0.0672 0.0666 1.1 p2o.Conv.45 + p2o.BatchNormalization.45 + p2o.Add.26 + p2o.Relu.42
[01/09/2024-07:11:22] [I] 23.85 0.0629 0.0614 1.0 p2o.Conv.47 + p2o.BatchNormalization.47 + p2o.Relu.43
[01/09/2024-07:11:22] [I] 48.82 0.1288 0.1260 2.0 p2o.Conv.48 + p2o.BatchNormalization.48 + p2o.Relu.44
[01/09/2024-07:11:22] [I] 25.76 0.0680 0.0666 1.1 p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28
[01/09/2024-07:11:22] [I] 8.52 0.0225 0.0215 0.4 Reformatting CopyNode for Output Tensor 0 to p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28
[01/09/2024-07:11:22] [I] 2408.69 6.3554 6.2280 100.0 Total
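To pin down exactly which layers regress between the two builds, the per-layer tables above can be parsed and compared programmatically. A minimal sketch (assuming the trtexec `--dumpProfile` line format shown above; the sample lines are copied from the table):

```python
import re

# Matches trtexec --dumpProfile rows of the form:
# [timestamp] [I]   29.82   0.0787   0.0594   1.2   p2o.Transpose.0
# Groups: Time(ms), Avg.(ms), Median(ms), Time(%), layer name.
PROFILE_RE = re.compile(
    r"\[I\]\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+(.+)$"
)

def parse_profile(lines):
    """Return {layer_name: avg_ms} from a trtexec --dumpProfile table."""
    layers = {}
    for line in lines:
        m = PROFILE_RE.search(line)
        if m:
            layers[m.group(5).strip()] = float(m.group(2))
    return layers

sample = [
    "[01/09/2024-07:11:22] [I]    29.82   0.0787   0.0594   1.2   p2o.Transpose.0",
    "[01/09/2024-07:11:22] [I]    14.51   0.0383   0.0379   0.6   PWN(PWN(p2o.Sub.0), PWN(p2o.Div.0))",
]
print(parse_profile(sample))
```

Running `parse_profile` over both precision dumps and diffing the resulting dicts layer by layer makes it easy to see which fused convolutions are slower in the BF16 engine (here, the TacticNames in the first dump show f32/tf32 kernels were selected, while the FP16 dump below shows f16 tensor-core kernels).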
FP16
[01/09/2024-07:11:05] [I] Layers:
Name: assign_0.tmp_0, LayerType: Constant, Inputs: [], Outputs: [ { Name: (Unnamed Layer* 1) [Constant]_output, Location: Device, Dimensions: [1,3,1,1], Format/Datatype: Row major linear FP32 }], ParameterType: Constant, weights: {"Type": "Float", "Count": 3}, dimensions: [1,3,1,1], TacticValue: 0x0000000000000000, StreamId: 0, Metadata:
Name: assign_1.tmp_0, LayerType: Constant, Inputs: [], Outputs: [ { Name: (Unnamed Layer* 3) [Constant]_output, Location: Device, Dimensions: [1,3,1,1], Format/Datatype: Row major linear FP32 }], ParameterType: Constant, weights: {"Type": "Float", "Count": 3}, dimensions: [1,3,1,1], TacticValue: 0x0000000000000000, StreamId: 0, Metadata:
Name: p2o.Transpose.0, LayerType: Shuffle, Inputs: [ { Name: stack_0.tmp_0, Location: Device, Dimensions: [6,465,720,3], Format/Datatype: Row major linear FP32 }], Outputs: [ { Name: transpose_0.tmp_0, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Row major linear FP32 }], ParameterType: Shuffle, FirstTranspose: [0,3,1,2], Reshape: "nbDims=-1", SecondTranspose: [0,1,2,3], ZeroIsPlaceholder: 1, TacticValue: 0x0000000000000000, StreamId: 0, Metadata: [ONNX Layer: p2o.Transpose.0]
Name: PWN(PWN(p2o.Sub.0), PWN(p2o.Div.0)), LayerType: PointWiseV2, Inputs: [ { Name: transpose_0.tmp_0, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Row major linear FP32 }, { Name: (Unnamed Layer* 1) [Constant]_output, Location: Device, Dimensions: [1,3,1,1], Format/Datatype: Row major linear FP32 }, { Name: (Unnamed Layer* 3) [Constant]_output, Location: Device, Dimensions: [1,3,1,1], Format/Datatype: Row major linear FP32 }], Outputs: [ { Name: p2o.Div.1, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Row major linear FP32 }], ParameterType: PointWise, ParameterSubType: PointWiseExpression, NbInputArgs: 3, InputArgs: ["arg0", "arg1", "arg2"], NbOutputVars: 1, OutputVars: ["var1"], NbParams: 0, Params: [], NbLiterals: 0, Literals: [], NbOperations: 2, Operations: ["auto const var0 = pwgen::iMinus(arg0, arg1);", "auto const var1 = pwgen::iDiv(var0, arg2);"], TacticValue: 0x0000000000000009, StreamId: 0, Metadata: [ONNX Layer: p2o.Sub.0][ONNX Layer: p2o.Div.0]
Name: Reformatting CopyNode for Input Tensor 0 to p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0, LayerType: Reformat, Inputs: [ { Name: p2o.Div.1, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Row major linear FP32 }], Outputs: [ { Name: Reformatted Input Tensor 0 to p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Channel major FP16 format where channel % 4 == 0 }], ParameterType: Reformat, Origin: REFORMAT, TacticValue: 0x00000000000003ea, StreamId: 0, Metadata:
Name: p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0, LayerType: CaskConvolution, Inputs: [ { Name: Reformatted Input Tensor 0 to p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Channel major FP16 format where channel % 4 == 0 }], Outputs: [ { Name: relu_0.tmp_0, Location: Device, Dimensions: [6,64,240,368], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [7,7], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [3,3], PostPadding: [18,19], Stride: [2,2], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Half", "Count": 9408}, Bias: {"Type": "Half", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_image_first_layer_f16f16_f32_f16_nhwckrsc_nhwc_hmma_k64c4r7s7_stride2x2_tile16x64x64_tensor1688, TacticValue: 0x4341b9cbb7197a9b, StreamId: 0, Metadata: [ONNX Layer: p2o.Pad.0][ONNX Layer: p2o.Conv.0][ONNX Layer: p2o.BatchNormalization.0][ONNX Layer: p2o.Relu.0]
Name: p2o.MaxPool.0, LayerType: CaskPooling, Inputs: [ { Name: relu_0.tmp_0, Location: Device, Dimensions: [6,64,240,368], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: p2o.MaxPool.1, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Pooling, PoolingType: MAX, WindowSize: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], BlendFactor: 0, AverageCountExcludesPadding: 1, TacticName: sm50_xmma_pooling_coalescedC_NHWC_kMAX_3_False, TacticValue: 0xdb415cba6b0e9137, StreamId: 0, Metadata: [ONNX Layer: p2o.MaxPool.0]
Name: p2o.Conv.1 + p2o.BatchNormalization.1 + p2o.Relu.1, LayerType: CaskGemmConvolution, Inputs: [ { Name: p2o.MaxPool.1, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_1.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Half", "Count": 4096}, Bias: {"Type": "Half", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize128x64x64_stage3_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020164, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.1][ONNX Layer: p2o.BatchNormalization.1][ONNX Layer: p2o.Relu.1]
Name: p2o.Conv.2 + p2o.BatchNormalization.2 + p2o.Relu.2, LayerType: CaskConvolution, Inputs: [ { Name: relu_1.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_2.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Half", "Count": 36864}, Bias: {"Type": "Half", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize256x64x32_stage3_warpsize4x1x1_g1_tensor16x8x16_t1r3s3, TacticValue: 0x529f4431bdae94f5, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.2][ONNX Layer: p2o.BatchNormalization.2][ONNX Layer: p2o.Relu.2]
Name: p2o.Conv.3 + p2o.BatchNormalization.3, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_2.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: batch_norm_3.tmp_2, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 16384}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize256x64x32_stage3_warpsize4x1x1_tensor16x8x16, TacticValue: 0x000000000002066d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.3][ONNX Layer: p2o.BatchNormalization.3]
Name: p2o.Conv.4 + p2o.BatchNormalization.4 + p2o.Add.0 + p2o.Relu.3, LayerType: CaskGemmConvolution, Inputs: [ { Name: p2o.MaxPool.1, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: batch_norm_3.tmp_2, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_3.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 16384}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize256x64x32_stage3_warpsize4x1x1_tensor16x8x16, TacticValue: 0x000000000002066d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.4][ONNX Layer: p2o.BatchNormalization.4][ONNX Layer: p2o.Add.0][ONNX Layer: p2o.Relu.3]
Name: p2o.Conv.5 + p2o.BatchNormalization.5 + p2o.Relu.4, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_3.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_4.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Half", "Count": 16384}, Bias: {"Type": "Half", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize256x64x32_stage3_warpsize4x1x1_tensor16x8x16, TacticValue: 0x000000000002066d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.5][ONNX Layer: p2o.BatchNormalization.5][ONNX Layer: p2o.Relu.4]
Name: p2o.Conv.6 + p2o.BatchNormalization.6 + p2o.Relu.5, LayerType: CaskConvolution, Inputs: [ { Name: relu_4.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_5.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Half", "Count": 36864}, Bias: {"Type": "Half", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize256x64x32_stage3_warpsize4x1x1_g1_tensor16x8x16_t1r3s3, TacticValue: 0x529f4431bdae94f5, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.6][ONNX Layer: p2o.BatchNormalization.6][ONNX Layer: p2o.Relu.5]
Name: p2o.Conv.7 + p2o.BatchNormalization.7 + p2o.Add.2 + p2o.Relu.6, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_5.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_3.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_6.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 16384}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize256x64x32_stage3_warpsize4x1x1_tensor16x8x16, TacticValue: 0x000000000002066d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.7][ONNX Layer: p2o.BatchNormalization.7][ONNX Layer: p2o.Add.2][ONNX Layer: p2o.Relu.6]
Name: p2o.Conv.8 + p2o.BatchNormalization.8 + p2o.Relu.7, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_6.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_7.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Half", "Count": 16384}, Bias: {"Type": "Half", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize256x64x32_stage3_warpsize4x1x1_tensor16x8x16, TacticValue: 0x000000000002066d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.8][ONNX Layer: p2o.BatchNormalization.8][ONNX Layer: p2o.Relu.7]
Name: p2o.Conv.9 + p2o.BatchNormalization.9 + p2o.Relu.8, LayerType: CaskConvolution, Inputs: [ { Name: relu_7.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_8.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Half", "Count": 36864}, Bias: {"Type": "Half", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize256x64x32_stage3_warpsize4x1x1_g1_tensor16x8x16_t1r3s3, TacticValue: 0x529f4431bdae94f5, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.9][ONNX Layer: p2o.BatchNormalization.9][ONNX Layer: p2o.Relu.8]
Name: p2o.Conv.10 + p2o.BatchNormalization.10 + p2o.Add.4 + p2o.Relu.9, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_8.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_6.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_9.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 16384}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize256x64x32_stage3_warpsize4x1x1_tensor16x8x16, TacticValue: 0x000000000002066d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.10][ONNX Layer: p2o.BatchNormalization.10][ONNX Layer: p2o.Add.4][ONNX Layer: p2o.Relu.9]
Name: p2o.Conv.11 + p2o.BatchNormalization.11 + p2o.Relu.10, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_9.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_10.tmp_0, Location: Device, Dimensions: [6,128,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Half", "Count": 32768}, Bias: {"Type": "Half", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize64x128x32_stage5_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020435, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.11][ONNX Layer: p2o.BatchNormalization.11][ONNX Layer: p2o.Relu.10]
Name: p2o.Conv.14 + p2o.BatchNormalization.14, LayerType: CaskConvolution, Inputs: [ { Name: relu_9.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: batch_norm_14.tmp_2, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [2,2], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 131072}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x16_t1r1s1, TacticValue: 0xea50b6d3d87bf5dd, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.14][ONNX Layer: p2o.BatchNormalization.14]
Name: p2o.Conv.12 + p2o.BatchNormalization.12 + p2o.Relu.11, LayerType: CaskConvolution, Inputs: [ { Name: relu_10.tmp_0, Location: Device, Dimensions: [6,128,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_11.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Half", "Count": 147456}, Bias: {"Type": "Half", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x16, TacticValue: 0xdfa020ef435ef810, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.12][ONNX Layer: p2o.BatchNormalization.12][ONNX Layer: p2o.Relu.11]
Name: p2o.Conv.13 + p2o.BatchNormalization.13 + p2o.Add.6 + p2o.Relu.12, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_11.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: batch_norm_14.tmp_2, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_12.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 65536}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x00000000000207fa, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.13][ONNX Layer: p2o.BatchNormalization.13][ONNX Layer: p2o.Add.6][ONNX Layer: p2o.Relu.12]
Name: p2o.Conv.15 + p2o.BatchNormalization.15 + p2o.Relu.13, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_12.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_13.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Half", "Count": 65536}, Bias: {"Type": "Half", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x128_ldg8_relu_stages_32x5_tn_v1, TacticValue: 0x0000000000020848, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.15][ONNX Layer: p2o.BatchNormalization.15][ONNX Layer: p2o.Relu.13]
Name: p2o.Conv.16 + p2o.BatchNormalization.16 + p2o.Relu.14, LayerType: CaskConvolution, Inputs: [ { Name: relu_13.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_14.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Half", "Count": 147456}, Bias: {"Type": "Half", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x16_t1r3s3, TacticValue: 0x60c3421152ef8e10, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.16][ONNX Layer: p2o.BatchNormalization.16][ONNX Layer: p2o.Relu.14]
Name: p2o.Conv.17 + p2o.BatchNormalization.17 + p2o.Add.8 + p2o.Relu.15, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_14.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_12.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_15.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 65536}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x00000000000207fa, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.17][ONNX Layer: p2o.BatchNormalization.17][ONNX Layer: p2o.Add.8][ONNX Layer: p2o.Relu.15]
Name: p2o.Conv.18 + p2o.BatchNormalization.18 + p2o.Relu.16, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_15.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_16.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Half", "Count": 65536}, Bias: {"Type": "Half", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x128_ldg8_relu_stages_32x5_tn_v1, TacticValue: 0x0000000000020848, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.18][ONNX Layer: p2o.BatchNormalization.18][ONNX Layer: p2o.Relu.16]
Name: p2o.Conv.19 + p2o.BatchNormalization.19 + p2o.Relu.17, LayerType: CaskConvolution, Inputs: [ { Name: relu_16.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_17.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Half", "Count": 147456}, Bias: {"Type": "Half", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x16_t1r3s3, TacticValue: 0x60c3421152ef8e10, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.19][ONNX Layer: p2o.BatchNormalization.19][ONNX Layer: p2o.Relu.17]
Name: p2o.Conv.20 + p2o.BatchNormalization.20 + p2o.Add.10 + p2o.Relu.18, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_17.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_15.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_18.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 65536}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x00000000000207fa, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.20][ONNX Layer: p2o.BatchNormalization.20][ONNX Layer: p2o.Add.10][ONNX Layer: p2o.Relu.18]
Name: p2o.Conv.21 + p2o.BatchNormalization.21 + p2o.Relu.19, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_18.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_19.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Half", "Count": 65536}, Bias: {"Type": "Half", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x128_ldg8_relu_stages_32x5_tn_v1, TacticValue: 0x0000000000020848, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.21][ONNX Layer: p2o.BatchNormalization.21][ONNX Layer: p2o.Relu.19]
Name: p2o.Conv.22 + p2o.BatchNormalization.22 + p2o.Relu.20, LayerType: CaskConvolution, Inputs: [ { Name: relu_19.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_20.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Half", "Count": 147456}, Bias: {"Type": "Half", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x16_t1r3s3, TacticValue: 0x60c3421152ef8e10, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.22][ONNX Layer: p2o.BatchNormalization.22][ONNX Layer: p2o.Relu.20]
Name: p2o.Conv.23 + p2o.BatchNormalization.23 + p2o.Add.12 + p2o.Relu.21, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_20.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_18.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_21.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 65536}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x00000000000207fa, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.23][ONNX Layer: p2o.BatchNormalization.23][ONNX Layer: p2o.Add.12][ONNX Layer: p2o.Relu.21]
Name: p2o.Conv.24 + p2o.BatchNormalization.24 + p2o.Relu.22, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_21.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_22.tmp_0, Location: Device, Dimensions: [6,256,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 131072}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.24][ONNX Layer: p2o.BatchNormalization.24][ONNX Layer: p2o.Relu.22]
Name: p2o.Conv.27 + p2o.BatchNormalization.27, LayerType: CaskConvolution, Inputs: [ { Name: relu_21.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: batch_norm_27.tmp_2, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [2,2], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Half", "Count": 524288}, Bias: {"Type": "Half", "Count": 1024}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x16_t1r1s1, TacticValue: 0xea50b6d3d87bf5dd, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.27][ONNX Layer: p2o.BatchNormalization.27]
Name: p2o.Conv.25 + p2o.BatchNormalization.25 + p2o.Relu.23, LayerType: CaskConvolution, Inputs: [ { Name: relu_22.tmp_0, Location: Device, Dimensions: [6,256,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_23.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 589824}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x64x32_stage5_warpsize2x2x1_g1_tensor16x8x16_t1r3s3, TacticValue: 0xb4bec086187edcfc, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.25][ONNX Layer: p2o.BatchNormalization.25][ONNX Layer: p2o.Relu.23]
Name: p2o.Conv.26 + p2o.BatchNormalization.26 + p2o.Add.14 + p2o.Relu.24, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_23.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: batch_norm_27.tmp_2, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_24.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.26][ONNX Layer: p2o.BatchNormalization.26][ONNX Layer: p2o.Add.14][ONNX Layer: p2o.Relu.24]
Name: p2o.Conv.28 + p2o.BatchNormalization.28 + p2o.Relu.25, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_24.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_25.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x64_ldg8_relu_stages_32x6_tn_v1, TacticValue: 0x00000000000204b4, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.28][ONNX Layer: p2o.BatchNormalization.28][ONNX Layer: p2o.Relu.25]
Name: p2o.Conv.29 + p2o.BatchNormalization.29 + p2o.Relu.26, LayerType: CaskConvolution, Inputs: [ { Name: relu_25.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_26.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 589824}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize64x64x64_stage4_warpsize2x1x2_g1_tensor16x8x16_aACCESS, TacticValue: 0x841c601dec2a75bc, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.29][ONNX Layer: p2o.BatchNormalization.29][ONNX Layer: p2o.Relu.26]
Name: p2o.Conv.30 + p2o.BatchNormalization.30 + p2o.Add.16 + p2o.Relu.27, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_26.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_24.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_27.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.30][ONNX Layer: p2o.BatchNormalization.30][ONNX Layer: p2o.Add.16][ONNX Layer: p2o.Relu.27]
Name: p2o.Conv.31 + p2o.BatchNormalization.31 + p2o.Relu.28, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_27.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_28.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x64_ldg8_relu_stages_32x6_tn_v1, TacticValue: 0x00000000000204b4, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.31][ONNX Layer: p2o.BatchNormalization.31][ONNX Layer: p2o.Relu.28]
Name: p2o.Conv.32 + p2o.BatchNormalization.32 + p2o.Relu.29, LayerType: CaskConvolution, Inputs: [ { Name: relu_28.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_29.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 589824}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize64x64x64_stage4_warpsize2x1x2_g1_tensor16x8x16_aACCESS, TacticValue: 0x841c601dec2a75bc, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.32][ONNX Layer: p2o.BatchNormalization.32][ONNX Layer: p2o.Relu.29]
Name: p2o.Conv.33 + p2o.BatchNormalization.33 + p2o.Add.18 + p2o.Relu.30, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_29.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_27.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_30.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.33][ONNX Layer: p2o.BatchNormalization.33][ONNX Layer: p2o.Add.18][ONNX Layer: p2o.Relu.30]
Name: p2o.Conv.34 + p2o.BatchNormalization.34 + p2o.Relu.31, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_30.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_31.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x64_ldg8_relu_stages_32x6_tn_v1, TacticValue: 0x00000000000204b4, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.34][ONNX Layer: p2o.BatchNormalization.34][ONNX Layer: p2o.Relu.31]
Name: p2o.Conv.35 + p2o.BatchNormalization.35 + p2o.Relu.32, LayerType: CaskConvolution, Inputs: [ { Name: relu_31.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_32.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 589824}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize64x64x64_stage4_warpsize2x1x2_g1_tensor16x8x16_aACCESS, TacticValue: 0x841c601dec2a75bc, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.35][ONNX Layer: p2o.BatchNormalization.35][ONNX Layer: p2o.Relu.32]
Name: p2o.Conv.36 + p2o.BatchNormalization.36 + p2o.Add.20 + p2o.Relu.33, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_32.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_30.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_33.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.36][ONNX Layer: p2o.BatchNormalization.36][ONNX Layer: p2o.Add.20][ONNX Layer: p2o.Relu.33]
Name: p2o.Conv.37 + p2o.BatchNormalization.37 + p2o.Relu.34, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_33.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_34.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x64_ldg8_relu_stages_32x6_tn_v1, TacticValue: 0x00000000000204b4, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.37][ONNX Layer: p2o.BatchNormalization.37][ONNX Layer: p2o.Relu.34]
Name: p2o.Conv.38 + p2o.BatchNormalization.38 + p2o.Relu.35, LayerType: CaskConvolution, Inputs: [ { Name: relu_34.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_35.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 589824}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize64x64x64_stage4_warpsize2x1x2_g1_tensor16x8x16_aACCESS, TacticValue: 0x841c601dec2a75bc, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.38][ONNX Layer: p2o.BatchNormalization.38][ONNX Layer: p2o.Relu.35]
Name: p2o.Conv.39 + p2o.BatchNormalization.39 + p2o.Add.22 + p2o.Relu.36, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_35.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_33.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_36.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.39][ONNX Layer: p2o.BatchNormalization.39][ONNX Layer: p2o.Add.22][ONNX Layer: p2o.Relu.36]
Name: p2o.Conv.40 + p2o.BatchNormalization.40 + p2o.Relu.37, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_36.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_37.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x64_ldg8_relu_stages_32x6_tn_v1, TacticValue: 0x00000000000204b4, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.40][ONNX Layer: p2o.BatchNormalization.40][ONNX Layer: p2o.Relu.37]
Name: p2o.Conv.41 + p2o.BatchNormalization.41 + p2o.Relu.38, LayerType: CaskConvolution, Inputs: [ { Name: relu_37.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_38.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 589824}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize64x64x64_stage4_warpsize2x1x2_g1_tensor16x8x16_aACCESS, TacticValue: 0x841c601dec2a75bc, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.41][ONNX Layer: p2o.BatchNormalization.41][ONNX Layer: p2o.Relu.38]
Name: p2o.Conv.42 + p2o.BatchNormalization.42 + p2o.Add.24 + p2o.Relu.39, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_38.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_36.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_39.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.42][ONNX Layer: p2o.BatchNormalization.42][ONNX Layer: p2o.Add.24][ONNX Layer: p2o.Relu.39]
Name: p2o.Conv.43 + p2o.BatchNormalization.43 + p2o.Relu.40, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_39.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_40.tmp_0, Location: Device, Dimensions: [6,512,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 524288}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.43][ONNX Layer: p2o.BatchNormalization.43][ONNX Layer: p2o.Relu.40]
Name: p2o.Conv.46 + p2o.BatchNormalization.46, LayerType: CaskConvolution, Inputs: [ { Name: relu_39.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: batch_norm_46.tmp_2, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [2,2], Dilation: [1,1], OutMaps: 2048, Groups: 1, Weights: {"Type": "Half", "Count": 2097152}, Bias: {"Type": "Half", "Count": 2048}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x16, TacticValue: 0xdfa020ef435ef810, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.46][ONNX Layer: p2o.BatchNormalization.46]
Name: p2o.Conv.44 + p2o.BatchNormalization.44 + p2o.Relu.41, LayerType: CaskConvolution, Inputs: [ { Name: relu_40.tmp_0, Location: Device, Dimensions: [6,512,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_41.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 2359296}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x16_t1r3s3, TacticValue: 0x60c3421152ef8e10, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.44][ONNX Layer: p2o.BatchNormalization.44][ONNX Layer: p2o.Relu.41]
Name: p2o.Conv.45 + p2o.BatchNormalization.45 + p2o.Add.26 + p2o.Relu.42, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_41.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: batch_norm_46.tmp_2, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_42.tmp_0, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 2048, Groups: 1, Weights: {"Type": "Half", "Count": 1048576}, Bias: {"Type": "Half", "Count": 2048}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x128_ldg8_relu_stages_32x5_nn_v1, TacticValue: 0x00000000000208da, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.45][ONNX Layer: p2o.BatchNormalization.45][ONNX Layer: p2o.Add.26][ONNX Layer: p2o.Relu.42]
Name: p2o.Conv.47 + p2o.BatchNormalization.47 + p2o.Relu.43, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_42.tmp_0, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_43.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 1048576}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.47][ONNX Layer: p2o.BatchNormalization.47][ONNX Layer: p2o.Relu.43]
Name: p2o.Conv.48 + p2o.BatchNormalization.48 + p2o.Relu.44, LayerType: CaskConvolution, Inputs: [ { Name: relu_43.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_44.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 2359296}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize64x64x64_stage4_warpsize2x1x2_g1_tensor16x8x16_aACCESS, TacticValue: 0x841c601dec2a75bc, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.48][ONNX Layer: p2o.BatchNormalization.48][ONNX Layer: p2o.Relu.44]
Name: p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_44.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_42.tmp_0, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: Reformatted Output Tensor 0 to p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 2048, Groups: 1, Weights: {"Type": "Half", "Count": 1048576}, Bias: {"Type": "Half", "Count": 2048}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: ampere_h16816gemm_128x128_ldg8_relu_stages_32x5_nn_v1, TacticValue: 0x00000000000208da, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.49][ONNX Layer: p2o.BatchNormalization.49][ONNX Layer: p2o.Add.28]
Name: Reformatting CopyNode for Output Tensor 0 to p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28, LayerType: Reformat, Inputs: [ { Name: Reformatted Output Tensor 0 to p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: tmp_16, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Row major linear FP32 }], ParameterType: Reformat, Origin: REFORMAT, TacticValue: 0x0000000000000000, StreamId: 0, Metadata:
Bindings:
stack_0.tmp_0
tmp_16
[01/09/2024-07:11:05] [I] Starting inference
[01/09/2024-07:11:08] [I] The e2e network timing is not reported since it is inaccurate due to the extra synchronizations when the profiler is enabled.
[01/09/2024-07:11:08] [I] To show e2e network timing report, add --separateProfileRun to profile layer timing in a separate run or remove --dumpProfile to disable the profiler.
[01/09/2024-07:11:08] [I]
[01/09/2024-07:11:08] [I] === Profile (576 iterations ) ===
[01/09/2024-07:11:08] [I] Time(ms) Avg.(ms) Median(ms) Time(%) Layer
[01/09/2024-07:11:08] [I] 41.69 0.0724 0.0594 2.1 p2o.Transpose.0
[01/09/2024-07:11:08] [I] 21.83 0.0379 0.0379 1.1 PWN(PWN(p2o.Sub.0), PWN(p2o.Div.0))
[01/09/2024-07:11:08] [I] 25.89 0.0449 0.0410 1.3 Reformatting CopyNode for Input Tensor 0 to p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0
[01/09/2024-07:11:08] [I] 71.08 0.1234 0.1208 3.6 p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0
[01/09/2024-07:11:08] [I] 46.31 0.0804 0.0799 2.3 p2o.MaxPool.0
[01/09/2024-07:11:08] [I] 17.39 0.0302 0.0297 0.9 p2o.Conv.1 + p2o.BatchNormalization.1 + p2o.Relu.1
[01/09/2024-07:11:08] [I] 40.48 0.0703 0.0686 2.0 p2o.Conv.2 + p2o.BatchNormalization.2 + p2o.Relu.2
[01/09/2024-07:11:08] [I] 41.80 0.0726 0.0717 2.1 p2o.Conv.3 + p2o.BatchNormalization.3
[01/09/2024-07:11:08] [I] 73.15 0.1270 0.1260 3.7 p2o.Conv.4 + p2o.BatchNormalization.4 + p2o.Add.0 + p2o.Relu.3
[01/09/2024-07:11:08] [I] 44.81 0.0778 0.0768 2.3 p2o.Conv.5 + p2o.BatchNormalization.5 + p2o.Relu.4
[01/09/2024-07:11:08] [I] 43.14 0.0749 0.0737 2.2 p2o.Conv.6 + p2o.BatchNormalization.6 + p2o.Relu.5
[01/09/2024-07:11:08] [I] 71.59 0.1243 0.1239 3.6 p2o.Conv.7 + p2o.BatchNormalization.7 + p2o.Add.2 + p2o.Relu.6
[01/09/2024-07:11:08] [I] 44.82 0.0778 0.0778 2.3 p2o.Conv.8 + p2o.BatchNormalization.8 + p2o.Relu.7
[01/09/2024-07:11:08] [I] 42.68 0.0741 0.0727 2.2 p2o.Conv.9 + p2o.BatchNormalization.9 + p2o.Relu.8
[01/09/2024-07:11:08] [I] 71.35 0.1239 0.1229 3.6 p2o.Conv.10 + p2o.BatchNormalization.10 + p2o.Add.4 + p2o.Relu.9
[01/09/2024-07:11:08] [I] 54.45 0.0945 0.0922 2.7 p2o.Conv.11 + p2o.BatchNormalization.11 + p2o.Relu.10
[01/09/2024-07:11:08] [I] 36.97 0.0642 0.0635 1.9 p2o.Conv.14 + p2o.BatchNormalization.14
[01/09/2024-07:11:08] [I] 42.44 0.0737 0.0727 2.1 p2o.Conv.12 + p2o.BatchNormalization.12 + p2o.Relu.11
[01/09/2024-07:11:08] [I] 38.67 0.0671 0.0666 1.9 p2o.Conv.13 + p2o.BatchNormalization.13 + p2o.Add.6 + p2o.Relu.12
[01/09/2024-07:11:08] [I] 128.37 0.2229 0.0440 6.5 p2o.Conv.15 + p2o.BatchNormalization.15 + p2o.Relu.13
[01/09/2024-07:11:08] [I] 33.04 0.0574 0.0563 1.7 p2o.Conv.16 + p2o.BatchNormalization.16 + p2o.Relu.14
[01/09/2024-07:11:08] [I] 38.14 0.0662 0.0655 1.9 p2o.Conv.17 + p2o.BatchNormalization.17 + p2o.Add.8 + p2o.Relu.15
[01/09/2024-07:11:08] [I] 25.62 0.0445 0.0440 1.3 p2o.Conv.18 + p2o.BatchNormalization.18 + p2o.Relu.16
[01/09/2024-07:11:08] [I] 32.82 0.0570 0.0553 1.7 p2o.Conv.19 + p2o.BatchNormalization.19 + p2o.Relu.17
[01/09/2024-07:11:08] [I] 38.20 0.0663 0.0655 1.9 p2o.Conv.20 + p2o.BatchNormalization.20 + p2o.Add.10 + p2o.Relu.18
[01/09/2024-07:11:08] [I] 25.64 0.0445 0.0440 1.3 p2o.Conv.21 + p2o.BatchNormalization.21 + p2o.Relu.19
[01/09/2024-07:11:08] [I] 32.70 0.0568 0.0553 1.6 p2o.Conv.22 + p2o.BatchNormalization.22 + p2o.Relu.20
[01/09/2024-07:11:08] [I] 38.02 0.0660 0.0655 1.9 p2o.Conv.23 + p2o.BatchNormalization.23 + p2o.Add.12 + p2o.Relu.21
[01/09/2024-07:11:08] [I] 34.20 0.0594 0.0584 1.7 p2o.Conv.24 + p2o.BatchNormalization.24 + p2o.Relu.22
[01/09/2024-07:11:08] [I] 29.41 0.0511 0.0502 1.5 p2o.Conv.27 + p2o.BatchNormalization.27
[01/09/2024-07:11:08] [I] 41.25 0.0716 0.0707 2.1 p2o.Conv.25 + p2o.BatchNormalization.25 + p2o.Relu.23
[01/09/2024-07:11:08] [I] 23.62 0.0410 0.0399 1.2 p2o.Conv.26 + p2o.BatchNormalization.26 + p2o.Add.14 + p2o.Relu.24
[01/09/2024-07:11:08] [I] 19.77 0.0343 0.0338 1.0 p2o.Conv.28 + p2o.BatchNormalization.28 + p2o.Relu.25
[01/09/2024-07:11:08] [I] 34.77 0.0604 0.0594 1.8 p2o.Conv.29 + p2o.BatchNormalization.29 + p2o.Relu.26
[01/09/2024-07:11:08] [I] 22.41 0.0389 0.0379 1.1 p2o.Conv.30 + p2o.BatchNormalization.30 + p2o.Add.16 + p2o.Relu.27
[01/09/2024-07:11:08] [I] 19.66 0.0341 0.0338 1.0 p2o.Conv.31 + p2o.BatchNormalization.31 + p2o.Relu.28
[01/09/2024-07:11:08] [I] 34.41 0.0597 0.0584 1.7 p2o.Conv.32 + p2o.BatchNormalization.32 + p2o.Relu.29
[01/09/2024-07:11:08] [I] 22.78 0.0396 0.0389 1.1 p2o.Conv.33 + p2o.BatchNormalization.33 + p2o.Add.18 + p2o.Relu.30
[01/09/2024-07:11:08] [I] 19.70 0.0342 0.0338 1.0 p2o.Conv.34 + p2o.BatchNormalization.34 + p2o.Relu.31
[01/09/2024-07:11:08] [I] 34.47 0.0598 0.0584 1.7 p2o.Conv.35 + p2o.BatchNormalization.35 + p2o.Relu.32
[01/09/2024-07:11:08] [I] 22.32 0.0387 0.0379 1.1 p2o.Conv.36 + p2o.BatchNormalization.36 + p2o.Add.20 + p2o.Relu.33
[01/09/2024-07:11:08] [I] 19.64 0.0341 0.0338 1.0 p2o.Conv.37 + p2o.BatchNormalization.37 + p2o.Relu.34
[01/09/2024-07:11:08] [I] 34.51 0.0599 0.0584 1.7 p2o.Conv.38 + p2o.BatchNormalization.38 + p2o.Relu.35
[01/09/2024-07:11:08] [I] 22.79 0.0396 0.0389 1.1 p2o.Conv.39 + p2o.BatchNormalization.39 + p2o.Add.22 + p2o.Relu.36
[01/09/2024-07:11:08] [I] 19.62 0.0341 0.0338 1.0 p2o.Conv.40 + p2o.BatchNormalization.40 + p2o.Relu.37
[01/09/2024-07:11:08] [I] 34.29 0.0595 0.0584 1.7 p2o.Conv.41 + p2o.BatchNormalization.41 + p2o.Relu.38
[01/09/2024-07:11:08] [I] 22.34 0.0388 0.0379 1.1 p2o.Conv.42 + p2o.BatchNormalization.42 + p2o.Add.24 + p2o.Relu.39
[01/09/2024-07:11:08] [I] 28.79 0.0500 0.0492 1.5 p2o.Conv.43 + p2o.BatchNormalization.43 + p2o.Relu.40
[01/09/2024-07:11:08] [I] 29.90 0.0519 0.0512 1.5 p2o.Conv.46 + p2o.BatchNormalization.46
[01/09/2024-07:11:08] [I] 39.18 0.0680 0.0666 2.0 p2o.Conv.44 + p2o.BatchNormalization.44 + p2o.Relu.41
[01/09/2024-07:11:08] [I] 20.80 0.0361 0.0358 1.0 p2o.Conv.45 + p2o.BatchNormalization.45 + p2o.Add.26 + p2o.Relu.42
[01/09/2024-07:11:08] [I] 19.68 0.0342 0.0338 1.0 p2o.Conv.47 + p2o.BatchNormalization.47 + p2o.Relu.43
[01/09/2024-07:11:08] [I] 38.55 0.0669 0.0655 1.9 p2o.Conv.48 + p2o.BatchNormalization.48 + p2o.Relu.44
[01/09/2024-07:11:08] [I] 20.06 0.0348 0.0338 1.0 p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28
[01/09/2024-07:11:08] [I] 11.55 0.0200 0.0184 0.6 Reformatting CopyNode for Output Tensor 0 to p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28
[01/09/2024-07:11:08] [I] 1983.58 3.4437 3.1969 100.0 Total
I found that even when the BF16 flag is set, the kernels chosen for the convolutions are still in FP32 precision.
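One way to check which precision the builder actually picked is to scan the DETAILED layer dump (as produced above via `trt.ProfilingVerbosity.DETAILED`) for the weight datatypes: BF16 kernels should report a BF16-typed weights entry, while the dump here shows only `"Half"` weights and `f16f16` tactic names. A minimal sketch (the regex assumes the `Weights: {"Type": ...}` layout seen in the dump above, which may differ in other TensorRT versions):

```python
import re
from collections import Counter

def tally_layer_precisions(layer_info: str) -> Counter:
    """Count how often each weight datatype appears in a TensorRT
    DETAILED layer-information dump (one fused layer per line)."""
    # Each convolution line carries an entry like: Weights: {"Type": "Half", ...}
    return Counter(re.findall(r'Weights: \{"Type": "(\w+)"', layer_info))

# Fragment of the dump above, for illustration:
sample = 'Weights: {"Type": "Half", "Count": 2359296}, Bias: {"Type": "Half", "Count": 512}'
print(tally_layer_precisions(sample))  # Counter({'Half': 1})
```

If the tally shows no BF16 entries despite `trt.BuilderFlag.BF16` being set, the tactic optimizer decided other-precision kernels were faster (or no BF16 implementation was available for those layers).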
Per @nvpohanh's comment, the FP32 conv kernels may simply be faster than the BF16 ones. You can verify this by checking the TRT verbose log: in the tactic-optimizer section, find each kernel name and compare the reported per-kernel timings.
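To make that comparison concrete, you can grep the verbose build log for the per-tactic timing lines and sort them. The exact line format varies between TensorRT versions; the sketch below assumes lines containing `Tactic: 0x... Time: <ms>`, which is roughly what recent TRT verbose logs print during autotuning — adjust the regex to match your log:

```python
import re

def fastest_tactics(verbose_log: str, top_n: int = 3):
    """Extract (tactic_id, time_ms) pairs from a TRT verbose build log
    and return the fastest ones.  The line format is an assumption;
    tweak the regex if your TRT version prints these lines differently."""
    pairs = re.findall(r'Tactic: (0x[0-9a-fA-F]+).*?Time: ([0-9.]+)', verbose_log)
    return sorted(((t, float(ms)) for t, ms in pairs), key=lambda p: p[1])[:top_n]

# Hypothetical log fragment for illustration:
log = """
Tactic: 0x60c3421152ef8e10 Time: 0.063488
Tactic: 0x00000000000208da Time: 0.035840
"""
print(fastest_tactics(log))
```

Comparing the timings of the FP16/FP32 tactics against the BF16 candidates for the same layer shows whether the builder's choice was justified.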
Closing since there has been no activity for more than 3 weeks. Thanks, all!