
BF16 is slower than FP16 in TensorRT 9.1 when running my R50 model on an A800 GPU

Open zhoutianzi666 opened this issue 1 year ago • 5 comments

Description

Environment

TensorRT Version: TensorRT-9.1.0.4

NVIDIA GPU: A800, 3080

NVIDIA Driver Version:

CUDA Version:

CUDNN Version:

Operating System: Ubuntu 18.04.5 LTS

Python Version (if applicable): Python 3.8

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import tensorrt as trt
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

model_path = "./prune_model.onnx"
precision = "fp16"

success = parser.parse_from_file(model_path)
if not success:
    raise RuntimeError("Failed to parse ONNX model: " + model_path)
config = builder.create_builder_config()
if precision == "fp16":
    config.set_flag(trt.BuilderFlag.FP16)
elif precision == "bf16":
    config.set_flag(trt.BuilderFlag.BF16)
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 35)
profile = builder.create_optimization_profile()


input_shape = [6, 465, 720, 3]

profile.set_shape("stack_0.tmp_0", input_shape, input_shape, input_shape)

config.add_optimization_profile(profile)
config.profiling_verbosity = trt.ProfilingVerbosity.DETAILED

engine_file_path = "engine_file_path_" + precision
if os.path.exists(engine_file_path):
    with open(engine_file_path, "rb") as f, trt.Runtime(logger) as runtime:
        engine = runtime.deserialize_cuda_engine(f.read())
else:
    serialized_engine = builder.build_serialized_network(network, config)
    runtime = trt.Runtime(logger)
    engine = runtime.deserialize_cuda_engine(serialized_engine)
    print("save engine for later use.")
    with open(engine_file_path, "wb") as f:
        f.write(engine.serialize())

context = engine.create_execution_context()
context.set_binding_shape(0, input_shape)

# Keep the page-locked buffers themselves; overwriting h_input0 with a plain
# numpy array would silently lose the pinning needed for async copies.
h_input0 = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(0)), dtype=np.float32)
h_input0[:] = 0.0
h_output = cuda.pagelocked_empty(trt.volume(engine.get_binding_shape(1)), dtype=np.float32)

d_input0 = cuda.mem_alloc(h_input0.nbytes)
d_output = cuda.mem_alloc(h_output.nbytes)

stream = cuda.Stream()
cuda.memcpy_htod_async(d_input0, h_input0, stream)
# Explicit-batch networks should use execute_async_v2.
context.execute_async_v2(bindings=[int(d_input0), int(d_output)], stream_handle=stream.handle)
cuda.memcpy_dtoh_async(h_output, d_output, stream)
stream.synchronize()


import datetime

stream.synchronize()
starttime = datetime.datetime.now()

for i in range(10):
    cuda.memcpy_htod_async(d_input0, h_input0, stream)
    context.execute_async_v2(bindings=[int(d_input0), int(d_output)], stream_handle=stream.handle)
    cuda.memcpy_dtoh_async(h_output, d_output, stream)

stream.synchronize()
endtime = datetime.datetime.now()
duringtime = endtime - starttime
print(duringtime.seconds * 1000 + duringtime.microseconds / 1000.0)  # elapsed time in milliseconds

# Output std / mean, and total time for 10 iterations:
# fp32: 41.957417 -35.9053,  81.156 ms
# bf16: 41.957417 -35.9053,  83.132 ms
# fp16: 41.98418 -35.900158, 53.892 ms
print(np.std(h_output), np.mean(h_output))
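For context on why BF16 is attractive despite this slowdown: BF16 keeps FP32's 8-bit exponent (so FP16-style overflow cannot happen) but has only 8 bits of mantissa versus FP16's 11. The trade-off can be demonstrated without TensorRT by simulating BF16 round-to-nearest-even on float32 bit patterns; this is a standalone numpy sketch, not part of the reproduction script above:

```python
import numpy as np

def to_bf16(x: np.ndarray) -> np.ndarray:
    """Round float32 values to bfloat16 precision (round-to-nearest-even),
    returning the result widened back to float32."""
    bits = x.astype(np.float32).view(np.uint32).astype(np.uint64)
    # Add the rounding bias, then truncate the low 16 mantissa bits.
    rounded = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return rounded.astype(np.uint32).view(np.float32)

def to_fp16(x: np.ndarray) -> np.ndarray:
    """Round-trip through IEEE half precision."""
    return x.astype(np.float16).astype(np.float32)

x = np.array([1.0009765625, 70000.0], dtype=np.float32)
print(to_bf16(x))  # coarse mantissa: 1.0009765625 rounds to 1.0
print(to_fp16(x))  # 70000 overflows FP16 to inf; BF16 keeps it finite
```

This is why the customers below want BF16 for accuracy-sensitive ConvNets: activations that overflow FP16 stay finite, at the cost of mantissa precision.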

Model link:

Steps To Reproduce

Commands or scripts:

Have you tried the latest release?:

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):

zhoutianzi666 avatar Jan 08 '24 06:01 zhoutianzi666


@nvpohanh I guess it's expected since we have more optimized kernels for FP16, am I right?

zerollzeng avatar Jan 08 '24 09:01 zerollzeng

Yes, our current BF16 optimizations focus more on Transformers (like LLMs) than on ConvNets. However, this is still something we want to improve in the future. @zerollzeng Could you repro and file an internal tracker? Thanks

nvpohanh avatar Jan 08 '24 10:01 nvpohanh

Hi @zerollzeng @nvpohanh, our customers are trying to use BF16 precision to reduce the accuracy drop, but they hit a perf gap. The logs below are the trtexec output for the model above with BF16 and FP16 precisions. I found that even when the BF16 flag is set, the kernels chosen for the convolutions still run in FP32 precision. Moreover, during the build stage no BF16 convolution kernel is even tried as a candidate (with the optimization level set to 5). Could you please look into this issue? Thanks!

BF16:

[01/09/2024-07:11:19] [I] Layers:
Name: assign_0.tmp_0, LayerType: Constant, Inputs: [], Outputs: [ { Name: (Unnamed Layer* 1) [Constant]_output, Location: Device, Dimensions: [1,3,1,1], Format/Datatype: Row major linear FP32 }], ParameterType: Constant, weights: {"Type": "Float", "Count": 3}, dimensions: [1,3,1,1], TacticValue: 0x0000000000000000, StreamId: 0, Metadata: 
Name: assign_1.tmp_0, LayerType: Constant, Inputs: [], Outputs: [ { Name: (Unnamed Layer* 3) [Constant]_output, Location: Device, Dimensions: [1,3,1,1], Format/Datatype: Row major linear FP32 }], ParameterType: Constant, weights: {"Type": "Float", "Count": 3}, dimensions: [1,3,1,1], TacticValue: 0x0000000000000000, StreamId: 0, Metadata: 
Name: p2o.Transpose.0, LayerType: Shuffle, Inputs: [ { Name: stack_0.tmp_0, Location: Device, Dimensions: [6,465,720,3], Format/Datatype: Row major linear FP32 }], Outputs: [ { Name: transpose_0.tmp_0, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Row major linear FP32 }], ParameterType: Shuffle, FirstTranspose: [0,3,1,2], Reshape: "nbDims=-1", SecondTranspose: [0,1,2,3], ZeroIsPlaceholder: 1, TacticValue: 0x0000000000000000, StreamId: 0, Metadata: [ONNX Layer: p2o.Transpose.0]
Name: PWN(PWN(p2o.Sub.0), PWN(p2o.Div.0)), LayerType: PointWiseV2, Inputs: [ { Name: transpose_0.tmp_0, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Row major linear FP32 }, { Name: (Unnamed Layer* 1) [Constant]_output, Location: Device, Dimensions: [1,3,1,1], Format/Datatype: Row major linear FP32 }, { Name: (Unnamed Layer* 3) [Constant]_output, Location: Device, Dimensions: [1,3,1,1], Format/Datatype: Row major linear FP32 }], Outputs: [ { Name: p2o.Div.1, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Row major linear FP32 }], ParameterType: PointWise, ParameterSubType: PointWiseExpression, NbInputArgs: 3, InputArgs: ["arg0", "arg1", "arg2"], NbOutputVars: 1, OutputVars: ["var1"], NbParams: 0, Params: [], NbLiterals: 0, Literals: [], NbOperations: 2, Operations: ["auto const var0 = pwgen::iMinus(arg0, arg1);", "auto const var1 = pwgen::iDiv(var0, arg2);"], TacticValue: 0x0000000000000009, StreamId: 0, Metadata: [ONNX Layer: p2o.Sub.0][ONNX Layer: p2o.Div.0]
Name: Reformatting CopyNode for Input Tensor 0 to p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0, LayerType: Reformat, Inputs: [ { Name: p2o.Div.1, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Row major linear FP32 }], Outputs: [ { Name: Reformatted Input Tensor 0 to p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Reformat, Origin: REFORMAT, TacticValue: 0x00000000000003ea, StreamId: 0, Metadata: 
Name: p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0, LayerType: CaskConvolution, Inputs: [ { Name: Reformatted Input Tensor 0 to p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_0.tmp_0, Location: Device, Dimensions: [6,64,240,368], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [7,7], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [3,3], PostPadding: [18,19], Stride: [2,2], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Float", "Count": 9408}, Bias: {"Type": "Float", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_indexed_wo_smem_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x32x16_stage1_warpsize4x1x1_g1_tensor16x8x8, TacticValue: 0x9cb304e2edbc1221, StreamId: 0, Metadata: [ONNX Layer: p2o.Pad.0][ONNX Layer: p2o.Conv.0][ONNX Layer: p2o.BatchNormalization.0][ONNX Layer: p2o.Relu.0]
Name: p2o.MaxPool.0, LayerType: CaskPooling, Inputs: [ { Name: relu_0.tmp_0, Location: Device, Dimensions: [6,64,240,368], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: p2o.MaxPool.1, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Pooling, PoolingType: MAX, WindowSize: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], BlendFactor: 0, AverageCountExcludesPadding: 1, TacticName: sm50_xmma_pooling_max_nhwc_FP32FP32_WINDOWSIZE_3_NOT_PROPAGATE_NAN_2D, TacticValue: 0x789b2859f2e03e79, StreamId: 0, Metadata: [ONNX Layer: p2o.MaxPool.0]
Name: p2o.Conv.1 + p2o.BatchNormalization.1 + p2o.Relu.1, LayerType: CaskGemmConvolution, Inputs: [ { Name: p2o.MaxPool.1, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_1.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Float", "Count": 4096}, Bias: {"Type": "Float", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x64x32_stage5_warpsize2x2x1_tensor16x8x8, TacticValue: 0x000000000002058d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.1][ONNX Layer: p2o.BatchNormalization.1][ONNX Layer: p2o.Relu.1]
Name: p2o.Conv.2 + p2o.BatchNormalization.2 + p2o.Relu.2, LayerType: CaskConvolution, Inputs: [ { Name: relu_1.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_2.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Float", "Count": 36864}, Bias: {"Type": "Float", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize256x64x32_stage3_warpsize4x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xa9a06d0633580c0c, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.2][ONNX Layer: p2o.BatchNormalization.2][ONNX Layer: p2o.Relu.2]
Name: p2o.Conv.3 + p2o.BatchNormalization.3, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_2.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: batch_norm_3.tmp_2, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 16384}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_nn_n_tilesize128x64x16_stage6_warpsize2x2x1_tensor16x8x8, TacticValue: 0x0000000000020741, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.3][ONNX Layer: p2o.BatchNormalization.3]
Name: p2o.Conv.4 + p2o.BatchNormalization.4 + p2o.Add.0 + p2o.Relu.3, LayerType: CaskConvolution, Inputs: [ { Name: p2o.MaxPool.1, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: batch_norm_3.tmp_2, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_3.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 16384}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_simple_t1r1s1, TacticValue: 0x9dece0dc37e90462, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.4][ONNX Layer: p2o.BatchNormalization.4][ONNX Layer: p2o.Add.0][ONNX Layer: p2o.Relu.3]
Name: p2o.Conv.5 + p2o.BatchNormalization.5 + p2o.Relu.4, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_3.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_4.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Float", "Count": 16384}, Bias: {"Type": "Float", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x64x32_stage5_warpsize2x2x1_tensor16x8x8, TacticValue: 0x000000000002058d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.5][ONNX Layer: p2o.BatchNormalization.5][ONNX Layer: p2o.Relu.4]
Name: p2o.Conv.6 + p2o.BatchNormalization.6 + p2o.Relu.5, LayerType: CaskConvolution, Inputs: [ { Name: relu_4.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_5.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Float", "Count": 36864}, Bias: {"Type": "Float", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize256x64x32_stage3_warpsize4x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xa9a06d0633580c0c, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.6][ONNX Layer: p2o.BatchNormalization.6][ONNX Layer: p2o.Relu.5]
Name: p2o.Conv.7 + p2o.BatchNormalization.7 + p2o.Add.2 + p2o.Relu.6, LayerType: CaskConvolution, Inputs: [ { Name: relu_5.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_3.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_6.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 16384}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_simple_t1r1s1, TacticValue: 0x9dece0dc37e90462, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.7][ONNX Layer: p2o.BatchNormalization.7][ONNX Layer: p2o.Add.2][ONNX Layer: p2o.Relu.6]
Name: p2o.Conv.8 + p2o.BatchNormalization.8 + p2o.Relu.7, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_6.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_7.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Float", "Count": 16384}, Bias: {"Type": "Float", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x64x32_stage5_warpsize2x2x1_tensor16x8x8, TacticValue: 0x000000000002058d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.8][ONNX Layer: p2o.BatchNormalization.8][ONNX Layer: p2o.Relu.7]
Name: p2o.Conv.9 + p2o.BatchNormalization.9 + p2o.Relu.8, LayerType: CaskConvolution, Inputs: [ { Name: relu_7.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_8.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Float", "Count": 36864}, Bias: {"Type": "Float", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize256x64x32_stage3_warpsize4x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xa9a06d0633580c0c, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.9][ONNX Layer: p2o.BatchNormalization.9][ONNX Layer: p2o.Relu.8]
Name: p2o.Conv.10 + p2o.BatchNormalization.10 + p2o.Add.4 + p2o.Relu.9, LayerType: CaskConvolution, Inputs: [ { Name: relu_8.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_6.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_9.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 16384}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_simple_t1r1s1, TacticValue: 0x9dece0dc37e90462, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.10][ONNX Layer: p2o.BatchNormalization.10][ONNX Layer: p2o.Add.4][ONNX Layer: p2o.Relu.9]
Name: p2o.Conv.11 + p2o.BatchNormalization.11 + p2o.Relu.10, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_9.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_10.tmp_0, Location: Device, Dimensions: [6,128,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Float", "Count": 32768}, Bias: {"Type": "Float", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_nn_n_tilesize128x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x0000000000020764, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.11][ONNX Layer: p2o.BatchNormalization.11][ONNX Layer: p2o.Relu.10]
Name: p2o.Conv.14 + p2o.BatchNormalization.14, LayerType: CaskConvolution, Inputs: [ { Name: relu_9.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: batch_norm_14.tmp_2, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [2,2], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 131072}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x256x32_stage3_warpsize2x4x1_g1_tensor16x8x8_t1r1s1, TacticValue: 0xebdd7d350fbaa00e, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.14][ONNX Layer: p2o.BatchNormalization.14]
Name: p2o.Conv.12 + p2o.BatchNormalization.12 + p2o.Relu.11, LayerType: CaskConvolution, Inputs: [ { Name: relu_10.tmp_0, Location: Device, Dimensions: [6,128,120,184], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_11.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Float", "Count": 147456}, Bias: {"Type": "Float", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xd920b33c9bd27143, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.12][ONNX Layer: p2o.BatchNormalization.12][ONNX Layer: p2o.Relu.11]
Name: p2o.Conv.13 + p2o.BatchNormalization.13 + p2o.Add.6 + p2o.Relu.12, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_11.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: batch_norm_14.tmp_2, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_12.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 65536}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_nn_n_tilesize64x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000202f7, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.13][ONNX Layer: p2o.BatchNormalization.13][ONNX Layer: p2o.Add.6][ONNX Layer: p2o.Relu.12]
Name: p2o.Conv.15 + p2o.BatchNormalization.15 + p2o.Relu.13, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_12.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_13.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Float", "Count": 65536}, Bias: {"Type": "Float", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize64x128x32_stage5_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207dd, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.15][ONNX Layer: p2o.BatchNormalization.15][ONNX Layer: p2o.Relu.13]
Name: p2o.Conv.16 + p2o.BatchNormalization.16 + p2o.Relu.14, LayerType: CaskConvolution, Inputs: [ { Name: relu_13.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_14.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Float", "Count": 147456}, Bias: {"Type": "Float", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xd920b33c9bd27143, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.16][ONNX Layer: p2o.BatchNormalization.16][ONNX Layer: p2o.Relu.14]
Name: p2o.Conv.17 + p2o.BatchNormalization.17 + p2o.Add.8 + p2o.Relu.15, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_14.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_12.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_15.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 65536}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_nn_n_tilesize64x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000202f7, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.17][ONNX Layer: p2o.BatchNormalization.17][ONNX Layer: p2o.Add.8][ONNX Layer: p2o.Relu.15]
Name: p2o.Conv.18 + p2o.BatchNormalization.18 + p2o.Relu.16, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_15.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_16.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Float", "Count": 65536}, Bias: {"Type": "Float", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize64x128x32_stage5_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207dd, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.18][ONNX Layer: p2o.BatchNormalization.18][ONNX Layer: p2o.Relu.16]
Name: p2o.Conv.19 + p2o.BatchNormalization.19 + p2o.Relu.17, LayerType: CaskConvolution, Inputs: [ { Name: relu_16.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_17.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Float", "Count": 147456}, Bias: {"Type": "Float", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xd920b33c9bd27143, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.19][ONNX Layer: p2o.BatchNormalization.19][ONNX Layer: p2o.Relu.17]
Name: p2o.Conv.20 + p2o.BatchNormalization.20 + p2o.Add.10 + p2o.Relu.18, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_17.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_15.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_18.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 65536}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_nn_n_tilesize64x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000202f7, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.20][ONNX Layer: p2o.BatchNormalization.20][ONNX Layer: p2o.Add.10][ONNX Layer: p2o.Relu.18]
Name: p2o.Conv.21 + p2o.BatchNormalization.21 + p2o.Relu.19, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_18.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_19.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Float", "Count": 65536}, Bias: {"Type": "Float", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize64x128x32_stage5_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207dd, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.21][ONNX Layer: p2o.BatchNormalization.21][ONNX Layer: p2o.Relu.19]
Name: p2o.Conv.22 + p2o.BatchNormalization.22 + p2o.Relu.20, LayerType: CaskConvolution, Inputs: [ { Name: relu_19.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_20.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Float", "Count": 147456}, Bias: {"Type": "Float", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xd920b33c9bd27143, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.22][ONNX Layer: p2o.BatchNormalization.22][ONNX Layer: p2o.Relu.20]
Name: p2o.Conv.23 + p2o.BatchNormalization.23 + p2o.Add.12 + p2o.Relu.21, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_20.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_18.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_21.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 65536}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_nn_n_tilesize64x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000202f7, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.23][ONNX Layer: p2o.BatchNormalization.23][ONNX Layer: p2o.Add.12][ONNX Layer: p2o.Relu.21]
Name: p2o.Conv.24 + p2o.BatchNormalization.24 + p2o.Relu.22, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_21.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_22.tmp_0, Location: Device, Dimensions: [6,256,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 131072}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_nn_n_tilesize128x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x0000000000020764, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.24][ONNX Layer: p2o.BatchNormalization.24][ONNX Layer: p2o.Relu.22]
Name: p2o.Conv.27 + p2o.BatchNormalization.27, LayerType: CaskConvolution, Inputs: [ { Name: relu_21.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: batch_norm_27.tmp_2, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [2,2], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Float", "Count": 524288}, Bias: {"Type": "Float", "Count": 1024}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x16_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r1s1, TacticValue: 0x130df49cb195156b, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.27][ONNX Layer: p2o.BatchNormalization.27]
Name: p2o.Conv.25 + p2o.BatchNormalization.25 + p2o.Relu.23, LayerType: CaskConvolution, Inputs: [ { Name: relu_22.tmp_0, Location: Device, Dimensions: [6,256,60,92], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_23.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 589824}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x256x32_stage3_warpsize2x4x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0x614e89f7852edbc3, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.25][ONNX Layer: p2o.BatchNormalization.25][ONNX Layer: p2o.Relu.23]
Name: p2o.Conv.26 + p2o.BatchNormalization.26 + p2o.Add.14 + p2o.Relu.24, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_23.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: batch_norm_27.tmp_2, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_24.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207cb, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.26][ONNX Layer: p2o.BatchNormalization.26][ONNX Layer: p2o.Add.14][ONNX Layer: p2o.Relu.24]
Name: p2o.Conv.28 + p2o.BatchNormalization.28 + p2o.Relu.25, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_24.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_25.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x256x32_stage3_warpsize2x4x1_tensor16x8x8, TacticValue: 0x00000000000201d8, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.28][ONNX Layer: p2o.BatchNormalization.28][ONNX Layer: p2o.Relu.25]
Name: p2o.Conv.29 + p2o.BatchNormalization.29 + p2o.Relu.26, LayerType: CaskConvolution, Inputs: [ { Name: relu_25.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_26.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 589824}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x256x32_stage3_warpsize2x4x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0x614e89f7852edbc3, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.29][ONNX Layer: p2o.BatchNormalization.29][ONNX Layer: p2o.Relu.26]
Name: p2o.Conv.30 + p2o.BatchNormalization.30 + p2o.Add.16 + p2o.Relu.27, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_26.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_24.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_27.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207cb, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.30][ONNX Layer: p2o.BatchNormalization.30][ONNX Layer: p2o.Add.16][ONNX Layer: p2o.Relu.27]
Name: p2o.Conv.31 + p2o.BatchNormalization.31 + p2o.Relu.28, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_27.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_28.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x256x32_stage3_warpsize2x4x1_tensor16x8x8, TacticValue: 0x00000000000201d8, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.31][ONNX Layer: p2o.BatchNormalization.31][ONNX Layer: p2o.Relu.28]
Name: p2o.Conv.32 + p2o.BatchNormalization.32 + p2o.Relu.29, LayerType: CaskConvolution, Inputs: [ { Name: relu_28.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_29.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 589824}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x256x32_stage3_warpsize2x4x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0x614e89f7852edbc3, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.32][ONNX Layer: p2o.BatchNormalization.32][ONNX Layer: p2o.Relu.29]
Name: p2o.Conv.33 + p2o.BatchNormalization.33 + p2o.Add.18 + p2o.Relu.30, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_29.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_27.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_30.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207cb, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.33][ONNX Layer: p2o.BatchNormalization.33][ONNX Layer: p2o.Add.18][ONNX Layer: p2o.Relu.30]
Name: p2o.Conv.34 + p2o.BatchNormalization.34 + p2o.Relu.31, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_30.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_31.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x256x32_stage3_warpsize2x4x1_tensor16x8x8, TacticValue: 0x00000000000201d8, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.34][ONNX Layer: p2o.BatchNormalization.34][ONNX Layer: p2o.Relu.31]
Name: p2o.Conv.35 + p2o.BatchNormalization.35 + p2o.Relu.32, LayerType: CaskConvolution, Inputs: [ { Name: relu_31.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_32.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 589824}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x256x32_stage3_warpsize2x4x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0x614e89f7852edbc3, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.35][ONNX Layer: p2o.BatchNormalization.35][ONNX Layer: p2o.Relu.32]
Name: p2o.Conv.36 + p2o.BatchNormalization.36 + p2o.Add.20 + p2o.Relu.33, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_32.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_30.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_33.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207cb, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.36][ONNX Layer: p2o.BatchNormalization.36][ONNX Layer: p2o.Add.20][ONNX Layer: p2o.Relu.33]
Name: p2o.Conv.37 + p2o.BatchNormalization.37 + p2o.Relu.34, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_33.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_34.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x256x32_stage3_warpsize2x4x1_tensor16x8x8, TacticValue: 0x00000000000201d8, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.37][ONNX Layer: p2o.BatchNormalization.37][ONNX Layer: p2o.Relu.34]
Name: p2o.Conv.38 + p2o.BatchNormalization.38 + p2o.Relu.35, LayerType: CaskConvolution, Inputs: [ { Name: relu_34.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_35.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 589824}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x256x32_stage3_warpsize2x4x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0x614e89f7852edbc3, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.38][ONNX Layer: p2o.BatchNormalization.38][ONNX Layer: p2o.Relu.35]
Name: p2o.Conv.39 + p2o.BatchNormalization.39 + p2o.Add.22 + p2o.Relu.36, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_35.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_33.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_36.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207cb, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.39][ONNX Layer: p2o.BatchNormalization.39][ONNX Layer: p2o.Add.22][ONNX Layer: p2o.Relu.36]
Name: p2o.Conv.40 + p2o.BatchNormalization.40 + p2o.Relu.37, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_36.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_37.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x256x32_stage3_warpsize2x4x1_tensor16x8x8, TacticValue: 0x00000000000201d8, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.40][ONNX Layer: p2o.BatchNormalization.40][ONNX Layer: p2o.Relu.37]
Name: p2o.Conv.41 + p2o.BatchNormalization.41 + p2o.Relu.38, LayerType: CaskConvolution, Inputs: [ { Name: relu_37.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_38.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Float", "Count": 589824}, Bias: {"Type": "Float", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x256x32_stage3_warpsize2x4x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0x614e89f7852edbc3, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.41][ONNX Layer: p2o.BatchNormalization.41][ONNX Layer: p2o.Relu.38]
Name: p2o.Conv.42 + p2o.BatchNormalization.42 + p2o.Add.24 + p2o.Relu.39, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_38.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_36.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_39.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Float", "Count": 262144}, Bias: {"Type": "Float", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x00000000000207cb, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.42][ONNX Layer: p2o.BatchNormalization.42][ONNX Layer: p2o.Add.24][ONNX Layer: p2o.Relu.39]
Name: p2o.Conv.43 + p2o.BatchNormalization.43 + p2o.Relu.40, LayerType: CaskConvolution, Inputs: [ { Name: relu_39.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_40.tmp_0, Location: Device, Dimensions: [6,512,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 524288}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x8, TacticValue: 0x25b2b9d5c9d5ca0d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.43][ONNX Layer: p2o.BatchNormalization.43][ONNX Layer: p2o.Relu.40]
Name: p2o.Conv.46 + p2o.BatchNormalization.46, LayerType: CaskConvolution, Inputs: [ { Name: relu_39.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: batch_norm_46.tmp_2, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [2,2], Dilation: [1,1], OutMaps: 2048, Groups: 1, Weights: {"Type": "Float", "Count": 2097152}, Bias: {"Type": "Float", "Count": 2048}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x8, TacticValue: 0x25b2b9d5c9d5ca0d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.46][ONNX Layer: p2o.BatchNormalization.46]
Name: p2o.Conv.44 + p2o.BatchNormalization.44 + p2o.Relu.41, LayerType: CaskConvolution, Inputs: [ { Name: relu_40.tmp_0, Location: Device, Dimensions: [6,512,30,46], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_41.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 2359296}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xd920b33c9bd27143, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.44][ONNX Layer: p2o.BatchNormalization.44][ONNX Layer: p2o.Relu.41]
Name: p2o.Conv.45 + p2o.BatchNormalization.45 + p2o.Add.26 + p2o.Relu.42, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_41.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: batch_norm_46.tmp_2, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_42.tmp_0, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 2048, Groups: 1, Weights: {"Type": "Float", "Count": 1048576}, Bias: {"Type": "Float", "Count": 2048}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize64x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x0000000000020413, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.45][ONNX Layer: p2o.BatchNormalization.45][ONNX Layer: p2o.Add.26][ONNX Layer: p2o.Relu.42]
Name: p2o.Conv.47 + p2o.BatchNormalization.47 + p2o.Relu.43, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_42.tmp_0, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_43.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 1048576}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize128x256x32_stage3_warpsize2x4x1_tensor16x8x8, TacticValue: 0x00000000040601d8, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.47][ONNX Layer: p2o.BatchNormalization.47][ONNX Layer: p2o.Relu.43]
Name: p2o.Conv.48 + p2o.BatchNormalization.48 + p2o.Relu.44, LayerType: CaskConvolution, Inputs: [ { Name: relu_43.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: relu_44.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Float", "Count": 2359296}, Bias: {"Type": "Float", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f32f32_tf32f32_f32_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x8_t1r3s3, TacticValue: 0xd920b33c9bd27143, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.48][ONNX Layer: p2o.BatchNormalization.48][ONNX Layer: p2o.Relu.44]
Name: p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_44.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }, { Name: relu_42.tmp_0, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: Reformatted Output Tensor 0 to p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 2048, Groups: 1, Weights: {"Type": "Float", "Count": 1048576}, Bias: {"Type": "Float", "Count": 2048}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_gemm_f32f32_tf32f32_f32_tn_n_tilesize64x128x16_stage4_warpsize2x2x1_tensor16x8x8, TacticValue: 0x0000000000020413, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.49][ONNX Layer: p2o.BatchNormalization.49][ONNX Layer: p2o.Add.28]
Name: Reformatting CopyNode for Output Tensor 0 to p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28, LayerType: Reformat, Inputs: [ { Name: Reformatted Output Tensor 0 to p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP32 format where channel % 4 == 0 }], Outputs: [ { Name: tmp_16, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Row major linear FP32 }], ParameterType: Reformat, Origin: REFORMAT, TacticValue: 0x0000000000000000, StreamId: 0, Metadata: 

Bindings:
stack_0.tmp_0
tmp_16
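One thing visible in the BF16 layer dump above is that the convolution tactics end in `f32f32_tf32f32` (TF32 tensor-core kernels on FP32 data) rather than BF16 kernels, while the FP16 dump below uses `f16f16` tactics. A minimal sketch to confirm this from the detailed layer dump, assuming the `TacticName: …,` field format shown in these logs:

```python
import re

def tactic_datatypes(layer_dump: str) -> dict:
    """Count tactics in a trtexec --profilingVerbosity=detailed layer dump
    by the datatype family embedded in the tactic name."""
    counts = {}
    for m in re.finditer(r"TacticName: (\S+?),", layer_dump):
        tactic = m.group(1)
        # Check the more specific substrings first.
        for dtype in ("f16f16", "bf16", "tf32", "f32f32"):
            if dtype in tactic:
                counts[dtype] = counts.get(dtype, 0) + 1
                break
    return counts
```

Running this over the BF16 dump should show the layers landing on `tf32` kernels, which would explain why the BF16 build is not faster than FP16 here.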
[01/09/2024-07:11:19] [I] Starting inference
[01/09/2024-07:11:22] [I] The e2e network timing is not reported since it is inaccurate due to the extra synchronizations when the profiler is enabled.
[01/09/2024-07:11:22] [I] To show e2e network timing report, add --separateProfileRun to profile layer timing in a separate run or remove --dumpProfile to disable the profiler.
[01/09/2024-07:11:22] [I] 
[01/09/2024-07:11:22] [I] === Profile (379 iterations ) ===
[01/09/2024-07:11:22] [I]    Time(ms)     Avg.(ms)   Median(ms)   Time(%)   Layer
[01/09/2024-07:11:22] [I]       29.82       0.0787       0.0594       1.2   p2o.Transpose.0
[01/09/2024-07:11:22] [I]       14.51       0.0383       0.0379       0.6   PWN(PWN(p2o.Sub.0), PWN(p2o.Div.0))
[01/09/2024-07:11:22] [I]       33.62       0.0887       0.0532       1.4   Reformatting CopyNode for Input Tensor 0 to p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0
[01/09/2024-07:11:22] [I]      145.53       0.3840       0.3768       6.0   p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0
[01/09/2024-07:11:22] [I]       51.87       0.1369       0.1362       2.2   p2o.MaxPool.0
[01/09/2024-07:11:22] [I]       22.14       0.0584       0.0573       0.9   p2o.Conv.1 + p2o.BatchNormalization.1 + p2o.Relu.1
[01/09/2024-07:11:22] [I]       52.72       0.1391       0.1372       2.2   p2o.Conv.2 + p2o.BatchNormalization.2 + p2o.Relu.2
[01/09/2024-07:11:22] [I]       52.90       0.1396       0.1382       2.2   p2o.Conv.3 + p2o.BatchNormalization.3
[01/09/2024-07:11:22] [I]       93.40       0.2464       0.2458       3.9   p2o.Conv.4 + p2o.BatchNormalization.4 + p2o.Add.0 + p2o.Relu.3
[01/09/2024-07:11:22] [I]       53.38       0.1409       0.1403       2.2   p2o.Conv.5 + p2o.BatchNormalization.5 + p2o.Relu.4
[01/09/2024-07:11:22] [I]       50.95       0.1344       0.1321       2.1   p2o.Conv.6 + p2o.BatchNormalization.6 + p2o.Relu.5
[01/09/2024-07:11:22] [I]       92.66       0.2445       0.2437       3.8   p2o.Conv.7 + p2o.BatchNormalization.7 + p2o.Add.2 + p2o.Relu.6
[01/09/2024-07:11:22] [I]       53.67       0.1416       0.1413       2.2   p2o.Conv.8 + p2o.BatchNormalization.8 + p2o.Relu.7
[01/09/2024-07:11:22] [I]       51.75       0.1366       0.1341       2.1   p2o.Conv.9 + p2o.BatchNormalization.9 + p2o.Relu.8
[01/09/2024-07:11:22] [I]       92.73       0.2447       0.2437       3.8   p2o.Conv.10 + p2o.BatchNormalization.10 + p2o.Add.4 + p2o.Relu.9
[01/09/2024-07:11:22] [I]       62.96       0.1661       0.1659       2.6   p2o.Conv.11 + p2o.BatchNormalization.11 + p2o.Relu.10
[01/09/2024-07:11:22] [I]       46.53       0.1228       0.1208       1.9   p2o.Conv.14 + p2o.BatchNormalization.14
[01/09/2024-07:11:22] [I]       49.00       0.1293       0.1280       2.0   p2o.Conv.12 + p2o.BatchNormalization.12 + p2o.Relu.11
[01/09/2024-07:11:22] [I]       46.42       0.1225       0.1219       1.9   p2o.Conv.13 + p2o.BatchNormalization.13 + p2o.Add.6 + p2o.Relu.12
[01/09/2024-07:11:22] [I]       33.97       0.0896       0.0891       1.4   p2o.Conv.15 + p2o.BatchNormalization.15 + p2o.Relu.13
[01/09/2024-07:11:22] [I]       43.85       0.1157       0.1137       1.8   p2o.Conv.16 + p2o.BatchNormalization.16 + p2o.Relu.14
[01/09/2024-07:11:22] [I]       46.76       0.1234       0.1229       1.9   p2o.Conv.17 + p2o.BatchNormalization.17 + p2o.Add.8 + p2o.Relu.15
[01/09/2024-07:11:22] [I]       33.86       0.0893       0.0891       1.4   p2o.Conv.18 + p2o.BatchNormalization.18 + p2o.Relu.16
[01/09/2024-07:11:22] [I]       43.69       0.1153       0.1137       1.8   p2o.Conv.19 + p2o.BatchNormalization.19 + p2o.Relu.17
[01/09/2024-07:11:22] [I]       47.06       0.1242       0.1239       2.0   p2o.Conv.20 + p2o.BatchNormalization.20 + p2o.Add.10 + p2o.Relu.18
[01/09/2024-07:11:22] [I]       33.97       0.0896       0.0891       1.4   p2o.Conv.21 + p2o.BatchNormalization.21 + p2o.Relu.19
[01/09/2024-07:11:22] [I]       43.81       0.1156       0.1137       1.8   p2o.Conv.22 + p2o.BatchNormalization.22 + p2o.Relu.20
[01/09/2024-07:11:22] [I]       46.72       0.1233       0.1229       1.9   p2o.Conv.23 + p2o.BatchNormalization.23 + p2o.Add.12 + p2o.Relu.21
[01/09/2024-07:11:22] [I]       42.52       0.1122       0.1106       1.8   p2o.Conv.24 + p2o.BatchNormalization.24 + p2o.Relu.22
[01/09/2024-07:11:22] [I]       39.25       0.1036       0.1024       1.6   p2o.Conv.27 + p2o.BatchNormalization.27
[01/09/2024-07:11:22] [I]       48.27       0.1274       0.1249       2.0   p2o.Conv.25 + p2o.BatchNormalization.25 + p2o.Relu.23
[01/09/2024-07:11:22] [I]       30.11       0.0794       0.0788       1.2   p2o.Conv.26 + p2o.BatchNormalization.26 + p2o.Add.14 + p2o.Relu.24
[01/09/2024-07:11:22] [I]       24.57       0.0648       0.0635       1.0   p2o.Conv.28 + p2o.BatchNormalization.28 + p2o.Relu.25
[01/09/2024-07:11:22] [I]       48.16       0.1271       0.1249       2.0   p2o.Conv.29 + p2o.BatchNormalization.29 + p2o.Relu.26
[01/09/2024-07:11:22] [I]       30.39       0.0802       0.0788       1.3   p2o.Conv.30 + p2o.BatchNormalization.30 + p2o.Add.16 + p2o.Relu.27
[01/09/2024-07:11:22] [I]       24.61       0.0649       0.0635       1.0   p2o.Conv.31 + p2o.BatchNormalization.31 + p2o.Relu.28
[01/09/2024-07:11:22] [I]       48.12       0.1270       0.1249       2.0   p2o.Conv.32 + p2o.BatchNormalization.32 + p2o.Relu.29
[01/09/2024-07:11:22] [I]       30.40       0.0802       0.0799       1.3   p2o.Conv.33 + p2o.BatchNormalization.33 + p2o.Add.18 + p2o.Relu.30
[01/09/2024-07:11:22] [I]       24.59       0.0649       0.0635       1.0   p2o.Conv.34 + p2o.BatchNormalization.34 + p2o.Relu.31
[01/09/2024-07:11:22] [I]       48.12       0.1270       0.1249       2.0   p2o.Conv.35 + p2o.BatchNormalization.35 + p2o.Relu.32
[01/09/2024-07:11:22] [I]       30.34       0.0800       0.0788       1.3   p2o.Conv.36 + p2o.BatchNormalization.36 + p2o.Add.20 + p2o.Relu.33
[01/09/2024-07:11:22] [I]       24.57       0.0648       0.0635       1.0   p2o.Conv.37 + p2o.BatchNormalization.37 + p2o.Relu.34
[01/09/2024-07:11:22] [I]       48.11       0.1270       0.1249       2.0   p2o.Conv.38 + p2o.BatchNormalization.38 + p2o.Relu.35
[01/09/2024-07:11:22] [I]       30.51       0.0805       0.0799       1.3   p2o.Conv.39 + p2o.BatchNormalization.39 + p2o.Add.22 + p2o.Relu.36
[01/09/2024-07:11:22] [I]       24.56       0.0648       0.0635       1.0   p2o.Conv.40 + p2o.BatchNormalization.40 + p2o.Relu.37
[01/09/2024-07:11:22] [I]       48.12       0.1270       0.1249       2.0   p2o.Conv.41 + p2o.BatchNormalization.41 + p2o.Relu.38
[01/09/2024-07:11:22] [I]       30.36       0.0801       0.0788       1.3   p2o.Conv.42 + p2o.BatchNormalization.42 + p2o.Add.24 + p2o.Relu.39
[01/09/2024-07:11:22] [I]       40.96       0.1081       0.1065       1.7   p2o.Conv.43 + p2o.BatchNormalization.43 + p2o.Relu.40
[01/09/2024-07:11:22] [I]       40.40       0.1066       0.1044       1.7   p2o.Conv.46 + p2o.BatchNormalization.46
[01/09/2024-07:11:22] [I]       48.99       0.1292       0.1270       2.0   p2o.Conv.44 + p2o.BatchNormalization.44 + p2o.Relu.41
[01/09/2024-07:11:22] [I]       25.46       0.0672       0.0666       1.1   p2o.Conv.45 + p2o.BatchNormalization.45 + p2o.Add.26 + p2o.Relu.42
[01/09/2024-07:11:22] [I]       23.85       0.0629       0.0614       1.0   p2o.Conv.47 + p2o.BatchNormalization.47 + p2o.Relu.43
[01/09/2024-07:11:22] [I]       48.82       0.1288       0.1260       2.0   p2o.Conv.48 + p2o.BatchNormalization.48 + p2o.Relu.44
[01/09/2024-07:11:22] [I]       25.76       0.0680       0.0666       1.1   p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28
[01/09/2024-07:11:22] [I]        8.52       0.0225       0.0215       0.4   Reformatting CopyNode for Output Tensor 0 to p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28
[01/09/2024-07:11:22] [I]     2408.69       6.3554       6.2280     100.0   Total
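To compare the two builds layer by layer rather than eyeballing the tables, the `--dumpProfile` output above can be parsed into a name-to-average-time map and diffed against the FP16 run. A minimal sketch, assuming the row format shown in these logs (timestamp, `[I]`, four numeric columns, layer name):

```python
import re

# One profile row: "[I]  <Time(ms)>  <Avg.(ms)>  <Median(ms)>  <Time(%)>  <Layer>"
ROW = re.compile(r"\[I\]\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+(.+)$")

def parse_profile(text: str) -> dict:
    """Map layer name -> average time in ms from a trtexec --dumpProfile table."""
    layers = {}
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            avg, name = float(m.group(2)), m.group(5).strip()
            if name != "Total":  # skip the summary row
                layers[name] = avg
    return layers
```

With both runs parsed, `bf16[name] / fp16[name]` per shared layer name points directly at the layers responsible for the slowdown.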

FP16

[01/09/2024-07:11:05] [I] Layers:
Name: assign_0.tmp_0, LayerType: Constant, Inputs: [], Outputs: [ { Name: (Unnamed Layer* 1) [Constant]_output, Location: Device, Dimensions: [1,3,1,1], Format/Datatype: Row major linear FP32 }], ParameterType: Constant, weights: {"Type": "Float", "Count": 3}, dimensions: [1,3,1,1], TacticValue: 0x0000000000000000, StreamId: 0, Metadata: 
Name: assign_1.tmp_0, LayerType: Constant, Inputs: [], Outputs: [ { Name: (Unnamed Layer* 3) [Constant]_output, Location: Device, Dimensions: [1,3,1,1], Format/Datatype: Row major linear FP32 }], ParameterType: Constant, weights: {"Type": "Float", "Count": 3}, dimensions: [1,3,1,1], TacticValue: 0x0000000000000000, StreamId: 0, Metadata: 
Name: p2o.Transpose.0, LayerType: Shuffle, Inputs: [ { Name: stack_0.tmp_0, Location: Device, Dimensions: [6,465,720,3], Format/Datatype: Row major linear FP32 }], Outputs: [ { Name: transpose_0.tmp_0, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Row major linear FP32 }], ParameterType: Shuffle, FirstTranspose: [0,3,1,2], Reshape: "nbDims=-1", SecondTranspose: [0,1,2,3], ZeroIsPlaceholder: 1, TacticValue: 0x0000000000000000, StreamId: 0, Metadata: [ONNX Layer: p2o.Transpose.0]
Name: PWN(PWN(p2o.Sub.0), PWN(p2o.Div.0)), LayerType: PointWiseV2, Inputs: [ { Name: transpose_0.tmp_0, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Row major linear FP32 }, { Name: (Unnamed Layer* 1) [Constant]_output, Location: Device, Dimensions: [1,3,1,1], Format/Datatype: Row major linear FP32 }, { Name: (Unnamed Layer* 3) [Constant]_output, Location: Device, Dimensions: [1,3,1,1], Format/Datatype: Row major linear FP32 }], Outputs: [ { Name: p2o.Div.1, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Row major linear FP32 }], ParameterType: PointWise, ParameterSubType: PointWiseExpression, NbInputArgs: 3, InputArgs: ["arg0", "arg1", "arg2"], NbOutputVars: 1, OutputVars: ["var1"], NbParams: 0, Params: [], NbLiterals: 0, Literals: [], NbOperations: 2, Operations: ["auto const var0 = pwgen::iMinus(arg0, arg1);", "auto const var1 = pwgen::iDiv(var0, arg2);"], TacticValue: 0x0000000000000009, StreamId: 0, Metadata: [ONNX Layer: p2o.Sub.0][ONNX Layer: p2o.Div.0]
Name: Reformatting CopyNode for Input Tensor 0 to p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0, LayerType: Reformat, Inputs: [ { Name: p2o.Div.1, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Row major linear FP32 }], Outputs: [ { Name: Reformatted Input Tensor 0 to p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Channel major FP16 format where channel % 4 == 0 }], ParameterType: Reformat, Origin: REFORMAT, TacticValue: 0x00000000000003ea, StreamId: 0, Metadata: 
Name: p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0, LayerType: CaskConvolution, Inputs: [ { Name: Reformatted Input Tensor 0 to p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0, Location: Device, Dimensions: [6,3,465,720], Format/Datatype: Channel major FP16 format where channel % 4 == 0 }], Outputs: [ { Name: relu_0.tmp_0, Location: Device, Dimensions: [6,64,240,368], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [7,7], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [3,3], PostPadding: [18,19], Stride: [2,2], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Half", "Count": 9408}, Bias: {"Type": "Half", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_image_first_layer_f16f16_f32_f16_nhwckrsc_nhwc_hmma_k64c4r7s7_stride2x2_tile16x64x64_tensor1688, TacticValue: 0x4341b9cbb7197a9b, StreamId: 0, Metadata: [ONNX Layer: p2o.Pad.0][ONNX Layer: p2o.Conv.0][ONNX Layer: p2o.BatchNormalization.0][ONNX Layer: p2o.Relu.0]
Name: p2o.MaxPool.0, LayerType: CaskPooling, Inputs: [ { Name: relu_0.tmp_0, Location: Device, Dimensions: [6,64,240,368], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: p2o.MaxPool.1, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Pooling, PoolingType: MAX, WindowSize: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], BlendFactor: 0, AverageCountExcludesPadding: 1, TacticName: sm50_xmma_pooling_coalescedC_NHWC_kMAX_3_False, TacticValue: 0xdb415cba6b0e9137, StreamId: 0, Metadata: [ONNX Layer: p2o.MaxPool.0]
Name: p2o.Conv.1 + p2o.BatchNormalization.1 + p2o.Relu.1, LayerType: CaskGemmConvolution, Inputs: [ { Name: p2o.MaxPool.1, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_1.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Half", "Count": 4096}, Bias: {"Type": "Half", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize128x64x64_stage3_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020164, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.1][ONNX Layer: p2o.BatchNormalization.1][ONNX Layer: p2o.Relu.1]
Name: p2o.Conv.2 + p2o.BatchNormalization.2 + p2o.Relu.2, LayerType: CaskConvolution, Inputs: [ { Name: relu_1.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_2.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Half", "Count": 36864}, Bias: {"Type": "Half", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize256x64x32_stage3_warpsize4x1x1_g1_tensor16x8x16_t1r3s3, TacticValue: 0x529f4431bdae94f5, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.2][ONNX Layer: p2o.BatchNormalization.2][ONNX Layer: p2o.Relu.2]
Name: p2o.Conv.3 + p2o.BatchNormalization.3, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_2.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: batch_norm_3.tmp_2, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 16384}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize256x64x32_stage3_warpsize4x1x1_tensor16x8x16, TacticValue: 0x000000000002066d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.3][ONNX Layer: p2o.BatchNormalization.3]
Name: p2o.Conv.4 + p2o.BatchNormalization.4 + p2o.Add.0 + p2o.Relu.3, LayerType: CaskGemmConvolution, Inputs: [ { Name: p2o.MaxPool.1, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: batch_norm_3.tmp_2, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_3.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 16384}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize256x64x32_stage3_warpsize4x1x1_tensor16x8x16, TacticValue: 0x000000000002066d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.4][ONNX Layer: p2o.BatchNormalization.4][ONNX Layer: p2o.Add.0][ONNX Layer: p2o.Relu.3]
Name: p2o.Conv.5 + p2o.BatchNormalization.5 + p2o.Relu.4, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_3.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_4.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Half", "Count": 16384}, Bias: {"Type": "Half", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize256x64x32_stage3_warpsize4x1x1_tensor16x8x16, TacticValue: 0x000000000002066d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.5][ONNX Layer: p2o.BatchNormalization.5][ONNX Layer: p2o.Relu.4]
Name: p2o.Conv.6 + p2o.BatchNormalization.6 + p2o.Relu.5, LayerType: CaskConvolution, Inputs: [ { Name: relu_4.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_5.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Half", "Count": 36864}, Bias: {"Type": "Half", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize256x64x32_stage3_warpsize4x1x1_g1_tensor16x8x16_t1r3s3, TacticValue: 0x529f4431bdae94f5, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.6][ONNX Layer: p2o.BatchNormalization.6][ONNX Layer: p2o.Relu.5]
Name: p2o.Conv.7 + p2o.BatchNormalization.7 + p2o.Add.2 + p2o.Relu.6, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_5.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_3.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_6.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 16384}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize256x64x32_stage3_warpsize4x1x1_tensor16x8x16, TacticValue: 0x000000000002066d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.7][ONNX Layer: p2o.BatchNormalization.7][ONNX Layer: p2o.Add.2][ONNX Layer: p2o.Relu.6]
Name: p2o.Conv.8 + p2o.BatchNormalization.8 + p2o.Relu.7, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_6.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_7.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Half", "Count": 16384}, Bias: {"Type": "Half", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize256x64x32_stage3_warpsize4x1x1_tensor16x8x16, TacticValue: 0x000000000002066d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.8][ONNX Layer: p2o.BatchNormalization.8][ONNX Layer: p2o.Relu.7]
Name: p2o.Conv.9 + p2o.BatchNormalization.9 + p2o.Relu.8, LayerType: CaskConvolution, Inputs: [ { Name: relu_7.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_8.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 64, Groups: 1, Weights: {"Type": "Half", "Count": 36864}, Bias: {"Type": "Half", "Count": 64}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize256x64x32_stage3_warpsize4x1x1_g1_tensor16x8x16_t1r3s3, TacticValue: 0x529f4431bdae94f5, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.9][ONNX Layer: p2o.BatchNormalization.9][ONNX Layer: p2o.Relu.8]
Name: p2o.Conv.10 + p2o.BatchNormalization.10 + p2o.Add.4 + p2o.Relu.9, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_8.tmp_0, Location: Device, Dimensions: [6,64,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_6.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_9.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 16384}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize256x64x32_stage3_warpsize4x1x1_tensor16x8x16, TacticValue: 0x000000000002066d, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.10][ONNX Layer: p2o.BatchNormalization.10][ONNX Layer: p2o.Add.4][ONNX Layer: p2o.Relu.9]
Name: p2o.Conv.11 + p2o.BatchNormalization.11 + p2o.Relu.10, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_9.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_10.tmp_0, Location: Device, Dimensions: [6,128,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Half", "Count": 32768}, Bias: {"Type": "Half", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize64x128x32_stage5_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020435, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.11][ONNX Layer: p2o.BatchNormalization.11][ONNX Layer: p2o.Relu.10]
Name: p2o.Conv.14 + p2o.BatchNormalization.14, LayerType: CaskConvolution, Inputs: [ { Name: relu_9.tmp_0, Location: Device, Dimensions: [6,256,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: batch_norm_14.tmp_2, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [2,2], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 131072}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x16_t1r1s1, TacticValue: 0xea50b6d3d87bf5dd, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.14][ONNX Layer: p2o.BatchNormalization.14]
Name: p2o.Conv.12 + p2o.BatchNormalization.12 + p2o.Relu.11, LayerType: CaskConvolution, Inputs: [ { Name: relu_10.tmp_0, Location: Device, Dimensions: [6,128,120,184], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_11.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Half", "Count": 147456}, Bias: {"Type": "Half", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x16, TacticValue: 0xdfa020ef435ef810, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.12][ONNX Layer: p2o.BatchNormalization.12][ONNX Layer: p2o.Relu.11]
Name: p2o.Conv.13 + p2o.BatchNormalization.13 + p2o.Add.6 + p2o.Relu.12, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_11.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: batch_norm_14.tmp_2, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_12.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 65536}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x00000000000207fa, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.13][ONNX Layer: p2o.BatchNormalization.13][ONNX Layer: p2o.Add.6][ONNX Layer: p2o.Relu.12]
Name: p2o.Conv.15 + p2o.BatchNormalization.15 + p2o.Relu.13, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_12.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_13.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Half", "Count": 65536}, Bias: {"Type": "Half", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x128_ldg8_relu_stages_32x5_tn_v1, TacticValue: 0x0000000000020848, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.15][ONNX Layer: p2o.BatchNormalization.15][ONNX Layer: p2o.Relu.13]
Name: p2o.Conv.16 + p2o.BatchNormalization.16 + p2o.Relu.14, LayerType: CaskConvolution, Inputs: [ { Name: relu_13.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_14.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Half", "Count": 147456}, Bias: {"Type": "Half", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x16_t1r3s3, TacticValue: 0x60c3421152ef8e10, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.16][ONNX Layer: p2o.BatchNormalization.16][ONNX Layer: p2o.Relu.14]
Name: p2o.Conv.17 + p2o.BatchNormalization.17 + p2o.Add.8 + p2o.Relu.15, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_14.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_12.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_15.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 65536}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x00000000000207fa, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.17][ONNX Layer: p2o.BatchNormalization.17][ONNX Layer: p2o.Add.8][ONNX Layer: p2o.Relu.15]
Name: p2o.Conv.18 + p2o.BatchNormalization.18 + p2o.Relu.16, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_15.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_16.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Half", "Count": 65536}, Bias: {"Type": "Half", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x128_ldg8_relu_stages_32x5_tn_v1, TacticValue: 0x0000000000020848, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.18][ONNX Layer: p2o.BatchNormalization.18][ONNX Layer: p2o.Relu.16]
Name: p2o.Conv.19 + p2o.BatchNormalization.19 + p2o.Relu.17, LayerType: CaskConvolution, Inputs: [ { Name: relu_16.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_17.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Half", "Count": 147456}, Bias: {"Type": "Half", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x16_t1r3s3, TacticValue: 0x60c3421152ef8e10, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.19][ONNX Layer: p2o.BatchNormalization.19][ONNX Layer: p2o.Relu.17]
Name: p2o.Conv.20 + p2o.BatchNormalization.20 + p2o.Add.10 + p2o.Relu.18, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_17.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_15.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_18.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 65536}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x00000000000207fa, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.20][ONNX Layer: p2o.BatchNormalization.20][ONNX Layer: p2o.Add.10][ONNX Layer: p2o.Relu.18]
Name: p2o.Conv.21 + p2o.BatchNormalization.21 + p2o.Relu.19, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_18.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_19.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Half", "Count": 65536}, Bias: {"Type": "Half", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x128_ldg8_relu_stages_32x5_tn_v1, TacticValue: 0x0000000000020848, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.21][ONNX Layer: p2o.BatchNormalization.21][ONNX Layer: p2o.Relu.19]
Name: p2o.Conv.22 + p2o.BatchNormalization.22 + p2o.Relu.20, LayerType: CaskConvolution, Inputs: [ { Name: relu_19.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_20.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 128, Groups: 1, Weights: {"Type": "Half", "Count": 147456}, Bias: {"Type": "Half", "Count": 128}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x16_t1r3s3, TacticValue: 0x60c3421152ef8e10, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.22][ONNX Layer: p2o.BatchNormalization.22][ONNX Layer: p2o.Relu.20]
Name: p2o.Conv.23 + p2o.BatchNormalization.23 + p2o.Add.12 + p2o.Relu.21, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_20.tmp_0, Location: Device, Dimensions: [6,128,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_18.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_21.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 65536}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_nn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x00000000000207fa, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.23][ONNX Layer: p2o.BatchNormalization.23][ONNX Layer: p2o.Add.12][ONNX Layer: p2o.Relu.21]
Name: p2o.Conv.24 + p2o.BatchNormalization.24 + p2o.Relu.22, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_21.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_22.tmp_0, Location: Device, Dimensions: [6,256,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 131072}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.24][ONNX Layer: p2o.BatchNormalization.24][ONNX Layer: p2o.Relu.22]
Name: p2o.Conv.27 + p2o.BatchNormalization.27, LayerType: CaskConvolution, Inputs: [ { Name: relu_21.tmp_0, Location: Device, Dimensions: [6,512,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: batch_norm_27.tmp_2, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [2,2], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Half", "Count": 524288}, Bias: {"Type": "Half", "Count": 1024}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x16_t1r1s1, TacticValue: 0xea50b6d3d87bf5dd, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.27][ONNX Layer: p2o.BatchNormalization.27]
Name: p2o.Conv.25 + p2o.BatchNormalization.25 + p2o.Relu.23, LayerType: CaskConvolution, Inputs: [ { Name: relu_22.tmp_0, Location: Device, Dimensions: [6,256,60,92], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_23.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 589824}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x64x32_stage5_warpsize2x2x1_g1_tensor16x8x16_t1r3s3, TacticValue: 0xb4bec086187edcfc, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.25][ONNX Layer: p2o.BatchNormalization.25][ONNX Layer: p2o.Relu.23]
Name: p2o.Conv.26 + p2o.BatchNormalization.26 + p2o.Add.14 + p2o.Relu.24, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_23.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: batch_norm_27.tmp_2, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_24.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.26][ONNX Layer: p2o.BatchNormalization.26][ONNX Layer: p2o.Add.14][ONNX Layer: p2o.Relu.24]
Name: p2o.Conv.28 + p2o.BatchNormalization.28 + p2o.Relu.25, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_24.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_25.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x64_ldg8_relu_stages_32x6_tn_v1, TacticValue: 0x00000000000204b4, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.28][ONNX Layer: p2o.BatchNormalization.28][ONNX Layer: p2o.Relu.25]
Name: p2o.Conv.29 + p2o.BatchNormalization.29 + p2o.Relu.26, LayerType: CaskConvolution, Inputs: [ { Name: relu_25.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_26.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 589824}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize64x64x64_stage4_warpsize2x1x2_g1_tensor16x8x16_aACCESS, TacticValue: 0x841c601dec2a75bc, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.29][ONNX Layer: p2o.BatchNormalization.29][ONNX Layer: p2o.Relu.26]
Name: p2o.Conv.30 + p2o.BatchNormalization.30 + p2o.Add.16 + p2o.Relu.27, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_26.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_24.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_27.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.30][ONNX Layer: p2o.BatchNormalization.30][ONNX Layer: p2o.Add.16][ONNX Layer: p2o.Relu.27]
Name: p2o.Conv.31 + p2o.BatchNormalization.31 + p2o.Relu.28, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_27.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_28.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x64_ldg8_relu_stages_32x6_tn_v1, TacticValue: 0x00000000000204b4, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.31][ONNX Layer: p2o.BatchNormalization.31][ONNX Layer: p2o.Relu.28]
Name: p2o.Conv.32 + p2o.BatchNormalization.32 + p2o.Relu.29, LayerType: CaskConvolution, Inputs: [ { Name: relu_28.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_29.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 589824}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize64x64x64_stage4_warpsize2x1x2_g1_tensor16x8x16_aACCESS, TacticValue: 0x841c601dec2a75bc, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.32][ONNX Layer: p2o.BatchNormalization.32][ONNX Layer: p2o.Relu.29]
Name: p2o.Conv.33 + p2o.BatchNormalization.33 + p2o.Add.18 + p2o.Relu.30, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_29.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_27.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_30.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.33][ONNX Layer: p2o.BatchNormalization.33][ONNX Layer: p2o.Add.18][ONNX Layer: p2o.Relu.30]
Name: p2o.Conv.34 + p2o.BatchNormalization.34 + p2o.Relu.31, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_30.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_31.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x64_ldg8_relu_stages_32x6_tn_v1, TacticValue: 0x00000000000204b4, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.34][ONNX Layer: p2o.BatchNormalization.34][ONNX Layer: p2o.Relu.31]
Name: p2o.Conv.35 + p2o.BatchNormalization.35 + p2o.Relu.32, LayerType: CaskConvolution, Inputs: [ { Name: relu_31.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_32.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 589824}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize64x64x64_stage4_warpsize2x1x2_g1_tensor16x8x16_aACCESS, TacticValue: 0x841c601dec2a75bc, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.35][ONNX Layer: p2o.BatchNormalization.35][ONNX Layer: p2o.Relu.32]
Name: p2o.Conv.36 + p2o.BatchNormalization.36 + p2o.Add.20 + p2o.Relu.33, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_32.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_30.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_33.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.36][ONNX Layer: p2o.BatchNormalization.36][ONNX Layer: p2o.Add.20][ONNX Layer: p2o.Relu.33]
Name: p2o.Conv.37 + p2o.BatchNormalization.37 + p2o.Relu.34, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_33.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_34.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x64_ldg8_relu_stages_32x6_tn_v1, TacticValue: 0x00000000000204b4, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.37][ONNX Layer: p2o.BatchNormalization.37][ONNX Layer: p2o.Relu.34]
Name: p2o.Conv.38 + p2o.BatchNormalization.38 + p2o.Relu.35, LayerType: CaskConvolution, Inputs: [ { Name: relu_34.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_35.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 589824}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize64x64x64_stage4_warpsize2x1x2_g1_tensor16x8x16_aACCESS, TacticValue: 0x841c601dec2a75bc, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.38][ONNX Layer: p2o.BatchNormalization.38][ONNX Layer: p2o.Relu.35]
Name: p2o.Conv.39 + p2o.BatchNormalization.39 + p2o.Add.22 + p2o.Relu.36, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_35.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_33.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_36.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.39][ONNX Layer: p2o.BatchNormalization.39][ONNX Layer: p2o.Add.22][ONNX Layer: p2o.Relu.36]
Name: p2o.Conv.40 + p2o.BatchNormalization.40 + p2o.Relu.37, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_36.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_37.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x64_ldg8_relu_stages_32x6_tn_v1, TacticValue: 0x00000000000204b4, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.40][ONNX Layer: p2o.BatchNormalization.40][ONNX Layer: p2o.Relu.37]
Name: p2o.Conv.41 + p2o.BatchNormalization.41 + p2o.Relu.38, LayerType: CaskConvolution, Inputs: [ { Name: relu_37.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_38.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 256, Groups: 1, Weights: {"Type": "Half", "Count": 589824}, Bias: {"Type": "Half", "Count": 256}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize64x64x64_stage4_warpsize2x1x2_g1_tensor16x8x16_aACCESS, TacticValue: 0x841c601dec2a75bc, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.41][ONNX Layer: p2o.BatchNormalization.41][ONNX Layer: p2o.Relu.38]
Name: p2o.Conv.42 + p2o.BatchNormalization.42 + p2o.Add.24 + p2o.Relu.39, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_38.tmp_0, Location: Device, Dimensions: [6,256,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_36.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_39.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 1024, Groups: 1, Weights: {"Type": "Half", "Count": 262144}, Bias: {"Type": "Half", "Count": 1024}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.42][ONNX Layer: p2o.BatchNormalization.42][ONNX Layer: p2o.Add.24][ONNX Layer: p2o.Relu.39]
Name: p2o.Conv.43 + p2o.BatchNormalization.43 + p2o.Relu.40, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_39.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_40.tmp_0, Location: Device, Dimensions: [6,512,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 524288}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.43][ONNX Layer: p2o.BatchNormalization.43][ONNX Layer: p2o.Relu.40]
Name: p2o.Conv.46 + p2o.BatchNormalization.46, LayerType: CaskConvolution, Inputs: [ { Name: relu_39.tmp_0, Location: Device, Dimensions: [6,1024,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: batch_norm_46.tmp_2, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [2,2], Dilation: [1,1], OutMaps: 2048, Groups: 1, Weights: {"Type": "Half", "Count": 2097152}, Bias: {"Type": "Half", "Count": 2048}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x16, TacticValue: 0xdfa020ef435ef810, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.46][ONNX Layer: p2o.BatchNormalization.46]
Name: p2o.Conv.44 + p2o.BatchNormalization.44 + p2o.Relu.41, LayerType: CaskConvolution, Inputs: [ { Name: relu_40.tmp_0, Location: Device, Dimensions: [6,512,30,46], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_41.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [2,2], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 2359296}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize128x128x32_stage4_warpsize2x2x1_g1_tensor16x8x16_t1r3s3, TacticValue: 0x60c3421152ef8e10, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.44][ONNX Layer: p2o.BatchNormalization.44][ONNX Layer: p2o.Relu.41]
Name: p2o.Conv.45 + p2o.BatchNormalization.45 + p2o.Add.26 + p2o.Relu.42, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_41.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: batch_norm_46.tmp_2, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_42.tmp_0, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 2048, Groups: 1, Weights: {"Type": "Half", "Count": 1048576}, Bias: {"Type": "Half", "Count": 2048}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: ampere_h16816gemm_128x128_ldg8_relu_stages_32x5_nn_v1, TacticValue: 0x00000000000208da, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.45][ONNX Layer: p2o.BatchNormalization.45][ONNX Layer: p2o.Add.26][ONNX Layer: p2o.Relu.42]
Name: p2o.Conv.47 + p2o.BatchNormalization.47 + p2o.Relu.43, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_42.tmp_0, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_43.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 1048576}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_gemm_f16f16_f16f16_f16_tn_n_tilesize128x128x32_stage4_warpsize2x2x1_tensor16x8x16, TacticValue: 0x0000000000020678, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.47][ONNX Layer: p2o.BatchNormalization.47][ONNX Layer: p2o.Relu.43]
Name: p2o.Conv.48 + p2o.BatchNormalization.48 + p2o.Relu.44, LayerType: CaskConvolution, Inputs: [ { Name: relu_43.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: relu_44.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [3,3], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [1,1], PostPadding: [1,1], Stride: [1,1], Dilation: [1,1], OutMaps: 512, Groups: 1, Weights: {"Type": "Half", "Count": 2359296}, Bias: {"Type": "Half", "Count": 512}, HasBias: 1, HasReLU: 1, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 0, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: RELU, TacticName: sm80_xmma_fprop_implicit_gemm_f16f16_f16f16_f16_nhwckrsc_nhwc_tilesize64x64x64_stage4_warpsize2x1x2_g1_tensor16x8x16_aACCESS, TacticValue: 0x841c601dec2a75bc, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.48][ONNX Layer: p2o.BatchNormalization.48][ONNX Layer: p2o.Relu.44]
Name: p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28, LayerType: CaskGemmConvolution, Inputs: [ { Name: relu_44.tmp_0, Location: Device, Dimensions: [6,512,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }, { Name: relu_42.tmp_0, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: Reformatted Output Tensor 0 to p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], ParameterType: Convolution, Kernel: [1,1], PaddingMode: kEXPLICIT_ROUND_DOWN, PrePadding: [0,0], PostPadding: [0,0], Stride: [1,1], Dilation: [1,1], OutMaps: 2048, Groups: 1, Weights: {"Type": "Half", "Count": 1048576}, Bias: {"Type": "Half", "Count": 2048}, HasBias: 1, HasReLU: 0, HasSparseWeights: 0, HasDynamicFilter: 0, HasDynamicBias: 0, HasResidual: 1, ConvXAsActInputIdx: -1, BiasAsActInputIdx: -1, ResAsActInputIdx: -1, Activation: NONE, TacticName: ampere_h16816gemm_128x128_ldg8_relu_stages_32x5_nn_v1, TacticValue: 0x00000000000208da, StreamId: 0, Metadata: [ONNX Layer: p2o.Conv.49][ONNX Layer: p2o.BatchNormalization.49][ONNX Layer: p2o.Add.28]
Name: Reformatting CopyNode for Output Tensor 0 to p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28, LayerType: Reformat, Inputs: [ { Name: Reformatted Output Tensor 0 to p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Channel major FP16 format where channel % 8 == 0 }], Outputs: [ { Name: tmp_16, Location: Device, Dimensions: [6,2048,15,23], Format/Datatype: Row major linear FP32 }], ParameterType: Reformat, Origin: REFORMAT, TacticValue: 0x0000000000000000, StreamId: 0, Metadata: 

Bindings:
stack_0.tmp_0
tmp_16
[01/09/2024-07:11:05] [I] Starting inference
[01/09/2024-07:11:08] [I] The e2e network timing is not reported since it is inaccurate due to the extra synchronizations when the profiler is enabled.
[01/09/2024-07:11:08] [I] To show e2e network timing report, add --separateProfileRun to profile layer timing in a separate run or remove --dumpProfile to disable the profiler.
[01/09/2024-07:11:08] [I] 
[01/09/2024-07:11:08] [I] === Profile (576 iterations ) ===
[01/09/2024-07:11:08] [I]    Time(ms)     Avg.(ms)   Median(ms)   Time(%)   Layer
[01/09/2024-07:11:08] [I]       41.69       0.0724       0.0594       2.1   p2o.Transpose.0
[01/09/2024-07:11:08] [I]       21.83       0.0379       0.0379       1.1   PWN(PWN(p2o.Sub.0), PWN(p2o.Div.0))
[01/09/2024-07:11:08] [I]       25.89       0.0449       0.0410       1.3   Reformatting CopyNode for Input Tensor 0 to p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0
[01/09/2024-07:11:08] [I]       71.08       0.1234       0.1208       3.6   p2o.Pad.0 + p2o.Conv.0 + p2o.BatchNormalization.0 + p2o.Relu.0
[01/09/2024-07:11:08] [I]       46.31       0.0804       0.0799       2.3   p2o.MaxPool.0
[01/09/2024-07:11:08] [I]       17.39       0.0302       0.0297       0.9   p2o.Conv.1 + p2o.BatchNormalization.1 + p2o.Relu.1
[01/09/2024-07:11:08] [I]       40.48       0.0703       0.0686       2.0   p2o.Conv.2 + p2o.BatchNormalization.2 + p2o.Relu.2
[01/09/2024-07:11:08] [I]       41.80       0.0726       0.0717       2.1   p2o.Conv.3 + p2o.BatchNormalization.3
[01/09/2024-07:11:08] [I]       73.15       0.1270       0.1260       3.7   p2o.Conv.4 + p2o.BatchNormalization.4 + p2o.Add.0 + p2o.Relu.3
[01/09/2024-07:11:08] [I]       44.81       0.0778       0.0768       2.3   p2o.Conv.5 + p2o.BatchNormalization.5 + p2o.Relu.4
[01/09/2024-07:11:08] [I]       43.14       0.0749       0.0737       2.2   p2o.Conv.6 + p2o.BatchNormalization.6 + p2o.Relu.5
[01/09/2024-07:11:08] [I]       71.59       0.1243       0.1239       3.6   p2o.Conv.7 + p2o.BatchNormalization.7 + p2o.Add.2 + p2o.Relu.6
[01/09/2024-07:11:08] [I]       44.82       0.0778       0.0778       2.3   p2o.Conv.8 + p2o.BatchNormalization.8 + p2o.Relu.7
[01/09/2024-07:11:08] [I]       42.68       0.0741       0.0727       2.2   p2o.Conv.9 + p2o.BatchNormalization.9 + p2o.Relu.8
[01/09/2024-07:11:08] [I]       71.35       0.1239       0.1229       3.6   p2o.Conv.10 + p2o.BatchNormalization.10 + p2o.Add.4 + p2o.Relu.9
[01/09/2024-07:11:08] [I]       54.45       0.0945       0.0922       2.7   p2o.Conv.11 + p2o.BatchNormalization.11 + p2o.Relu.10
[01/09/2024-07:11:08] [I]       36.97       0.0642       0.0635       1.9   p2o.Conv.14 + p2o.BatchNormalization.14
[01/09/2024-07:11:08] [I]       42.44       0.0737       0.0727       2.1   p2o.Conv.12 + p2o.BatchNormalization.12 + p2o.Relu.11
[01/09/2024-07:11:08] [I]       38.67       0.0671       0.0666       1.9   p2o.Conv.13 + p2o.BatchNormalization.13 + p2o.Add.6 + p2o.Relu.12
[01/09/2024-07:11:08] [I]      128.37       0.2229       0.0440       6.5   p2o.Conv.15 + p2o.BatchNormalization.15 + p2o.Relu.13
[01/09/2024-07:11:08] [I]       33.04       0.0574       0.0563       1.7   p2o.Conv.16 + p2o.BatchNormalization.16 + p2o.Relu.14
[01/09/2024-07:11:08] [I]       38.14       0.0662       0.0655       1.9   p2o.Conv.17 + p2o.BatchNormalization.17 + p2o.Add.8 + p2o.Relu.15
[01/09/2024-07:11:08] [I]       25.62       0.0445       0.0440       1.3   p2o.Conv.18 + p2o.BatchNormalization.18 + p2o.Relu.16
[01/09/2024-07:11:08] [I]       32.82       0.0570       0.0553       1.7   p2o.Conv.19 + p2o.BatchNormalization.19 + p2o.Relu.17
[01/09/2024-07:11:08] [I]       38.20       0.0663       0.0655       1.9   p2o.Conv.20 + p2o.BatchNormalization.20 + p2o.Add.10 + p2o.Relu.18
[01/09/2024-07:11:08] [I]       25.64       0.0445       0.0440       1.3   p2o.Conv.21 + p2o.BatchNormalization.21 + p2o.Relu.19
[01/09/2024-07:11:08] [I]       32.70       0.0568       0.0553       1.6   p2o.Conv.22 + p2o.BatchNormalization.22 + p2o.Relu.20
[01/09/2024-07:11:08] [I]       38.02       0.0660       0.0655       1.9   p2o.Conv.23 + p2o.BatchNormalization.23 + p2o.Add.12 + p2o.Relu.21
[01/09/2024-07:11:08] [I]       34.20       0.0594       0.0584       1.7   p2o.Conv.24 + p2o.BatchNormalization.24 + p2o.Relu.22
[01/09/2024-07:11:08] [I]       29.41       0.0511       0.0502       1.5   p2o.Conv.27 + p2o.BatchNormalization.27
[01/09/2024-07:11:08] [I]       41.25       0.0716       0.0707       2.1   p2o.Conv.25 + p2o.BatchNormalization.25 + p2o.Relu.23
[01/09/2024-07:11:08] [I]       23.62       0.0410       0.0399       1.2   p2o.Conv.26 + p2o.BatchNormalization.26 + p2o.Add.14 + p2o.Relu.24
[01/09/2024-07:11:08] [I]       19.77       0.0343       0.0338       1.0   p2o.Conv.28 + p2o.BatchNormalization.28 + p2o.Relu.25
[01/09/2024-07:11:08] [I]       34.77       0.0604       0.0594       1.8   p2o.Conv.29 + p2o.BatchNormalization.29 + p2o.Relu.26
[01/09/2024-07:11:08] [I]       22.41       0.0389       0.0379       1.1   p2o.Conv.30 + p2o.BatchNormalization.30 + p2o.Add.16 + p2o.Relu.27
[01/09/2024-07:11:08] [I]       19.66       0.0341       0.0338       1.0   p2o.Conv.31 + p2o.BatchNormalization.31 + p2o.Relu.28
[01/09/2024-07:11:08] [I]       34.41       0.0597       0.0584       1.7   p2o.Conv.32 + p2o.BatchNormalization.32 + p2o.Relu.29
[01/09/2024-07:11:08] [I]       22.78       0.0396       0.0389       1.1   p2o.Conv.33 + p2o.BatchNormalization.33 + p2o.Add.18 + p2o.Relu.30
[01/09/2024-07:11:08] [I]       19.70       0.0342       0.0338       1.0   p2o.Conv.34 + p2o.BatchNormalization.34 + p2o.Relu.31
[01/09/2024-07:11:08] [I]       34.47       0.0598       0.0584       1.7   p2o.Conv.35 + p2o.BatchNormalization.35 + p2o.Relu.32
[01/09/2024-07:11:08] [I]       22.32       0.0387       0.0379       1.1   p2o.Conv.36 + p2o.BatchNormalization.36 + p2o.Add.20 + p2o.Relu.33
[01/09/2024-07:11:08] [I]       19.64       0.0341       0.0338       1.0   p2o.Conv.37 + p2o.BatchNormalization.37 + p2o.Relu.34
[01/09/2024-07:11:08] [I]       34.51       0.0599       0.0584       1.7   p2o.Conv.38 + p2o.BatchNormalization.38 + p2o.Relu.35
[01/09/2024-07:11:08] [I]       22.79       0.0396       0.0389       1.1   p2o.Conv.39 + p2o.BatchNormalization.39 + p2o.Add.22 + p2o.Relu.36
[01/09/2024-07:11:08] [I]       19.62       0.0341       0.0338       1.0   p2o.Conv.40 + p2o.BatchNormalization.40 + p2o.Relu.37
[01/09/2024-07:11:08] [I]       34.29       0.0595       0.0584       1.7   p2o.Conv.41 + p2o.BatchNormalization.41 + p2o.Relu.38
[01/09/2024-07:11:08] [I]       22.34       0.0388       0.0379       1.1   p2o.Conv.42 + p2o.BatchNormalization.42 + p2o.Add.24 + p2o.Relu.39
[01/09/2024-07:11:08] [I]       28.79       0.0500       0.0492       1.5   p2o.Conv.43 + p2o.BatchNormalization.43 + p2o.Relu.40
[01/09/2024-07:11:08] [I]       29.90       0.0519       0.0512       1.5   p2o.Conv.46 + p2o.BatchNormalization.46
[01/09/2024-07:11:08] [I]       39.18       0.0680       0.0666       2.0   p2o.Conv.44 + p2o.BatchNormalization.44 + p2o.Relu.41
[01/09/2024-07:11:08] [I]       20.80       0.0361       0.0358       1.0   p2o.Conv.45 + p2o.BatchNormalization.45 + p2o.Add.26 + p2o.Relu.42
[01/09/2024-07:11:08] [I]       19.68       0.0342       0.0338       1.0   p2o.Conv.47 + p2o.BatchNormalization.47 + p2o.Relu.43
[01/09/2024-07:11:08] [I]       38.55       0.0669       0.0655       1.9   p2o.Conv.48 + p2o.BatchNormalization.48 + p2o.Relu.44
[01/09/2024-07:11:08] [I]       20.06       0.0348       0.0338       1.0   p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28
[01/09/2024-07:11:08] [I]       11.55       0.0200       0.0184       0.6   Reformatting CopyNode for Output Tensor 0 to p2o.Conv.49 + p2o.BatchNormalization.49 + p2o.Add.28
[01/09/2024-07:11:08] [I]     1983.58       3.4437       3.1969     100.0   Total

leo0519 avatar Jan 09 '24 08:01 leo0519

I found that even when the BF16 flag is set, the kernels chosen for convolution are still in FP32 precision.

Per @nvpohanh's comment, maybe the FP32 conv kernels are simply faster than the BF16 ones here? You can verify this by checking the TRT verbose log: in the tactic-optimizer section, find the kernel names and compare each kernel's measured perf.
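To make that comparison easier, here is a minimal sketch (assuming the `trtexec --dumpProfile` row format shown above; `parse_profile` is a hypothetical helper, not a TensorRT API) that parses the per-layer profile lines and ranks layers by total time, so you can quickly spot which layers to compare between the FP16 and BF16 builds:

```python
import re

def parse_profile(lines):
    """Parse trtexec --dumpProfile rows of the form:
    [timestamp] [I]  Time(ms)  Avg.(ms)  Median(ms)  Time(%)  Layer
    Returns rows sorted by total time, descending."""
    pat = re.compile(r"\[I\]\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+(.+)$")
    rows = []
    for line in lines:
        m = pat.search(line)
        if m:  # header/non-data lines don't match the numeric columns and are skipped
            total, avg, median, pct, layer = m.groups()
            rows.append({"layer": layer.strip(), "total_ms": float(total),
                         "avg_ms": float(avg), "pct": float(pct)})
    return sorted(rows, key=lambda r: r["total_ms"], reverse=True)

# Two sample rows copied from the profile dump in this issue:
sample = [
    "[01/09/2024-07:11:08] [I]      128.37       0.2229       0.0440       6.5   "
    "p2o.Conv.15 + p2o.BatchNormalization.15 + p2o.Relu.13",
    "[01/09/2024-07:11:08] [I]       41.69       0.0724       0.0594       2.1   "
    "p2o.Transpose.0",
]
for row in parse_profile(sample):
    print(f"{row['total_ms']:8.2f} ms  {row['pct']:4.1f}%  {row['layer']}")
```

Running this over both the FP16 and BF16 profile dumps and diffing the top entries should show which fused layers picked slower tactics under BF16.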

zerollzeng avatar Jan 11 '24 13:01 zerollzeng

Closing since there has been no activity for more than 3 weeks. Thanks all!

ttyio avatar Mar 05 '24 17:03 ttyio