Implement LPPool

Open hieule88 opened this issue 10 months ago • 0 comments

  • Added LPPool 1D and 2D, forward and backward.

  • Added driver test and gtest for LPPool.

  • The new API is guarded by the MIOPEN_BETA_API macro.

  • Average speedup over all cases (rocm_kernel_avg divided by the MIOpen time, so values above 1 mean MIOpen is faster):

LPPool1D

Type Forward Backward
float16 2.35 4.12
float32 2.37 3.51
bfloat16 2.40 4.17

LPPool1D FP16
op_name dtype input_size p kernel_size stride contiguous direction rocm_kernel_avg MIOpen MIOpen_over_rocm
LPPool1d float16 [16 672 32] 2 2 1 contiguous fwd 58000 28675 2.022667829
LPPool1d float16 [16 672 32] 2 2 1 contiguous bwd 101184 36888 2.743005856
LPPool1d float16 [16 672 32] 2 2 1 noncontiguous fwd 69536 29955 2.32134869
LPPool1d float16 [16 672 32] 2 2 1 noncontiguous bwd 139455 38150 3.655439056
LPPool1d float16 [16 960 32] 2 2 1 contiguous fwd 62048 37812 1.640960542
LPPool1d float16 [16 960 32] 2 2 1 contiguous bwd 108479 51519 2.105611522
LPPool1d float16 [16 960 32] 2 2 1 noncontiguous fwd 75680 39217 1.929775353
LPPool1d float16 [16 960 32] 2 2 1 noncontiguous bwd 155887 52621 2.962448452
LPPool1d float16 [3 2048 64] 2 2 1 contiguous fwd 58944 32177 1.831867483
LPPool1d float16 [3 2048 64] 2 2 1 contiguous bwd 102911 41528 2.478111154
LPPool1d float16 [3 2048 64] 2 2 1 noncontiguous fwd 73840 33475 2.205825243
LPPool1d float16 [3 2048 64] 2 2 1 noncontiguous bwd 151647 43696 3.470500732
LPPool1d float16 [64 2208 7] 2 2 1 contiguous fwd 75408 62665 1.203351153
LPPool1d float16 [64 2208 7] 2 2 1 contiguous bwd 130208 95785 1.359377773
LPPool1d float16 [64 2208 7] 2 2 1 noncontiguous fwd 108944 66683 1.633759729
LPPool1d float16 [64 2208 7] 2 2 1 noncontiguous bwd 222895 99625 2.237340025
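
The MIOpen_over_rocm column in the table above can be reproduced from the two timing columns. The sketch below recomputes it for the first four rows; the helper name `speedup` is made up for illustration, and the timing unit is not stated in this PR, so only the ratio is meaningful here.

```python
# Rows copied from the LPPool1D FP16 table above:
# (layout, direction, rocm_kernel_avg, MIOpen)
rows = [
    ("contiguous", "fwd", 58000, 28675),
    ("contiguous", "bwd", 101184, 36888),
    ("noncontiguous", "fwd", 69536, 29955),
    ("noncontiguous", "bwd", 139455, 38150),
]


def speedup(rocm_time, miopen_time):
    """Ratio of the two timings; values above 1 mean MIOpen was faster."""
    return rocm_time / miopen_time


for layout, direction, rocm_time, miopen_time in rows:
    ratio = speedup(rocm_time, miopen_time)
    print(f"{layout:>13} {direction}: {ratio:.6f}")
```

Note that the per-type averages reported at the top are stated to be over all cases, so averaging only the rows listed in a single table is not expected to reproduce them.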

LPPool1D FP32
op_name dtype input_size p kernel_size stride contiguous direction rocm_kernel_avg MIOpen MIOpen_over_rocm
LPPool1d float32 [16 672 32] 2 2 1 contiguous fwd 59728 28870 2.068860409
LPPool1d float32 [16 672 32] 2 2 1 contiguous bwd 103023 37386 2.755657198
LPPool1d float32 [16 672 32] 2 2 1 noncontiguous fwd 74128 32195 2.302469328
LPPool1d float32 [16 672 32] 2 2 1 noncontiguous bwd 146720 41013 3.577402287
LPPool1d float32 [16 960 32] 2 2 1 contiguous fwd 66448 37653 1.764746501
LPPool1d float32 [16 960 32] 2 2 1 contiguous bwd 115967 50826 2.281647188
LPPool1d float32 [16 960 32] 2 2 1 noncontiguous fwd 84736 42595 1.989341472
LPPool1d float32 [16 960 32] 2 2 1 noncontiguous bwd 170623 54328 3.140608894
LPPool1d float32 [3 2048 64] 2 2 1 contiguous fwd 61136 32266 1.894749892
LPPool1d float32 [3 2048 64] 2 2 1 contiguous bwd 106160 41368 2.566234771
LPPool1d float32 [3 2048 64] 2 2 1 noncontiguous fwd 77904 33617 2.317398935
LPPool1d float32 [3 2048 64] 2 2 1 noncontiguous bwd 157215 44000 3.573068182
LPPool1d float32 [64 2208 7] 2 2 1 contiguous fwd 87072 62666 1.38946159
LPPool1d float32 [64 2208 7] 2 2 1 contiguous bwd 163584 95608 1.710986528
LPPool1d float32 [64 2208 7] 2 2 1 noncontiguous fwd 119376 69137 1.726658663
LPPool1d float32 [64 2208 7] 2 2 1 noncontiguous bwd 264847 100800 2.627450397

LPPool1D BFP16
op_name dtype input_size p kernel_size stride contiguous direction rocm_kernel_avg MIOpen MIOpen_over_rocm
LPPool1d bfloat16 [16 672 32] 2 2 1 contiguous fwd 60016 28871 2.078764158
LPPool1d bfloat16 [16 672 32] 2 2 1 contiguous bwd 108479 36924 2.937899469
LPPool1d bfloat16 [16 672 32] 2 2 1 noncontiguous fwd 71936 29937 2.402912784
LPPool1d bfloat16 [16 672 32] 2 2 1 noncontiguous bwd 144095 38524 3.740395598
LPPool1d bfloat16 [16 960 32] 2 2 1 contiguous fwd 64448 37902 1.700385204
LPPool1d bfloat16 [16 960 32] 2 2 1 contiguous bwd 114207 51022 2.238387362
LPPool1d bfloat16 [16 960 32] 2 2 1 noncontiguous fwd 77840 39217 1.984853507
LPPool1d bfloat16 [16 960 32] 2 2 1 noncontiguous bwd 159198 52853 3.012090137
LPPool1d bfloat16 [3 2048 64] 2 2 1 contiguous fwd 60816 32302 1.882731719
LPPool1d bfloat16 [3 2048 64] 2 2 1 contiguous bwd 107967 41671 2.590938542
LPPool1d bfloat16 [3 2048 64] 2 2 1 noncontiguous fwd 75615 33635 2.248104653
LPPool1d bfloat16 [3 2048 64] 2 2 1 noncontiguous bwd 157103 43591 3.604023766
LPPool1d bfloat16 [64 2208 7] 2 2 1 contiguous fwd 77359 62826 1.231321427
LPPool1d bfloat16 [64 2208 7] 2 2 1 contiguous bwd 138367 96213 1.438132061
LPPool1d bfloat16 [64 2208 7] 2 2 1 noncontiguous fwd 109663 66755 1.642768332
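
The tables above separate contiguous from noncontiguous inputs. As a rough illustration of what that distinction usually means, the sketch below compares a tensor's element strides with the dense row-major strides for its shape. How the benchmark actually built its noncontiguous tensors is not stated in this PR; the transposed layout used here is only an assumed example, and the helper names are hypothetical.

```python
def row_major_strides(shape):
    """Element strides of a densely packed row-major tensor of this shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def is_contiguous(shape, strides):
    """Simplified check: strides must equal the dense row-major strides.

    Real frameworks also tolerate arbitrary strides on size-1 dimensions;
    this sketch ignores that case.
    """
    return list(strides) == row_major_strides(shape)


shape = [16, 672, 32]
dense = row_major_strides(shape)

# Assumed noncontiguous example: a buffer stored as [16, 32, 672] and then
# viewed with its last two dimensions transposed, giving logical shape
# [16, 672, 32] without moving any data.
transposed = [32 * 672, 1, 672]

print(dense, is_contiguous(shape, dense))
print(transposed, is_contiguous(shape, transposed))
```

A kernel reading the transposed view walks memory with a stride of 672 elements along the pooled dimension, which is one plausible reason the noncontiguous rows are slower in both implementations.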

LPPool2D

Type Forward Backward
float16 1.25 1.47
float32 1.36 1.68
bfloat16 1.35 1.56

LPPool2D FP16
op_name dtype input_size p kernel_size stride contiguous direction rocm_kernel_avg MIOpen MIOpen_over_rocm
LPPool2d float16 [256 256 6 6] 2 [1 1] [1 1] noncontiguous fwd 210288 136549 1.540018601
LPPool2d float16 [256 256 6 6] 2 [1 1] [1 1] noncontiguous bwd 456046 164619 2.770312054
LPPool2d float16 [16 72 64 64] 2 [3 3] [1 1] noncontiguous fwd 482078 422766 1.140295104
LPPool2d float16 [16 72 64 64] 2 [3 3] [1 1] noncontiguous bwd 2646486 598016 4.425443466
LPPool2d float16 [16 120 64 64] 2 [2 2] [1 1] noncontiguous fwd 880077 578603 1.521037741
LPPool2d float16 [16 120 64 64] 2 [2 2] [1 1] noncontiguous bwd 3723649 1044300 3.565688978
LPPool2d float16 [16 480 32 32] 2 [4 4] [1 1] noncontiguous fwd 840396 730067 1.151121746
LPPool2d float16 [16 480 32 32] 2 [4 4] [1 1] noncontiguous bwd 2259911 1167730 1.935302681

LPPool2D FP32
op_name dtype input_size p kernel_size stride contiguous direction rocm_kernel_avg MIOpen MIOpen_over_rocm
LPPool2d float32 [256 256 6 6] 2 [1 1] [1 1] noncontiguous fwd 249567 136692 1.825761566
LPPool2d float32 [256 256 6 6] 2 [1 1] [1 1] noncontiguous bwd 627934 168639 3.723539632
LPPool2d float32 [16 72 64 64] 2 [3 3] [1 1] noncontiguous fwd 806060 646911 1.246013748
LPPool2d float32 [16 72 64 64] 2 [3 3] [1 1] noncontiguous bwd 1934392 1134820 1.704580462
LPPool2d float32 [16 120 64 64] 2 [2 2] [1 1] noncontiguous fwd 1337402 891175 1.500717592
LPPool2d float32 [16 120 64 64] 2 [2 2] [1 1] noncontiguous bwd 3362195 1787430 1.881021914
LPPool2d float32 [16 480 32 32] 2 [4 4] [1 1] noncontiguous fwd 1313339 1198500 1.09581894
LPPool2d float32 [16 480 32 32] 2 [4 4] [1 1] noncontiguous bwd 3545698 3306250 1.072422836

LPPool2D BFP16
op_name dtype input_size p kernel_size stride contiguous direction rocm_kernel_avg MIOpen MIOpen_over_rocm
LPPool2d bfloat16 [256 256 6 6] 2 [1 1] [1 1] noncontiguous fwd 221390 137280 1.612689394
LPPool2d bfloat16 [256 256 6 6] 2 [1 1] [1 1] noncontiguous bwd 490428 166275 2.949499323
LPPool2d bfloat16 [16 72 64 64] 2 [3 3] [1 1] noncontiguous fwd 502860 424123 1.185646617
LPPool2d bfloat16 [16 72 64 64] 2 [3 3] [1 1] noncontiguous bwd 2771849 599962 4.620040936
LPPool2d bfloat16 [16 120 64 64] 2 [2 2] [1 1] noncontiguous fwd 915832 580460 1.577769355
LPPool2d bfloat16 [16 120 64 64] 2 [2 2] [1 1] noncontiguous bwd 3039143 1033949 2.939354842
LPPool2d bfloat16 [16 480 32 32] 2 [4 4] [1 1] noncontiguous fwd 871049 731553 1.190684749
LPPool2d bfloat16 [16 480 32 32] 2 [4 4] [1 1] noncontiguous bwd 3307300 1170540 2.825448084
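
For reference, here is a minimal pure-Python sketch of the operator being benchmarked, for a single 1D channel. It follows the power-average definition used by PyTorch's LPPool1d, y = (sum of x^p over the window)^(1/p), and assumes non-negative inputs. It is not MIOpen's kernel code; the function names are hypothetical and edge-case handling in the real implementation may differ.

```python
def lppool1d_forward(x, p, kernel_size, stride):
    """LP pooling over one channel: y_j = (sum_k x[j*stride + k]**p) ** (1/p)."""
    out_len = (len(x) - kernel_size) // stride + 1
    out = []
    for j in range(out_len):
        window = x[j * stride : j * stride + kernel_size]
        out.append(sum(v ** p for v in window) ** (1.0 / p))
    return out


def lppool1d_backward(x, grad_out, p, kernel_size, stride):
    """Gradient w.r.t. x, using dy_j/dx_i = x_i**(p-1) * y_j**(1-p)."""
    y = lppool1d_forward(x, p, kernel_size, stride)
    grad_in = [0.0] * len(x)
    for j, (yj, gj) in enumerate(zip(y, grad_out)):
        if yj == 0.0:
            continue  # derivative undefined at zero; this sketch contributes 0
        for i in range(j * stride, j * stride + kernel_size):
            # Overlapping windows accumulate into the same input element.
            grad_in[i] += gj * x[i] ** (p - 1) * yj ** (1 - p)
    return grad_in


x = [3.0, 4.0, 12.0, 5.0]
y = lppool1d_forward(x, p=2, kernel_size=2, stride=1)
dx = lppool1d_backward(x, [1.0] * len(y), p=2, kernel_size=2, stride=1)
print(y)
print(dx)
```

With the benchmark settings used in the 1D tables (p = 2, kernel_size = 2, stride = 1), an input of length 32 produces 31 outputs per channel, and each interior input element receives gradient from two overlapping windows.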

hieule88, Mar 11 '25 09:03