MIOpen
Implement LPPool
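For readers unfamiliar with the operator, here is a minimal reference sketch of what a 1D LPPool computes. It assumes the PyTorch `LPPool1d` definition that the `op_name` column in the tables refers to: each output is `(sum of x**p over the window) ** (1/p)`, with no padding. The function name `lp_pool1d` and its defaults are hypothetical and are not the MIOpen API added by this PR.

```python
def lp_pool1d(x, p, kernel_size, stride=None):
    """Reference power-average pooling over a 1-D list of floats.

    Assumption: follows the PyTorch LPPool1d formula, not MIOpen's kernels.
    Assumption: each window sum is non-negative (always true for even p).
    """
    if stride is None:
        # Assumed default: non-overlapping windows.
        stride = kernel_size
    out = []
    # Visit every window position where the kernel fits entirely.
    for start in range(0, len(x) - kernel_size + 1, stride):
        window = x[start:start + kernel_size]
        out.append(sum(v ** p for v in window) ** (1.0 / p))
    return out


# p=2, kernel_size=2, stride=1 matches the 1D settings benchmarked in this PR.
result = lp_pool1d([3.0, 4.0, 0.0], p=2, kernel_size=2, stride=1)
```

With the input above, the two windows are `[3, 4]` and `[4, 0]`, so the outputs are their Euclidean norms, 5 and 4.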
- Added driver test and gtest for LPPool.
- New API is guarded by MIOPEN_BETA_API macro.
- Average MIOpen_over_rocm ratio (rocm_kernel_avg / MIOpen) over all cases:
LPPool1D
| Type | Forward | Backward |
|---|---|---|
| float16 | 2.35 | 4.12 |
| float32 | 2.37 | 3.51 |
| bfloat16 | 2.40 | 4.17 |
LPPool1D FP16
| op_name | dtype | input_size | p | kernel_size | stride | contiguous | direction | rocm_kernel_avg | MIOpen | MIOpen_over_rocm |
|---|---|---|---|---|---|---|---|---|---|---|
| LPPool1d | float16 | [16 672 32] | 2 | 2 | 1 | contiguous | fwd | 58000 | 28675 | 2.022667829 |
| LPPool1d | float16 | [16 672 32] | 2 | 2 | 1 | contiguous | bwd | 101184 | 36888 | 2.743005856 |
| LPPool1d | float16 | [16 672 32] | 2 | 2 | 1 | noncontiguous | fwd | 69536 | 29955 | 2.32134869 |
| LPPool1d | float16 | [16 672 32] | 2 | 2 | 1 | noncontiguous | bwd | 139455 | 38150 | 3.655439056 |
| LPPool1d | float16 | [16 960 32] | 2 | 2 | 1 | contiguous | fwd | 62048 | 37812 | 1.640960542 |
| LPPool1d | float16 | [16 960 32] | 2 | 2 | 1 | contiguous | bwd | 108479 | 51519 | 2.105611522 |
| LPPool1d | float16 | [16 960 32] | 2 | 2 | 1 | noncontiguous | fwd | 75680 | 39217 | 1.929775353 |
| LPPool1d | float16 | [16 960 32] | 2 | 2 | 1 | noncontiguous | bwd | 155887 | 52621 | 2.962448452 |
| LPPool1d | float16 | [3 2048 64] | 2 | 2 | 1 | contiguous | fwd | 58944 | 32177 | 1.831867483 |
| LPPool1d | float16 | [3 2048 64] | 2 | 2 | 1 | contiguous | bwd | 102911 | 41528 | 2.478111154 |
| LPPool1d | float16 | [3 2048 64] | 2 | 2 | 1 | noncontiguous | fwd | 73840 | 33475 | 2.205825243 |
| LPPool1d | float16 | [3 2048 64] | 2 | 2 | 1 | noncontiguous | bwd | 151647 | 43696 | 3.470500732 |
| LPPool1d | float16 | [64 2208 7] | 2 | 2 | 1 | contiguous | fwd | 75408 | 62665 | 1.203351153 |
| LPPool1d | float16 | [64 2208 7] | 2 | 2 | 1 | contiguous | bwd | 130208 | 95785 | 1.359377773 |
| LPPool1d | float16 | [64 2208 7] | 2 | 2 | 1 | noncontiguous | fwd | 108944 | 66683 | 1.633759729 |
| LPPool1d | float16 | [64 2208 7] | 2 | 2 | 1 | noncontiguous | bwd | 222895 | 99625 | 2.237340025 |
LPPool1D FP32
| op_name | dtype | input_size | p | kernel_size | stride | contiguous | direction | rocm_kernel_avg | MIOpen | MIOpen_over_rocm |
|---|---|---|---|---|---|---|---|---|---|---|
| LPPool1d | float32 | [16 672 32] | 2 | 2 | 1 | contiguous | fwd | 59728 | 28870 | 2.068860409 |
| LPPool1d | float32 | [16 672 32] | 2 | 2 | 1 | contiguous | bwd | 103023 | 37386 | 2.755657198 |
| LPPool1d | float32 | [16 672 32] | 2 | 2 | 1 | noncontiguous | fwd | 74128 | 32195 | 2.302469328 |
| LPPool1d | float32 | [16 672 32] | 2 | 2 | 1 | noncontiguous | bwd | 146720 | 41013 | 3.577402287 |
| LPPool1d | float32 | [16 960 32] | 2 | 2 | 1 | contiguous | fwd | 66448 | 37653 | 1.764746501 |
| LPPool1d | float32 | [16 960 32] | 2 | 2 | 1 | contiguous | bwd | 115967 | 50826 | 2.281647188 |
| LPPool1d | float32 | [16 960 32] | 2 | 2 | 1 | noncontiguous | fwd | 84736 | 42595 | 1.989341472 |
| LPPool1d | float32 | [16 960 32] | 2 | 2 | 1 | noncontiguous | bwd | 170623 | 54328 | 3.140608894 |
| LPPool1d | float32 | [3 2048 64] | 2 | 2 | 1 | contiguous | fwd | 61136 | 32266 | 1.894749892 |
| LPPool1d | float32 | [3 2048 64] | 2 | 2 | 1 | contiguous | bwd | 106160 | 41368 | 2.566234771 |
| LPPool1d | float32 | [3 2048 64] | 2 | 2 | 1 | noncontiguous | fwd | 77904 | 33617 | 2.317398935 |
| LPPool1d | float32 | [3 2048 64] | 2 | 2 | 1 | noncontiguous | bwd | 157215 | 44000 | 3.573068182 |
| LPPool1d | float32 | [64 2208 7] | 2 | 2 | 1 | contiguous | fwd | 87072 | 62666 | 1.38946159 |
| LPPool1d | float32 | [64 2208 7] | 2 | 2 | 1 | contiguous | bwd | 163584 | 95608 | 1.710986528 |
| LPPool1d | float32 | [64 2208 7] | 2 | 2 | 1 | noncontiguous | fwd | 119376 | 69137 | 1.726658663 |
| LPPool1d | float32 | [64 2208 7] | 2 | 2 | 1 | noncontiguous | bwd | 264847 | 100800 | 2.627450397 |
LPPool1D BFP16
| op_name | dtype | input_size | p | kernel_size | stride | contiguous | direction | rocm_kernel_avg | MIOpen | MIOpen_over_rocm |
|---|---|---|---|---|---|---|---|---|---|---|
| LPPool1d | bfloat16 | [16 672 32] | 2 | 2 | 1 | contiguous | fwd | 60016 | 28871 | 2.078764158 |
| LPPool1d | bfloat16 | [16 672 32] | 2 | 2 | 1 | contiguous | bwd | 108479 | 36924 | 2.937899469 |
| LPPool1d | bfloat16 | [16 672 32] | 2 | 2 | 1 | noncontiguous | fwd | 71936 | 29937 | 2.402912784 |
| LPPool1d | bfloat16 | [16 672 32] | 2 | 2 | 1 | noncontiguous | bwd | 144095 | 38524 | 3.740395598 |
| LPPool1d | bfloat16 | [16 960 32] | 2 | 2 | 1 | contiguous | fwd | 64448 | 37902 | 1.700385204 |
| LPPool1d | bfloat16 | [16 960 32] | 2 | 2 | 1 | contiguous | bwd | 114207 | 51022 | 2.238387362 |
| LPPool1d | bfloat16 | [16 960 32] | 2 | 2 | 1 | noncontiguous | fwd | 77840 | 39217 | 1.984853507 |
| LPPool1d | bfloat16 | [16 960 32] | 2 | 2 | 1 | noncontiguous | bwd | 159198 | 52853 | 3.012090137 |
| LPPool1d | bfloat16 | [3 2048 64] | 2 | 2 | 1 | contiguous | fwd | 60816 | 32302 | 1.882731719 |
| LPPool1d | bfloat16 | [3 2048 64] | 2 | 2 | 1 | contiguous | bwd | 107967 | 41671 | 2.590938542 |
| LPPool1d | bfloat16 | [3 2048 64] | 2 | 2 | 1 | noncontiguous | fwd | 75615 | 33635 | 2.248104653 |
| LPPool1d | bfloat16 | [3 2048 64] | 2 | 2 | 1 | noncontiguous | bwd | 157103 | 43591 | 3.604023766 |
| LPPool1d | bfloat16 | [64 2208 7] | 2 | 2 | 1 | contiguous | fwd | 77359 | 62826 | 1.231321427 |
| LPPool1d | bfloat16 | [64 2208 7] | 2 | 2 | 1 | contiguous | bwd | 138367 | 96213 | 1.438132061 |
| LPPool1d | bfloat16 | [64 2208 7] | 2 | 2 | 1 | noncontiguous | fwd | 109663 | 66755 | 1.642768332 |
LPPool2D
| Type | Forward | Backward |
|---|---|---|
| float16 | 1.25 | 1.47 |
| float32 | 1.36 | 1.68 |
| bfloat16 | 1.35 | 1.56 |
LPPool2D FP16
| op_name | dtype | input_size | p | kernel_size | stride | contiguous | direction | rocm_kernel_avg | MIOpen | MIOpen_over_rocm |
|---|---|---|---|---|---|---|---|---|---|---|
| LPPool2d | float16 | [256 256 6 6] | 2 | [1 1] | [1 1] | noncontiguous | fwd | 210288 | 136549 | 1.540018601 |
| LPPool2d | float16 | [256 256 6 6] | 2 | [1 1] | [1 1] | noncontiguous | bwd | 456046 | 164619 | 2.770312054 |
| LPPool2d | float16 | [16 72 64 64] | 2 | [3 3] | [1 1] | noncontiguous | fwd | 482078 | 422766 | 1.140295104 |
| LPPool2d | float16 | [16 72 64 64] | 2 | [3 3] | [1 1] | noncontiguous | bwd | 2646486 | 598016 | 4.425443466 |
| LPPool2d | float16 | [16 120 64 64] | 2 | [2 2] | [1 1] | noncontiguous | fwd | 880077 | 578603 | 1.521037741 |
| LPPool2d | float16 | [16 120 64 64] | 2 | [2 2] | [1 1] | noncontiguous | bwd | 3723649 | 1044300 | 3.565688978 |
| LPPool2d | float16 | [16 480 32 32] | 2 | [4 4] | [1 1] | noncontiguous | fwd | 840396 | 730067 | 1.151121746 |
| LPPool2d | float16 | [16 480 32 32] | 2 | [4 4] | [1 1] | noncontiguous | bwd | 2259911 | 1167730 | 1.935302681 |
LPPool2D FP32
| op_name | dtype | input_size | p | kernel_size | stride | contiguous | direction | rocm_kernel_avg | MIOpen | MIOpen_over_rocm |
|---|---|---|---|---|---|---|---|---|---|---|
| LPPool2d | float32 | [256 256 6 6] | 2 | [1 1] | [1 1] | noncontiguous | fwd | 249567 | 136692 | 1.825761566 |
| LPPool2d | float32 | [256 256 6 6] | 2 | [1 1] | [1 1] | noncontiguous | bwd | 627934 | 168639 | 3.723539632 |
| LPPool2d | float32 | [16 72 64 64] | 2 | [3 3] | [1 1] | noncontiguous | fwd | 806060 | 646911 | 1.246013748 |
| LPPool2d | float32 | [16 72 64 64] | 2 | [3 3] | [1 1] | noncontiguous | bwd | 1934392 | 1134820 | 1.704580462 |
| LPPool2d | float32 | [16 120 64 64] | 2 | [2 2] | [1 1] | noncontiguous | fwd | 1337402 | 891175 | 1.500717592 |
| LPPool2d | float32 | [16 120 64 64] | 2 | [2 2] | [1 1] | noncontiguous | bwd | 3362195 | 1787430 | 1.881021914 |
| LPPool2d | float32 | [16 480 32 32] | 2 | [4 4] | [1 1] | noncontiguous | fwd | 1313339 | 1198500 | 1.09581894 |
| LPPool2d | float32 | [16 480 32 32] | 2 | [4 4] | [1 1] | noncontiguous | bwd | 3545698 | 3306250 | 1.072422836 |
LPPool2D BFP16
| op_name | dtype | input_size | p | kernel_size | stride | contiguous | direction | rocm_kernel_avg | MIOpen | MIOpen_over_rocm |
|---|---|---|---|---|---|---|---|---|---|---|
| LPPool2d | bfloat16 | [256 256 6 6] | 2 | [1 1] | [1 1] | noncontiguous | fwd | 221390 | 137280 | 1.612689394 |
| LPPool2d | bfloat16 | [256 256 6 6] | 2 | [1 1] | [1 1] | noncontiguous | bwd | 490428 | 166275 | 2.949499323 |
| LPPool2d | bfloat16 | [16 72 64 64] | 2 | [3 3] | [1 1] | noncontiguous | fwd | 502860 | 424123 | 1.185646617 |
| LPPool2d | bfloat16 | [16 72 64 64] | 2 | [3 3] | [1 1] | noncontiguous | bwd | 2771849 | 599962 | 4.620040936 |
| LPPool2d | bfloat16 | [16 120 64 64] | 2 | [2 2] | [1 1] | noncontiguous | fwd | 915832 | 580460 | 1.577769355 |
| LPPool2d | bfloat16 | [16 120 64 64] | 2 | [2 2] | [1 1] | noncontiguous | bwd | 3039143 | 1033949 | 2.939354842 |
| LPPool2d | bfloat16 | [16 480 32 32] | 2 | [4 4] | [1 1] | noncontiguous | fwd | 871049 | 731553 | 1.190684749 |
| LPPool2d | bfloat16 | [16 480 32 32] | 2 | [4 4] | [1 1] | noncontiguous | bwd | 3307300 | 1170540 | 2.825448084 |
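
The MIOpen_over_rocm column in the per-case tables is rocm_kernel_avg divided by MIOpen. The sketch below recomputes it for the first four LPPool1D FP16 rows and averages per direction. The helper names `speedup` and `mean_speedup` are hypothetical, and because only a handful of rows are included, the means will not necessarily match the summary tables, which are stated to cover all cases.

```python
# (direction, rocm_kernel_avg, MIOpen) copied from the first four
# LPPool1D FP16 rows, input_size [16 672 32].
rows = [
    ("fwd", 58000, 28675),
    ("bwd", 101184, 36888),
    ("fwd", 69536, 29955),
    ("bwd", 139455, 38150),
]


def speedup(rocm_kernel_avg, miopen):
    """Ratio reported in the MIOpen_over_rocm column."""
    return rocm_kernel_avg / miopen


def mean_speedup(rows, direction):
    """Arithmetic mean of the per-row ratios for one direction."""
    ratios = [speedup(r, m) for d, r, m in rows if d == direction]
    return sum(ratios) / len(ratios)


for direction in ("fwd", "bwd"):
    print(direction, round(mean_speedup(rows, direction), 2))
```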