MIOpen
Implement CartesianProd
- Added CartesianProd forward and backward.
- Added driver test and gtest for CartesianProd.
- New API is guarded by the `MIOPEN_BETA_API` macro.
- Average improvement over ROCm across all cases for CartesianProd:

| Type | Forward | Backward |
|---|---|---|
| float16 | 1.37 | 1.33 |
| float32 | 1.60 | 1.29 |
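
For reviewers unfamiliar with the operation, here is a minimal CPU reference sketch of what the forward and backward passes compute. It assumes the semantics match `torch.cartesian_prod` (one output row per combination, last input varying fastest), which this description does not state explicitly. The function names `cartesian_prod_forward` and `cartesian_prod_backward` are hypothetical and are not the MIOpen API or the kernels added in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference sketch only: hypothetical helpers, not MIOpen's API or kernels.
// Assumed semantics: output is a row-major [rows x k] matrix where each row
// holds one element from each of the k 1-D inputs, last input varying fastest.
std::vector<float> cartesian_prod_forward(const std::vector<std::vector<float>>& inputs)
{
    const std::size_t k = inputs.size();
    std::size_t rows    = (k == 0) ? 0 : 1;
    for(const auto& v : inputs)
        rows *= v.size();

    std::vector<float> out(rows * k);
    for(std::size_t r = 0; r < rows; ++r)
    {
        std::size_t rem = r;
        // Decode the row index into one index per input (mixed-radix digits).
        for(std::size_t i = k; i-- > 0;)
        {
            const std::size_t n = inputs[i].size();
            out[r * k + i]      = inputs[i][rem % n];
            rem /= n;
        }
    }
    return out;
}

// Backward: each input element receives the sum of the output gradients at
// every position where that element was copied in the forward pass.
std::vector<std::vector<float>> cartesian_prod_backward(const std::vector<float>& grad_out,
                                                        const std::vector<std::size_t>& sizes)
{
    const std::size_t k = sizes.size();
    std::size_t rows    = (k == 0) ? 0 : 1;
    for(std::size_t n : sizes)
        rows *= n;
    assert(grad_out.size() == rows * k);

    std::vector<std::vector<float>> grads(k);
    for(std::size_t i = 0; i < k; ++i)
        grads[i].assign(sizes[i], 0.0f);

    for(std::size_t r = 0; r < rows; ++r)
    {
        std::size_t rem = r;
        for(std::size_t i = k; i-- > 0;)
        {
            const std::size_t n = sizes[i];
            grads[i][rem % n] += grad_out[r * k + i];
            rem /= n;
        }
    }
    return grads;
}
```

Under these assumptions, inputs of sizes 2 and 3 produce a 6 x 2 output, and an all-ones output gradient yields a gradient of 3 for each element of the first input and 2 for each element of the second.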
FWD - FP16
| op_name | dtype | size | contiguous | direction | rocm_kernel_avg | kernel_duration | improvement over rocm |
|---|---|---|---|---|---|---|---|
| cartesian_prod | float16 | [100]_[50] | contiguous | fwd | 22016 | 33991 | 1.923332647 |
| cartesian_prod | float16 | [100]_[50] | noncontiguous | fwd | 21904 | 66045 | 0.964917859 |
| cartesian_prod | float16 | [100]_[100] | contiguous | fwd | 22544 | 53661 | 1.195337396 |
| cartesian_prod | float16 | [100]_[100] | noncontiguous | fwd | 22400 | 62175 | 1.062790511 |
| cartesian_prod | float16 | [100]_[300] | contiguous | fwd | 23616 | 52933 | 1.302760093 |
| cartesian_prod | float16 | [100]_[300] | noncontiguous | fwd | 23504 | 63278 | 1.089019881 |
| cartesian_prod | float16 | [300]_[100] | contiguous | fwd | 23280 | 65962 | 1.088369061 |
| cartesian_prod | float16 | [300]_[100] | noncontiguous | fwd | 23328 | 61572 | 1.103098811 |
| cartesian_prod | float16 | [10][2][2] | contiguous | fwd | 20592 | 46357 | 1.873783895 |
| cartesian_prod | float16 | [10][2][2] | noncontiguous | fwd | 20304 | 59919 | 1.278776348 |
| cartesian_prod | float16 | [4][3][10] | contiguous | fwd | 20256 | 63261 | 1.231200898 |
| cartesian_prod | float16 | [4][3][10] | noncontiguous | fwd | 20384 | 69056 | 1.163794601 |
| cartesian_prod | float16 | [5][10][2] | contiguous | fwd | 20448 | 70088 | 1.105795571 |
| cartesian_prod | float16 | [5][10][2] | noncontiguous | fwd | 20224 | 66497 | 1.153721221 |
| cartesian_prod | float16 | [10][3][4] | contiguous | fwd | 20528 | 64382 | 1.186651549 |
| cartesian_prod | float16 | [10][3][4] | noncontiguous | fwd | 20384 | 60667 | 1.259844726 |
| cartesian_prod | float16 | [10][10][10] | contiguous | fwd | 26352 | 72684 | 1.061677948 |
| cartesian_prod | float16 | [10][10][10] | noncontiguous | fwd | 26240 | 41612 | 1.910194175 |
| cartesian_prod | float16 | [10][10][30] | contiguous | fwd | 27296 | 72382 | 1.097275566 |
| cartesian_prod | float16 | [10][10][30] | noncontiguous | fwd | 27184 | 76115 | 1.01760494 |
| cartesian_prod | float16 | [10][30][10] | contiguous | fwd | 27216 | 81839 | 0.960312321 |
| cartesian_prod | float16 | [10][30][10] | noncontiguous | fwd | 27424 | 67316 | 1.176065126 |
| cartesian_prod | float16 | [30][10][10] | contiguous | fwd | 27280 | 71085 | 1.086009707 |
| cartesian_prod | float16 | [30][10][10] | noncontiguous | fwd | 27440 | 69397 | 1.134573541 |
| cartesian_prod | float16 | [30][30][30] | contiguous | fwd | 28944 | 67886 | 1.412470907 |
| cartesian_prod | float16 | [30][30][30] | noncontiguous | fwd | 28832 | 42235 | 2.245317864 |
FWD - FP32
| op_name | dtype | size | contiguous | direction | rocm_kernel_avg | kernel_duration | improvement over rocm |
|---|---|---|---|---|---|---|---|
| cartesian_prod | float32 | [100]_[50] | contiguous | fwd | 22976 | 56551 | 1.138494456 |
| cartesian_prod | float32 | [100]_[50] | noncontiguous | fwd | 22784 | 73675 | 0.896247031 |
| cartesian_prod | float32 | [100]_[100] | contiguous | fwd | 23088 | 58016 | 1.107004964 |
| cartesian_prod | float32 | [100]_[100] | noncontiguous | fwd | 23039 | 74689 | 0.899931717 |
| cartesian_prod | float32 | [100]_[300] | contiguous | fwd | 24048 | 41059 | 1.691590151 |
| cartesian_prod | float32 | [100]_[300] | noncontiguous | fwd | 24224 | 50676 | 1.375621596 |
| cartesian_prod | float32 | [300]_[100] | contiguous | fwd | 24208 | 64789 | 1.110296501 |
| cartesian_prod | float32 | [300]_[100] | noncontiguous | fwd | 23920 | 68362 | 1.018562944 |
| cartesian_prod | float32 | [10][2][2] | contiguous | fwd | 20400 | 46339 | 1.690476704 |
| cartesian_prod | float32 | [10][2][2] | noncontiguous | fwd | 21360 | 43406 | 1.801018292 |
| cartesian_prod | float32 | [4][3][10] | contiguous | fwd | 20752 | 54498 | 1.417428163 |
| cartesian_prod | float32 | [4][3][10] | noncontiguous | fwd | 22320 | 55085 | 1.437560134 |
| cartesian_prod | float32 | [5][10][2] | contiguous | fwd | 21648 | 97231 | 0.805329576 |
| cartesian_prod | float32 | [5][10][2] | noncontiguous | fwd | 22592 | 68879 | 1.113125916 |
| cartesian_prod | float32 | [10][3][4] | contiguous | fwd | 21744 | 69413 | 1.099505856 |
| cartesian_prod | float32 | [10][3][4] | noncontiguous | fwd | 22624 | 37506 | 2.034847758 |
| cartesian_prod | float32 | [10][10][10] | contiguous | fwd | 26624 | 73733 | 1.044186457 |
| cartesian_prod | float32 | [10][10][10] | noncontiguous | fwd | 26880 | 72062 | 1.075060365 |
| cartesian_prod | float32 | [10][10][30] | contiguous | fwd | 27440 | 77413 | 1.031545089 |
| cartesian_prod | float32 | [10][10][30] | noncontiguous | fwd | 27376 | 379210 | 0.205308404 |
| cartesian_prod | float32 | [10][30][10] | contiguous | fwd | 27536 | 68134 | 1.141735404 |
| cartesian_prod | float32 | [10][30][10] | noncontiguous | fwd | 27328 | 73342 | 1.07898612 |
| cartesian_prod | float32 | [30][10][10] | contiguous | fwd | 27408 | 51301 | 1.548507826 |
| cartesian_prod | float32 | [30][10][10] | noncontiguous | fwd | 27440 | 43888 | 1.801289646 |
| cartesian_prod | float32 | [30][30][30] | contiguous | fwd | 29168 | 80916 | 1.190555638 |
| cartesian_prod | float32 | [30][30][30] | noncontiguous | fwd | 29520 | 69628 | 1.397354513 |
BWD - FP16
| op_name | dtype | size | contiguous | direction | rocm_kernel_avg | kernel_duration | improvement over rocm |
|---|---|---|---|---|---|---|---|
| cartesian_prod | float16 | [10]_[2] | contiguous | bwd | 35200 | 41618 | 1.372459032 |
| cartesian_prod | float16 | [10]_[2] | noncontiguous | bwd | 25904 | 43609 | 1.216629595 |
| cartesian_prod | float16 | [2]_[10] | contiguous | bwd | 25408 | 30169 | 1.796811296 |
| cartesian_prod | float16 | [2]_[10] | noncontiguous | bwd | 29968 | 37867 | 1.426017377 |
| cartesian_prod | float16 | [10]_[10] | contiguous | bwd | 26000 | 47698 | 1.066375949 |
| cartesian_prod | float16 | [10]_[10] | noncontiguous | bwd | 24064 | 47218 | 1.095514422 |
BWD - FP32
| op_name | dtype | size | contiguous | direction | rocm_kernel_avg | kernel_duration | improvement over rocm |
|---|---|---|---|---|---|---|---|
| cartesian_prod | float32 | [10]_[2] | contiguous | bwd | 33696 | 47858 | 1.120648585 |
| cartesian_prod | float32 | [10]_[2] | noncontiguous | bwd | 27568 | 42898 | 1.203202947 |
| cartesian_prod | float32 | [2]_[10] | contiguous | bwd | 25936 | 53902 | 0.975974917 |
| cartesian_prod | float32 | [2]_[10] | noncontiguous | bwd | 28688 | 49956 | 1.173192409 |
| cartesian_prod | float32 | [10]_[10] | contiguous | bwd | 25968 | 42933 | 1.155288473 |
| cartesian_prod | float32 | [10]_[10] | noncontiguous | bwd | 23968 | 42560 | 1.233458647 |