MIOpen

Implement LogSumExp

Open long10024070 opened this issue 10 months ago • 0 comments

  • Added basic LogSumExp operation and kernel.
  • Added driver test and gtest for LogSumExp.
  • New API is guarded by MIOPEN_BETA_API macro.
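
For readers unfamiliar with the operation, here is a minimal reference sketch of LogSumExp forward and backward using the standard max-shift formulation for numerical stability. It is not the MIOpen kernel: the function names `logsumexp` and `logsumexp_backward` are illustrative assumptions, and it reduces over a flat list rather than over selected tensor dimensions.

```python
import math


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a flat list of floats."""
    m = max(xs)
    if math.isinf(m):
        # Max is +inf or every input is -inf: the shifted form would give NaN.
        return m
    # Subtracting the max keeps every exponent <= 0, so exp() cannot overflow.
    return m + math.log(sum(math.exp(x - m) for x in xs))


def logsumexp_backward(xs, out, grad_out):
    """Gradient w.r.t. each input: grad_out * exp(x_i - out)."""
    return [grad_out * math.exp(x - out) for x in xs]


# Inputs this large would overflow a naive exp() in double precision.
xs = [1000.0, 1000.5, 999.0]
y = logsumexp(xs)
grads = logsumexp_backward(xs, y, 1.0)
```

With an upstream gradient of 1, the input gradients are the softmax of the inputs, so they sum to 1.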

Compared with the ROCm kernel, the newly developed MIOpen LogSumExp kernel performs better for a specific range of input sizes:

  • FWD : input_numel < 1,000,000 & reduced elements(K) < 1024
  • BWD : 300 < input_numel < 1,000,000
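
The two ranges above can be written as a small predicate. This is only a sketch of the stated conditions: `in_improved_range`, its parameters, and the constant names are hypothetical, and the issue does not say which dimensions were reduced in each benchmark, so `reduce_dims` is supplied by the caller.

```python
from math import prod

# Bounds taken from the ranges stated in this issue.
MAX_NUMEL = 1_000_000
MAX_REDUCED = 1024
MIN_BWD_NUMEL = 300


def in_improved_range(shape, reduce_dims, direction):
    """Return True if the case falls in the range where a speedup is reported."""
    numel = prod(shape)
    if direction == "fwd":
        # K is the number of elements reduced into each output element.
        k = prod(shape[d] for d in reduce_dims)
        return numel < MAX_NUMEL and k < MAX_REDUCED
    if direction == "bwd":
        return MIN_BWD_NUMEL < numel < MAX_NUMEL
    raise ValueError(f"unknown direction: {direction!r}")


fwd_ok = in_improved_range([900, 1000], [1], "fwd")   # 900,000 elements, K = 1000
bwd_small = in_improved_range([8], [0], "bwd")        # 8 elements, below the bwd lower bound
```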

Type Direction Geomean (ROCm / MIOpen)
fp32 fwd 3.79
fp16 fwd 3.80
bfp16 fwd 3.95
fp32 bwd 2.21
fp16 bwd 2.09
bfp16 bwd 2.09
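
As a sketch of how the ratio and geomean figures relate, the snippet below computes the per-row ROCm / MIOpen ratio and a geometric mean over ratios. The helper names `speedup` and `geomean` are illustrative, and the sample uses only the first three FP32 forward rows from the table below, so its geomean is not expected to match the 3.79 reported for the full set.

```python
import math


def speedup(rocm_avg, miopen_avg):
    """Ratio of ROCm kernel time to MIOpen kernel time (>1 means MIOpen is faster)."""
    return rocm_avg / miopen_avg


def geomean(values):
    """Geometric mean, computed in log space to avoid overflow of the product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# (ROCm kernel avg, MIOpen kernel avg) for FP32 fwd sizes [8], [16], [32].
rows = [(46_624, 18_240), (34_368, 23_733), (35_007, 17_760)]
ratios = [speedup(rocm, miopen) for rocm, miopen in rows]
sample_geomean = geomean(ratios)
```

A geometric mean is the usual choice for averaging ratios like these, since a 2x speedup and a 2x slowdown cancel out to 1.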

FWD - FP32
op_name dtype size direction ROCm kernel avg MIOpen kernel avg ROCm / MIOpen
Logsumexp float32 [8] fwd 46,624 18,240 2.56
Logsumexp float32 [16] fwd 34,368 23,733 1.45
Logsumexp float32 [32] fwd 35,007 17,760 1.97
Logsumexp float32 [64] fwd 36,720 19,378 1.89
Logsumexp float32 [128] fwd 37,679 21,138 1.78
Logsumexp float32 [256] fwd 34,608 26,115 1.33
Logsumexp float32 [512] fwd 35,568 32,782 1.08
Logsumexp float32 [20 20] fwd 73,871 12,889 5.73
Logsumexp float32 [20 40] fwd 63,920 10,418 6.14
Logsumexp float32 [20 80] fwd 60,976 9,671 6.31
Logsumexp float32 [20 160] fwd 72,048 9,387 7.68
Logsumexp float32 [20 320] fwd 79,279 9,404 8.43
Logsumexp float32 [40 40] fwd 69,599 10,880 6.40
Logsumexp float32 [40 80] fwd 72,431 10,329 7.01
Logsumexp float32 [40 160] fwd 74,143 9,316 7.96
Logsumexp float32 [80 40] fwd 89,439 13,031 6.86
Logsumexp float32 [80 250] fwd 72,592 11,271 6.44
Logsumexp float32 [30 300] fwd 54,064 18,329 2.95
Logsumexp float32 [60 300] fwd 54,176 16,996 3.19
Logsumexp float32 [120 300] fwd 52,720 16,409 3.21
Logsumexp float32 [240 300] fwd 55,951 16,729 3.34
Logsumexp float32 [100 200] fwd 81,455 11,182 7.28
Logsumexp float32 [300 400] fwd 55,679 20,355 2.74
Logsumexp float32 [900 1000] fwd 101,535 52,088 1.95
Logsumexp float32 [1000 500] fwd 93,391 43,857 2.13
Logsumexp float32 [40 50 30] fwd 82,239 11,396 7.22
Logsumexp float32 [30 50 10] fwd 62,080 29,475 2.11
Logsumexp float32 [50 40 50] fwd 69,711 12,426 5.61
Logsumexp float32 [40 60 10] fwd 68,815 20,089 3.43
Logsumexp float32 [50 30 5] fwd 67,951 16,942 4.01
Logsumexp float32 [100 200 10] fwd 91,791 16,764 5.48
Logsumexp float32 [100 200 20] fwd 104,575 25,902 4.04
Logsumexp float32 [100 100 50] fwd 97,631 36,497 2.68
Logsumexp float32 [200 20 20] fwd 101,823 14,720 6.92
Logsumexp float32 [200 300 2] fwd 69,535 20,355 3.42
Logsumexp float32 [300 30 30] fwd 119,327 22,453 5.31
Logsumexp float32 [50 10 5 20] fwd 79,231 10,773 7.35
Logsumexp float32 [50 30 5 4] fwd 73,999 18,738 3.95
Logsumexp float32 [5 100 10 3] fwd 64,208 14,702 4.37
Logsumexp float32 [10 20 10 5] fwd 58,384 9,493 6.15
Logsumexp float32 [20 30 5 5] fwd 120,206 11,147 10.78
Logsumexp float32 [20 5 10 10] fwd 73,295 41,173 1.78
Logsumexp float32 [30 10 5 10] fwd 53,327 23,147 2.30
Logsumexp float32 [10 2 10 2] fwd 43,599 29,440 1.48
Logsumexp float32 [100 10 10 10] fwd 91,631 13,867 6.61
Logsumexp float32 [100 20 10 10] fwd 75,871 35,857 2.12
Logsumexp float32 [100 100 10 5] fwd 156,206 43,964 3.55
Logsumexp float32 [100 10 50 10] fwd 91,679 32,035 2.86
Logsumexp float32 [100 4 400 2] fwd 128,367 24,018 5.34
Logsumexp float32 [100 5 20 30] fwd 148,751 25,884 5.75
Logsumexp float32 [30 10 5 10 2] fwd 80,975 10,631 7.62
Logsumexp float32 [30 5 20 5 4] fwd 68,783 18,347 3.75
Logsumexp float32 [20 10 3 12 4] fwd 104,527 10,009 10.44
Logsumexp float32 [20 5 10 5 10] fwd 86,671 12,658 6.85
Logsumexp float32 [40 2 3 2 5] fwd 128,511 22,773 5.64
Logsumexp float32 [15 3 5 20 12] fwd 106,719 16,515 6.46
Logsumexp float32 [12 12 4 2 10] fwd 56,128 38,897 1.44
Logsumexp float32 [5 5 5 2 10] fwd 131,375 22,862 5.75
Logsumexp float32 [2 4 8 2 4] fwd 44,896 31,342 1.43
Logsumexp float32 [100 10 10 5 10] fwd 90,495 32,106 2.82
Logsumexp float32 [100 20 10 2 10] fwd 73,135 65,137 1.12
Logsumexp float32 [10 50 40 10 2] fwd 149,038 26,488 5.63
Logsumexp float32 [40 20 50 10 2] fwd 95,055 45,831 2.07
Logsumexp float32 [60 10 50 10 2] fwd 89,583 37,280 2.40

FWD - FP16
op_name dtype size direction ROCm kernel avg MIOpen kernel avg ROCm / MIOpen
Logsumexp float16 [8] fwd 45,120 16,978 2.66
Logsumexp float16 [16] fwd 35,616 24,000 1.48
Logsumexp float16 [32] fwd 36,687 19,769 1.86
Logsumexp float16 [64] fwd 37,728 21,102 1.79
Logsumexp float16 [128] fwd 38,816 21,333 1.82
Logsumexp float16 [256] fwd 33,711 24,906 1.35
Logsumexp float16 [512] fwd 35,104 29,973 1.17
Logsumexp float16 [20 20] fwd 71,199 13,031 5.46
Logsumexp float16 [20 40] fwd 61,887 10,151 6.10
Logsumexp float16 [20 80] fwd 60,719 9,156 6.63
Logsumexp float16 [20 160] fwd 70,768 8,835 8.01
Logsumexp float16 [20 320] fwd 78,287 9,209 8.50
Logsumexp float16 [40 40] fwd 70,095 10,578 6.63
Logsumexp float16 [40 80] fwd 74,943 9,973 7.51
Logsumexp float16 [40 160] fwd 76,415 9,280 8.23
Logsumexp float16 [80 40] fwd 89,391 12,284 7.28
Logsumexp float16 [80 250] fwd 70,272 10,809 6.50
Logsumexp float16 [30 300] fwd 53,775 18,595 2.89
Logsumexp float16 [60 300] fwd 51,088 16,853 3.03
Logsumexp float16 [120 300] fwd 51,984 16,142 3.22
Logsumexp float16 [240 300] fwd 54,176 16,533 3.28
Logsumexp float16 [100 200] fwd 82,480 10,791 7.64
Logsumexp float16 [300 400] fwd 54,351 19,911 2.73
Logsumexp float16 [900 1000] fwd 99,695 52,124 1.91
Logsumexp float16 [1000 500] fwd 91,327 42,666 2.14
Logsumexp float16 [40 50 30] fwd 78,112 11,395 6.85
Logsumexp float16 [30 50 10] fwd 63,487 27,697 2.29
Logsumexp float16 [50 40 50] fwd 66,255 12,427 5.33
Logsumexp float16 [40 60 10] fwd 68,591 19,733 3.48
Logsumexp float16 [50 30 5] fwd 68,495 16,960 4.04
Logsumexp float16 [100 200 10] fwd 90,111 16,640 5.42
Logsumexp float16 [100 200 20] fwd 104,927 25,884 4.05
Logsumexp float16 [100 100 50] fwd 89,871 36,462 2.46
Logsumexp float16 [200 20 20] fwd 100,815 14,560 6.92
Logsumexp float16 [200 300 2] fwd 68,720 19,947 3.45
Logsumexp float16 [300 30 30] fwd 122,831 22,293 5.51
Logsumexp float16 [50 10 5 20] fwd 79,999 10,613 7.54
Logsumexp float16 [50 30 5 4] fwd 76,559 18,560 4.12
Logsumexp float16 [5 100 10 3] fwd 67,407 15,182 4.44
Logsumexp float16 [10 20 10 5] fwd 58,704 9,333 6.29
Logsumexp float16 [20 30 5 5] fwd 113,391 10,844 10.46
Logsumexp float16 [20 5 10 10] fwd 70,079 41,813 1.68
Logsumexp float16 [30 10 5 10] fwd 51,039 23,324 2.19
Logsumexp float16 [10 2 10 2] fwd 41,552 30,275 1.37
Logsumexp float16 [100 10 10 10] fwd 94,991 13,475 7.05
Logsumexp float16 [100 20 10 10] fwd 78,015 35,911 2.17
Logsumexp float16 [100 100 10 5] fwd 152,798 42,542 3.59
Logsumexp float16 [100 10 50 10] fwd 93,103 32,071 2.90
Logsumexp float16 [100 4 400 2] fwd 134,558 23,662 5.69
Logsumexp float16 [100 5 20 30] fwd 148,895 25,813 5.77
Logsumexp float16 [30 10 5 10 2] fwd 80,847 10,453 7.73
Logsumexp float16 [30 5 20 5 4] fwd 71,647 18,649 3.84
Logsumexp float16 [20 10 3 12 4] fwd 103,679 9,742 10.64
Logsumexp float16 [20 5 10 5 10] fwd 88,703 12,089 7.34
Logsumexp float16 [40 2 3 2 5] fwd 117,679 23,893 4.93
Logsumexp float16 [15 3 5 20 12] fwd 105,551 16,372 6.45
Logsumexp float16 [12 12 4 2 10] fwd 54,927 38,151 1.44
Logsumexp float16 [5 5 5 2 10] fwd 124,495 23,982 5.19
Logsumexp float16 [2 4 8 2 4] fwd 44,896 32,658 1.37
Logsumexp float16 [100 10 10 5 10] fwd 93,743 31,964 2.93
Logsumexp float16 [100 20 10 2 10] fwd 74,672 65,191 1.15
Logsumexp float16 [10 50 40 10 2] fwd 155,662 25,938 6.00
Logsumexp float16 [40 20 50 10 2] fwd 89,984 45,848 1.96
Logsumexp float16 [60 10 50 10 2] fwd 89,055 36,675 2.43

FWD - BFP16
op_name dtype size direction ROCm kernel avg MIOpen kernel avg ROCm / MIOpen
Logsumexp bfloat16 [8] fwd 42,639 18,791 2.27
Logsumexp bfloat16 [16] fwd 36,000 24,835 1.45
Logsumexp bfloat16 [32] fwd 37,584 19,644 1.91
Logsumexp bfloat16 [64] fwd 37,552 19,840 1.89
Logsumexp bfloat16 [128] fwd 39,120 19,609 2.00
Logsumexp bfloat16 [256] fwd 33,552 24,818 1.35
Logsumexp bfloat16 [512] fwd 35,248 30,933 1.14
Logsumexp bfloat16 [20 20] fwd 71,951 12,889 5.58
Logsumexp bfloat16 [20 40] fwd 63,232 10,258 6.16
Logsumexp bfloat16 [20 80] fwd 60,848 9,227 6.59
Logsumexp bfloat16 [20 160] fwd 71,712 8,889 8.07
Logsumexp bfloat16 [20 320] fwd 81,855 9,227 8.87
Logsumexp bfloat16 [40 40] fwd 70,719 10,507 6.73
Logsumexp bfloat16 [40 80] fwd 74,160 9,902 7.49
Logsumexp bfloat16 [40 160] fwd 79,279 9,440 8.40
Logsumexp bfloat16 [80 40] fwd 91,295 12,444 7.34
Logsumexp bfloat16 [80 250] fwd 74,751 10,631 7.03
Logsumexp bfloat16 [30 300] fwd 57,487 18,595 3.09
Logsumexp bfloat16 [60 300] fwd 55,312 16,675 3.32
Logsumexp bfloat16 [120 300] fwd 54,591 15,982 3.42
Logsumexp bfloat16 [240 300] fwd 58,896 16,604 3.55
Logsumexp bfloat16 [100 200] fwd 87,167 10,933 7.97
Logsumexp bfloat16 [300 400] fwd 58,751 19,911 2.95
Logsumexp bfloat16 [900 1000] fwd 106,383 52,017 2.05
Logsumexp bfloat16 [1000 500] fwd 99,039 42,400 2.34
Logsumexp bfloat16 [40 50 30] fwd 78,783 11,520 6.84
Logsumexp bfloat16 [30 50 10] fwd 65,055 28,960 2.25
Logsumexp bfloat16 [50 40 50] fwd 70,143 12,498 5.61
Logsumexp bfloat16 [40 60 10] fwd 69,631 19,662 3.54
Logsumexp bfloat16 [50 30 5] fwd 71,151 16,942 4.20
Logsumexp bfloat16 [100 200 10] fwd 93,439 16,782 5.57
Logsumexp bfloat16 [100 200 20] fwd 112,223 26,044 4.31
Logsumexp bfloat16 [100 100 50] fwd 92,191 36,782 2.51
Logsumexp bfloat16 [200 20 20] fwd 105,327 14,471 7.28
Logsumexp bfloat16 [200 300 2] fwd 71,839 19,858 3.62
Logsumexp bfloat16 [300 30 30] fwd 129,167 22,436 5.76
Logsumexp bfloat16 [50 10 5 20] fwd 83,519 10,596 7.88
Logsumexp bfloat16 [50 30 5 4] fwd 82,031 18,702 4.39
Logsumexp bfloat16 [5 100 10 3] fwd 68,384 15,004 4.56
Logsumexp bfloat16 [10 20 10 5] fwd 60,767 9,440 6.44
Logsumexp bfloat16 [20 30 5 5] fwd 116,079 10,720 10.83
Logsumexp bfloat16 [20 5 10 10] fwd 76,047 42,008 1.81
Logsumexp bfloat16 [30 10 5 10] fwd 55,567 23,467 2.37
Logsumexp bfloat16 [10 2 10 2] fwd 42,016 29,422 1.43
Logsumexp bfloat16 [100 10 10 10] fwd 100,719 13,600 7.41
Logsumexp bfloat16 [100 20 10 10] fwd 80,831 36,195 2.23
Logsumexp bfloat16 [100 100 10 5] fwd 158,591 42,542 3.73
Logsumexp bfloat16 [100 10 50 10] fwd 94,287 32,178 2.93
Logsumexp bfloat16 [100 4 400 2] fwd 136,559 23,662 5.77
Logsumexp bfloat16 [100 5 20 30] fwd 152,639 25,707 5.94
Logsumexp bfloat16 [30 10 5 10 2] fwd 82,239 10,560 7.79
Logsumexp bfloat16 [30 5 20 5 4] fwd 75,311 18,507 4.07
Logsumexp bfloat16 [20 10 3 12 4] fwd 110,047 9,920 11.09
Logsumexp bfloat16 [20 5 10 5 10] fwd 94,607 12,142 7.79
Logsumexp bfloat16 [40 2 3 2 5] fwd 123,199 23,751 5.19
Logsumexp bfloat16 [15 3 5 20 12] fwd 107,071 16,320 6.56
Logsumexp bfloat16 [12 12 4 2 10] fwd 58,064 38,507 1.51
Logsumexp bfloat16 [5 5 5 2 10] fwd 128,207 23,840 5.38
Logsumexp bfloat16 [2 4 8 2 4] fwd 46,623 29,760 1.57
Logsumexp bfloat16 [100 10 10 5 10] fwd 103,775 32,249 3.22
Logsumexp bfloat16 [100 20 10 2 10] fwd 80,879 65,955 1.23
Logsumexp bfloat16 [10 50 40 10 2] fwd 157,150 25,991 6.05
Logsumexp bfloat16 [40 20 50 10 2] fwd 99,455 45,759 2.17
Logsumexp bfloat16 [60 10 50 10 2] fwd 95,919 36,746 2.61

BWD - FP32
op_name dtype size direction ROCm kernel avg MIOpen kernel avg ROCm / MIOpen
Logsumexp float32 [512] bwd 10,320 9,866 1.05
Logsumexp float32 [20 20] bwd 23,616 9,404 2.51
Logsumexp float32 [20 40] bwd 22,736 9,547 2.38
Logsumexp float32 [20 80] bwd 25,504 9,138 2.79
Logsumexp float32 [20 160] bwd 25,104 9,546 2.63
Logsumexp float32 [20 320] bwd 23,184 8,942 2.59
Logsumexp float32 [40 40] bwd 24,928 9,191 2.71
Logsumexp float32 [40 80] bwd 25,248 9,102 2.77
Logsumexp float32 [40 160] bwd 23,712 8,889 2.67
Logsumexp float32 [80 40] bwd 25,312 9,369 2.70
Logsumexp float32 [80 250] bwd 21,232 8,604 2.47
Logsumexp float32 [30 300] bwd 21,776 9,102 2.39
Logsumexp float32 [60 300] bwd 22,480 8,462 2.66
Logsumexp float32 [120 300] bwd 22,048 8,533 2.58
Logsumexp float32 [240 300] bwd 21,504 8,587 2.50
Logsumexp float32 [100 200] bwd 24,032 8,729 2.75
Logsumexp float32 [300 400] bwd 22,640 9,795 2.31
Logsumexp float32 [900 1000] bwd 36,544 28,231 1.29
Logsumexp float32 [1000 500] bwd 26,512 17,991 1.47
Logsumexp float32 [40 50 30] bwd 22,480 8,640 2.60
Logsumexp float32 [30 50 10] bwd 23,024 8,836 2.61
Logsumexp float32 [50 40 50] bwd 23,232 8,640 2.69
Logsumexp float32 [40 60 10] bwd 24,000 8,444 2.84
Logsumexp float32 [50 30 5] bwd 22,720 9,102 2.50
Logsumexp float32 [100 200 10] bwd 24,112 10,133 2.38
Logsumexp float32 [100 200 20] bwd 26,000 15,484 1.68
Logsumexp float32 [100 100 50] bwd 27,280 18,009 1.51
Logsumexp float32 [200 20 20] bwd 22,576 8,658 2.61
Logsumexp float32 [200 300 2] bwd 23,696 9,849 2.41
Logsumexp float32 [300 30 30] bwd 24,288 12,836 1.89
Logsumexp float32 [50 10 5 20] bwd 22,240 8,444 2.63
Logsumexp float32 [50 30 5 4] bwd 24,656 8,658 2.85
Logsumexp float32 [5 100 10 3] bwd 24,208 8,711 2.78
Logsumexp float32 [10 20 10 5] bwd 22,736 8,782 2.59
Logsumexp float32 [20 30 5 5] bwd 22,608 8,604 2.63
Logsumexp float32 [20 5 10 10] bwd 21,904 8,907 2.46
Logsumexp float32 [30 10 5 10] bwd 21,200 8,729 2.43
Logsumexp float32 [10 2 10 2] bwd 10,320 9,582 1.08
Logsumexp float32 [100 10 10 10] bwd 23,056 8,533 2.70
Logsumexp float32 [100 20 10 10] bwd 24,544 10,187 2.41
Logsumexp float32 [100 100 10 5] bwd 29,216 18,169 1.61
Logsumexp float32 [100 10 50 10] bwd 26,672 18,044 1.48
Logsumexp float32 [100 4 400 2] bwd 25,520 14,062 1.81
Logsumexp float32 [100 5 20 30] bwd 24,672 13,316 1.85
Logsumexp float32 [30 10 5 10 2] bwd 24,656 8,498 2.90
Logsumexp float32 [30 5 20 5 4] bwd 22,960 8,533 2.69
Logsumexp float32 [20 10 3 12 4] bwd 24,000 8,569 2.80
Logsumexp float32 [20 5 10 5 10] bwd 21,776 8,516 2.56
Logsumexp float32 [40 2 3 2 5] bwd 25,872 9,689 2.67
Logsumexp float32 [15 3 5 20 12] bwd 22,640 8,462 2.68
Logsumexp float32 [12 12 4 2 10] bwd 23,040 8,729 2.64
Logsumexp float32 [5 5 5 2 10] bwd 24,768 9,724 2.55
Logsumexp float32 [2 4 8 2 4] bwd 10,896 8,889 1.23
Logsumexp float32 [100 10 10 5 10] bwd 26,688 17,955 1.49
Logsumexp float32 [100 20 10 2 10] bwd 26,479 15,466 1.71
Logsumexp float32 [10 50 40 10 2] bwd 25,792 15,449 1.67
Logsumexp float32 [40 20 50 10 2] bwd 32,320 25,511 1.27
Logsumexp float32 [60 10 50 10 2] bwd 28,656 20,658 1.39

BWD - FP16
op_name dtype size direction ROCm kernel avg MIOpen kernel avg ROCm / MIOpen
Logsumexp float16 [512] bwd 9,520 9,424 1.01
Logsumexp float16 [20 20] bwd 18,848 9,582 1.97
Logsumexp float16 [20 40] bwd 20,928 9,671 2.16
Logsumexp float16 [20 80] bwd 22,768 8,728 2.61
Logsumexp float16 [20 160] bwd 25,056 9,209 2.72
Logsumexp float16 [20 320] bwd 21,728 9,351 2.32
Logsumexp float16 [40 40] bwd 21,600 9,138 2.36
Logsumexp float16 [40 80] bwd 23,552 9,191 2.56
Logsumexp float16 [40 160] bwd 21,487 9,440 2.28
Logsumexp float16 [80 40] bwd 23,120 9,369 2.47
Logsumexp float16 [80 250] bwd 20,736 8,604 2.41
Logsumexp float16 [30 300] bwd 20,768 9,458 2.20
Logsumexp float16 [60 300] bwd 20,720 8,658 2.39
Logsumexp float16 [120 300] bwd 21,200 8,373 2.53
Logsumexp float16 [240 300] bwd 21,040 8,338 2.52
Logsumexp float16 [100 200] bwd 20,912 8,747 2.39
Logsumexp float16 [300 400] bwd 22,432 9,778 2.29
Logsumexp float16 [900 1000] bwd 32,655 28,373 1.15
Logsumexp float16 [1000 500] bwd 25,568 18,027 1.42
Logsumexp float16 [40 50 30] bwd 21,680 8,480 2.56
Logsumexp float16 [30 50 10] bwd 21,696 8,729 2.49
Logsumexp float16 [50 40 50] bwd 22,080 8,622 2.56
Logsumexp float16 [40 60 10] bwd 21,200 8,391 2.53
Logsumexp float16 [50 30 5] bwd 22,000 9,262 2.38
Logsumexp float16 [100 200 10] bwd 23,424 10,098 2.32
Logsumexp float16 [100 200 20] bwd 26,176 15,449 1.69
Logsumexp float16 [100 100 50] bwd 25,792 18,009 1.43
Logsumexp float16 [200 20 20] bwd 21,520 8,284 2.60
Logsumexp float16 [200 300 2] bwd 23,328 9,956 2.34
Logsumexp float16 [300 30 30] bwd 24,048 12,835 1.87
Logsumexp float16 [50 10 5 20] bwd 20,624 8,409 2.45
Logsumexp float16 [50 30 5 4] bwd 23,328 8,533 2.73
Logsumexp float16 [5 100 10 3] bwd 23,040 8,853 2.60
Logsumexp float16 [10 20 10 5] bwd 21,632 9,280 2.33
Logsumexp float16 [20 30 5 5] bwd 21,712 8,818 2.46
Logsumexp float16 [20 5 10 10] bwd 20,800 9,191 2.26
Logsumexp float16 [30 10 5 10] bwd 20,704 8,835 2.34
Logsumexp float16 [10 2 10 2] bwd 10,240 9,618 1.06
Logsumexp float16 [100 10 10 10] bwd 21,856 8,444 2.59
Logsumexp float16 [100 20 10 10] bwd 24,560 10,080 2.44
Logsumexp float16 [100 100 10 5] bwd 28,576 18,133 1.58
Logsumexp float16 [100 10 50 10] bwd 25,792 18,080 1.43
Logsumexp float16 [100 4 400 2] bwd 24,368 13,298 1.83
Logsumexp float16 [100 5 20 30] bwd 24,512 12,800 1.92
Logsumexp float16 [30 10 5 10 2] bwd 20,944 8,444 2.48
Logsumexp float16 [30 5 20 5 4] bwd 22,688 8,604 2.64
Logsumexp float16 [20 10 3 12 4] bwd 21,808 8,676 2.51
Logsumexp float16 [20 5 10 5 10] bwd 21,040 8,356 2.52
Logsumexp float16 [40 2 3 2 5] bwd 23,984 9,315 2.57
Logsumexp float16 [15 3 5 20 12] bwd 21,200 8,160 2.60
Logsumexp float16 [12 12 4 2 10] bwd 21,680 9,102 2.38
Logsumexp float16 [5 5 5 2 10] bwd 24,944 9,084 2.75
Logsumexp float16 [2 4 8 2 4] bwd 10,320 9,333 1.11
Logsumexp float16 [100 10 10 5 10] bwd 26,000 18,062 1.44
Logsumexp float16 [100 20 10 2 10] bwd 26,208 15,466 1.69
Logsumexp float16 [10 50 40 10 2] bwd 25,104 15,413 1.63
Logsumexp float16 [40 20 50 10 2] bwd 28,880 25,813 1.12
Logsumexp float16 [60 10 50 10 2] bwd 26,928 20,604 1.31

BWD - BFP16
op_name dtype size direction ROCm kernel avg MIOpen kernel avg ROCm / MIOpen
Logsumexp bfloat16 [512] bwd 10,176 9,547 1.07
Logsumexp bfloat16 [20 20] bwd 19,792 10,151 1.95
Logsumexp bfloat16 [20 40] bwd 20,304 9,884 2.05
Logsumexp bfloat16 [20 80] bwd 23,663 9,102 2.60
Logsumexp bfloat16 [20 160] bwd 23,312 9,707 2.40
Logsumexp bfloat16 [20 320] bwd 22,672 9,404 2.41
Logsumexp bfloat16 [40 40] bwd 22,352 9,547 2.34
Logsumexp bfloat16 [40 80] bwd 22,944 9,511 2.41
Logsumexp bfloat16 [40 160] bwd 22,432 9,564 2.35
Logsumexp bfloat16 [80 40] bwd 22,736 9,635 2.36
Logsumexp bfloat16 [80 250] bwd 21,376 8,818 2.42
Logsumexp bfloat16 [30 300] bwd 21,248 9,440 2.25
Logsumexp bfloat16 [60 300] bwd 20,912 8,960 2.33
Logsumexp bfloat16 [120 300] bwd 21,216 8,569 2.48
Logsumexp bfloat16 [240 300] bwd 21,600 8,551 2.53
Logsumexp bfloat16 [100 200] bwd 20,896 8,658 2.41
Logsumexp bfloat16 [300 400] bwd 22,560 9,938 2.27
Logsumexp bfloat16 [900 1000] bwd 34,912 28,853 1.21
Logsumexp bfloat16 [1000 500] bwd 26,720 18,311 1.46
Logsumexp bfloat16 [40 50 30] bwd 21,600 8,675 2.49
Logsumexp bfloat16 [30 50 10] bwd 21,584 8,800 2.45
Logsumexp bfloat16 [50 40 50] bwd 22,416 8,480 2.64
Logsumexp bfloat16 [40 60 10] bwd 23,248 8,747 2.66
Logsumexp bfloat16 [50 30 5] bwd 21,808 9,582 2.28
Logsumexp bfloat16 [100 200 10] bwd 23,824 10,258 2.32
Logsumexp bfloat16 [100 200 20] bwd 27,071 15,698 1.72
Logsumexp bfloat16 [100 100 50] bwd 26,880 18,293 1.47
Logsumexp bfloat16 [200 20 20] bwd 21,456 8,462 2.54
Logsumexp bfloat16 [200 300 2] bwd 23,312 9,884 2.36
Logsumexp bfloat16 [300 30 30] bwd 24,592 12,942 1.90
Logsumexp bfloat16 [50 10 5 20] bwd 20,944 8,587 2.44
Logsumexp bfloat16 [50 30 5 4] bwd 23,136 8,960 2.58
Logsumexp bfloat16 [5 100 10 3] bwd 23,184 9,173 2.53
Logsumexp bfloat16 [10 20 10 5] bwd 23,632 9,084 2.60
Logsumexp bfloat16 [20 30 5 5] bwd 22,128 8,995 2.46
Logsumexp bfloat16 [20 5 10 10] bwd 21,744 9,440 2.30
Logsumexp bfloat16 [30 10 5 10] bwd 21,120 8,996 2.35
Logsumexp bfloat16 [10 2 10 2] bwd 10,544 10,222 1.03
Logsumexp bfloat16 [100 10 10 10] bwd 22,192 8,427 2.63
Logsumexp bfloat16 [100 20 10 10] bwd 24,752 10,311 2.40
Logsumexp bfloat16 [100 100 10 5] bwd 29,312 18,418 1.59
Logsumexp bfloat16 [100 10 50 10] bwd 27,263 18,258 1.49
Logsumexp bfloat16 [100 4 400 2] bwd 24,960 13,475 1.85
Logsumexp bfloat16 [100 5 20 30] bwd 24,704 13,013 1.90
Logsumexp bfloat16 [30 10 5 10 2] bwd 21,152 8,569 2.47
Logsumexp bfloat16 [30 5 20 5 4] bwd 22,976 8,640 2.66
Logsumexp bfloat16 [20 10 3 12 4] bwd 21,552 8,533 2.53
Logsumexp bfloat16 [20 5 10 5 10] bwd 21,456 8,640 2.48
Logsumexp bfloat16 [40 2 3 2 5] bwd 22,240 9,404 2.36
Logsumexp bfloat16 [15 3 5 20 12] bwd 20,848 8,338 2.50
Logsumexp bfloat16 [12 12 4 2 10] bwd 21,216 8,871 2.39
Logsumexp bfloat16 [5 5 5 2 10] bwd 22,576 9,209 2.45
Logsumexp bfloat16 [2 4 8 2 4] bwd 10,864 9,707 1.12
Logsumexp bfloat16 [100 10 10 5 10] bwd 27,456 18,275 1.50
Logsumexp bfloat16 [100 20 10 2 10] bwd 27,440 15,573 1.76
Logsumexp bfloat16 [10 50 40 10 2] bwd 25,872 15,697 1.65
Logsumexp bfloat16 [40 20 50 10 2] bwd 30,384 26,169 1.16
Logsumexp bfloat16 [60 10 50 10 2] bwd 28,272 20,960 1.35

long10024070, Feb 18 '25 06:02