mixbench

huge performance drop after some FLOPS/byte point

Status: Open. edisonchan opened this issue 2 years ago • 3 comments

I tried building and running mixbench-ocl on a Snapdragon 8 Gen 2, whose GPU is an Adreno 740:

Total global mem:    7629 MB
Max allowed buffer:  1024 MB
OpenCL version:      OpenCL 3.0 Adreno(TM) 740
Total CUs:           6
-----------------------------------------------------------------------
Buffer size:            256MB
Workgroup size:         256
Elements per workitem:  8
Workitem fusion degree: 4
Workitem stride:        NDRange
Buffer allocation:      Device allocated
Timer:                  CL event based
Warning:                Double precision computations are not supported
Loading kernel source file...
Precompilation of kernels... [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>]


----------------------------------------------------------------------------- CSV data -----------------------------------------------------------------------------
Experiment ID, Single Precision ops,,,,              Double precision ops,,,,              Half precision ops,,,,                Integer operations,,,
Compute iters, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Iops/byte, ex.time,   GIOPS, GB/sec
            0,      0.250,    2.41,   13.95,  55.79,      0.125,    0.00,     inf,    inf,      0.500,    2.39,   28.10,  56.21,     0.250,    2.43,   13.81,  55.25
            1,      0.750,    2.41,   41.82,  55.76,      0.375,    0.00,     inf,    inf,      1.500,    2.38,   84.63,  56.42,     0.750,    2.43,   41.51,  55.35
            2,      1.250,    2.39,   70.08,  56.07,      0.625,    0.00,     inf,    inf,      2.500,    2.38,  140.88,  56.35,     1.250,    2.43,   69.13,  55.30
            3,      1.750,    2.39,   98.11,  56.06,      0.875,    0.00,     inf,    inf,      3.500,    2.36,  198.70,  56.77,     1.750,    2.40,   97.74,  55.85
            4,      2.250,    2.38,  126.99,  56.44,      1.125,    0.00,     inf,    inf,      4.500,    2.35,  256.47,  56.99,     2.250,    2.40,  125.67,  55.85
            5,      2.750,    2.41,  153.02,  55.65,      1.375,    0.00,     inf,    inf,      5.500,    2.36,  313.06,  56.92,     2.750,    2.40,  153.86,  55.95
            6,      3.250,    2.38,  183.44,  56.44,      1.625,    0.00,     inf,    inf,      6.500,    2.37,  368.58,  56.70,     3.250,    2.43,  179.21,  55.14
            7,      3.750,    2.41,  208.94,  55.72,      1.875,    0.00,     inf,    inf,      7.500,    2.40,  419.61,  55.95,     3.750,    2.41,  209.02,  55.74
            8,      4.250,    2.38,  239.18,  56.28,      2.125,    0.00,     inf,    inf,      8.500,    2.35,  485.03,  57.06,     4.250,    2.40,  237.47,  55.88
            9,      4.750,    2.37,  269.11,  56.66,      2.375,    0.00,     inf,    inf,      9.500,    2.35,  543.27,  57.19,     4.750,    2.40,  266.06,  56.01
           10,      5.250,    2.36,  298.05,  56.77,      2.625,    0.00,     inf,    inf,     10.500,    2.34,  601.25,  57.26,     5.250,    2.40,  293.48,  55.90
           11,      5.750,    2.37,  325.63,  56.63,      2.875,    0.00,     inf,    inf,     11.500,    2.35,  657.36,  57.16,     5.750,    2.40,  320.91,  55.81
           12,      6.250,    2.37,  354.25,  56.68,      3.125,    0.00,     inf,    inf,     12.500,    3.94,  425.39,  34.03,     6.250,    2.40,  349.67,  55.95
           13,      6.750,    2.36,  383.09,  56.75,      3.375,    0.00,     inf,    inf,     13.500,    4.23,  428.55,  31.74,     6.750,    2.40,  376.88,  55.83
           14,      7.250,    2.36,  411.82,  56.80,      3.625,    0.00,     inf,    inf,     14.500,    4.53,  429.72,  29.64,     7.250,    2.41,  403.94,  55.72
           15,      7.750,    2.37,  439.65,  56.73,      3.875,    0.00,     inf,    inf,     15.500,    4.81,  432.70,  27.92,     7.750,    2.44,  425.78,  54.94
           16,      8.250,    2.36,  468.37,  56.77,      4.125,    0.00,     inf,    inf,     16.500,    5.11,  433.56,  26.28,     8.250,    2.53,  437.30,  53.01
           17,      8.750,    2.36,  496.81,  56.78,      4.375,    0.00,     inf,    inf,     17.500,    5.39,  435.60,  24.89,     8.750,    2.64,  445.04,  50.86
           18,      9.250,    2.36,  525.14,  56.77,      4.625,    0.00,     inf,    inf,     18.500,    5.69,  436.40,  23.59,     9.250,    2.73,  455.11,  49.20
           20,     10.250,    2.36,  581.97,  56.78,      5.125,    0.00,     inf,    inf,     20.500,    6.27,  438.90,  21.41,    10.250,    2.95,  466.98,  45.56
           22,     11.250,    2.36,  639.51,  56.85,      5.625,    0.00,     inf,    inf,     22.500,    6.85,  440.81,  19.59,    11.250,    3.19,  472.88,  42.03
           24,     12.250,    2.36,  696.74,  56.88,      6.125,    0.00,     inf,    inf,     24.500,   12.12,  271.22,  11.07,    12.250,    3.45,  477.12,  38.95
           28,     14.250,    2.36,  810.49,  56.88,      7.125,    0.00,     inf,    inf,     28.500,   14.01,  272.98,   9.58,    14.250,    3.95,  483.94,  33.96
           32,     16.250,    2.36,  922.64,  56.78,      8.125,    0.00,     inf,    inf,     32.500,   15.90,  274.33,   8.44,    16.250,    4.46,  488.71,  30.07
           40,     20.250,    2.37, 1148.26,  56.70,     10.125,    0.00,     inf,    inf,     40.500,   19.68,  276.22,   6.82,    20.250,    5.49,  495.26,  24.46
           48,     24.250,    2.38, 1369.75,  56.48,     12.125,    0.00,     inf,    inf,     48.500,   23.46,  277.49,   5.72,    24.250,    6.51,  499.82,  20.61
           56,     28.250,    2.37, 1597.06,  56.53,     14.125,    0.00,     inf,    inf,     56.500,   27.23,  278.46,   4.93,    28.250,    7.54,  502.81,  17.80
           64,     32.250,   36.46,  118.70,   3.68,     16.125,    0.00,     inf,    inf,     64.500,   31.02,  279.10,   4.33,    32.250,   41.67,  103.89,   3.22
           80,     40.250,   42.93,  125.84,   3.13,     20.125,    0.00,     inf,    inf,     80.500,   38.58,  280.03,   3.48,    40.250,   49.35,  109.47,   2.72
           96,     48.250,   49.30,  131.36,   2.72,     24.125,    0.00,     inf,    inf,     96.500,   46.15,  280.68,   2.91,    48.250,   57.18,  113.26,   2.35
          128,     64.250,   62.33,  138.34,   2.15,     32.125,    0.00,     inf,    inf,    128.500,   61.26,  281.52,   2.19,    64.250,   72.67,  118.67,   1.85
          192,     96.250,   88.14,  146.57,   1.52,     48.125,    0.00,     inf,    inf,    192.500,   91.53,  282.29,   1.47,    96.250,  106.47,  121.34,   1.26
          256,    128.250,  117.05,  147.06,   1.15,     64.125,    0.00,     inf,    inf,    256.500,  121.77,  282.73,   1.10,   128.250,  137.89,  124.84,   0.97
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
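When reading the CSV above, two relations help (both read off the table, not taken from mixbench's source): Flops/byte grows linearly with Compute iters (for single precision, 0.250 at 0 iterations plus 0.5 per iteration), and GFLOPS equals Flops/byte × GB/sec. The product GB/sec × ex.time stays at roughly 0.134 GB in every row, which suggests ex.time is in milliseconds and that each run moves the same amount of data, so a GFLOPS collapse at a given intensity can only come from a longer execution time. A minimal C sketch of these relations, with hypothetical helper names:

```c
/* Hypothetical helpers (not part of mixbench): relations read off the
 * single-precision columns of the mixbench-ocl CSV output. */

/* Flops/byte for single precision: 0.250 at 0 compute iterations,
 * growing by 0.5 per additional iteration (0.250, 0.750, 1.250, ...). */
double sp_flops_per_byte(int compute_iters) {
    return 0.25 + 0.5 * compute_iters;
}

/* GFLOPS column = arithmetic intensity (flops/byte) * bandwidth (GB/s). */
double gflops_from_bandwidth(double flops_per_byte, double gb_per_sec) {
    return flops_per_byte * gb_per_sec;
}

/* Data moved per run in GB = bandwidth (GB/s) * ex.time (ms) / 1000.
 * This stays roughly constant across rows, so a GFLOPS drop at a given
 * intensity must come from a longer ex.time. */
double gb_moved(double gb_per_sec, double ex_time_ms) {
    return gb_per_sec * ex_time_ms / 1000.0;
}
```

Applied to the Adreno 740 single-precision rows at 56 and 64 iterations, these give 28.25 × 56.53 ≈ 1597 GFLOPS and 32.25 × 3.68 ≈ 119 GFLOPS, while gb_moved() returns about 0.134 GB for both, so the drop corresponds to ex.time rising from 2.37 ms to 36.46 ms.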

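What causes this "problem"? For example, single-precision throughput falls from 1597 GFLOPS at 56 compute iterations to 119 GFLOPS at 64, and half precision drops sharply at 12 and again at 24 iterations.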

edisonchan, Nov 07 '23 06:11

This can happen. One possible cause is register spilling, where the compiler runs out of registers and has to keep intermediate values in slower memory.

If you have time, you could experiment by manually controlling the loop's unroll factor, for example by adding a #pragma unroll 16 directive before this line: https://github.com/ekondis/mixbench/blob/e51e1962f6a8f6664ebf28df0cb40a508e1fe3b5/mixbench-opencl/mix_kernels.cl#L36
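As background for this suggestion: #pragma unroll N asks the compiler to replicate the loop body N times per iteration instead of using its own default (which, for a loop with a known trip count, may be full unrolling). A smaller factor keeps the generated code shorter and typically limits how many intermediate values are live at once, which is why it is a plausible knob against register spilling. The loop below is a hypothetical stand-in written in plain C, not the contents of mix_kernels.cl; clang accepts the same #pragma unroll 16 spelling in C, GCC ignores it (its equivalent is #pragma GCC unroll 16), and whether the Adreno OpenCL compiler honors it is what the experiment would show.

```c
/* Hypothetical stand-in for a benchmark compute loop (NOT the actual
 * mix_kernels.cl source): a chain of multiply-add operations whose
 * length is set by compute_iterations. */
float compute_chain(float x, float seed, int compute_iterations) {
    float tmp = x;
    /* Request an unroll factor of 16. Clang honors this pragma in C;
     * GCC ignores it (GCC spells it "#pragma GCC unroll 16"). */
#pragma unroll 16
    for (int j = 0; j < compute_iterations; j++) {
        tmp = tmp * tmp + seed; /* one multiply-add per iteration */
    }
    return tmp;
}
```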

ekondis, Jan 05 '24 20:01

I tried it. 16 is not enough here; 128 seems to be the best factor, but there is still a huge drop after 128 compute iterations for single precision (this run is on a device with an Adreno 750):

LD_LIBRARY_PATH=/data/data/com.termux/files/usr/lib:/system/vendor/lib64 ./mixbench-ocl
mixbench-ocl (v0.04-13-g597b700)
Use "-h" argument to see available options
------------------------ Device specifications ------------------------
Platform:            QUALCOMM Snapdragon(TM)
Device:              QUALCOMM Adreno(TM) 750/QUALCOMM
Driver version:      OpenCL 3.0 QUALCOMM build: commit unknown Compiler E031.45.02.11
Address bits:        64
GPU clock rate:      1 MHz
Total global mem:    7631 MB
Max allowed buffer:  1024 MB
OpenCL version:      OpenCL 3.0 Adreno(TM) 750
Total CUs:           6
-----------------------------------------------------------------------
Buffer size:            256MB
Workgroup size:         256
Elements per workitem:  8
Workitem fusion degree: 4
Workitem stride:        NDRange
Buffer allocation:      Device allocated
Timer:                  CL event based
Warning:                Double precision computations are not supported
Loading kernel source file...
Precompilation of kernels... [>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>]
----------------------------------------------------------------------------- CSV data -----------------------------------------------------------------------------
Experiment ID, Single Precision ops,,,,              Double precision ops,,,,              Half precision ops,,,,                Integer operations,,,
Compute iters, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Iops/byte, ex.time,   GIOPS, GB/sec
            0,      0.250,    2.23,   15.06,  60.24,      0.125,    0.00,     inf,    inf,      0.500,    2.22,   30.20,  60.40,     0.250,    2.23,   15.03,  60.14
            1,      0.750,    2.22,   45.33,  60.44,      0.375,    0.00,     inf,    inf,      1.500,    2.22,   90.57,  60.38,     0.750,    2.23,   45.12,  60.16
            2,      1.250,    2.22,   75.55,  60.44,      0.625,    0.00,     inf,    inf,      2.500,    2.22,  151.02,  60.41,     1.250,    2.23,   75.34,  60.27
            3,      1.750,    2.21,  106.08,  60.62,      0.875,    0.00,     inf,    inf,      3.500,    2.22,  211.87,  60.53,     1.750,    2.24,  104.95,  59.97
            4,      2.250,    2.21,  136.72,  60.77,      1.125,    0.00,     inf,    inf,      4.500,    2.21,  272.69,  60.60,     2.250,    2.23,  135.54,  60.24
            5,      2.750,    2.22,  166.34,  60.49,      1.375,    0.00,     inf,    inf,      5.500,    2.21,  333.44,  60.63,     2.750,    2.24,  164.48,  59.81
            6,      3.250,    2.22,  196.06,  60.33,      1.625,    0.00,     inf,    inf,      6.500,    2.22,  392.97,  60.46,     3.250,    2.24,  194.56,  59.86
            7,      3.750,    2.22,  226.74,  60.46,      1.875,    0.00,     inf,    inf,      7.500,    2.22,  453.43,  60.46,     3.750,    2.24,  224.49,  59.86
            8,      4.250,    2.21,  257.75,  60.65,      2.125,    0.00,     inf,    inf,      8.500,    2.22,  514.36,  60.51,     4.250,    2.24,  254.54,  59.89
            9,      4.750,    2.22,  287.67,  60.56,      2.375,    0.00,     inf,    inf,      9.500,    2.22,  574.88,  60.51,     4.750,    2.24,  284.74,  59.95
           10,      5.250,    2.22,  317.55,  60.49,      2.625,    0.00,     inf,    inf,     10.500,    2.22,  634.80,  60.46,     5.250,    2.24,  314.72,  59.95
           11,      5.750,    2.22,  347.95,  60.51,      2.875,    0.00,     inf,    inf,     11.500,    2.22,  695.90,  60.51,     5.750,    2.25,  343.59,  59.75
           12,      6.250,    2.22,  378.56,  60.57,      3.125,    0.00,     inf,    inf,     12.500,    2.22,  756.77,  60.54,     6.250,    2.24,  373.98,  59.84
           13,      6.750,    2.22,  408.28,  60.49,      3.375,    0.00,     inf,    inf,     13.500,    2.21,  819.86,  60.73,     6.750,    2.25,  403.39,  59.76
           14,      7.250,    2.22,  437.71,  60.37,      3.625,    0.00,     inf,    inf,     14.500,    2.22,  876.63,  60.46,     7.250,    2.24,  434.21,  59.89
           15,      7.750,    2.22,  468.82,  60.49,      3.875,    0.00,     inf,    inf,     15.500,    2.21,  940.02,  60.65,     7.750,    2.24,  463.52,  59.81
           16,      8.250,    2.21,  499.93,  60.60,      4.125,    0.00,     inf,    inf,     16.500,    2.21,  999.86,  60.60,     8.250,    2.24,  495.01,  60.00
           17,      8.750,    2.22,  529.98,  60.57,      4.375,    0.00,     inf,    inf,     17.500,    2.21, 1061.43,  60.65,     8.750,    2.23,  526.15,  60.13
           18,      9.250,    2.21,  560.98,  60.65,      4.625,    0.00,     inf,    inf,     18.500,    2.21, 1122.48,  60.67,     9.250,    2.24,  553.49,  59.84
           20,     10.250,    2.25,  611.44,  59.65,      5.125,    0.00,     inf,    inf,     20.500,    2.22, 1238.24,  60.40,    10.250,    2.28,  602.33,  58.76
           22,     11.250,    2.22,  681.40,  60.57,      5.625,    0.00,     inf,    inf,     22.500,    2.23, 1354.21,  60.19,    11.250,    2.43,  621.13,  55.21
           24,     12.250,    2.21,  743.35,  60.68,      6.125,    0.00,     inf,    inf,     24.500,    2.20, 1492.05,  60.90,    12.250,    2.60,  631.64,  51.56
           28,     14.250,    2.21,  866.21,  60.79,      7.125,    0.00,     inf,    inf,     28.500,    2.20, 1737.26,  60.96,    14.250,    2.98,  641.85,  45.04
           32,     16.250,    2.21,  984.71,  60.60,      8.125,    0.00,     inf,    inf,     32.500,    2.21, 1971.92,  60.67,    16.250,    3.36,  648.53,  39.91
           40,     20.250,    2.21, 1230.94,  60.79,     10.125,    0.00,     inf,    inf,     40.500,    2.20, 2466.45,  60.90,    20.250,    4.13,  657.63,  32.48
           48,     24.250,    2.22, 1467.45,  60.51,     12.125,    0.00,     inf,    inf,     48.500,    2.22, 2934.90,  60.51,    24.250,    4.90,  663.57,  27.36
           56,     28.250,    2.22, 1708.71,  60.49,     14.125,    0.00,     inf,    inf,     56.500,    2.32, 3265.60,  57.80,    28.250,    5.68,  667.65,  23.63
           64,     32.250,    2.22, 1947.96,  60.40,     16.125,    0.00,     inf,    inf,     64.500,    2.54, 3409.62,  52.86,    32.250,    6.45,  670.67,  20.80
           80,     40.250,    2.25, 2399.93,  59.63,     20.125,    0.00,     inf,    inf,     80.500,   49.72,  217.32,   2.70,    40.250,    8.01,  674.70,  16.76
           96,     48.250,    2.56, 2531.71,  52.47,     24.125,    0.00,     inf,    inf,     96.500,   59.54,  217.55,   2.25,    48.250,    9.56,  677.62,  14.04
          128,     64.250,    3.32, 2597.59,  40.43,     32.125,    0.00,     inf,    inf,    128.500,   79.17,  217.85,   1.70,    64.250,   12.66,  681.16,  10.60
          192,     96.250,   52.40,  246.51,   2.56,     48.125,    0.00,     inf,    inf,    192.500,  118.44,  218.14,   1.13,    96.250,   72.07,  179.24,   1.86
          256,    128.250,   69.67,  247.06,   1.93,     64.125,    0.00,     inf,    inf,    256.500,  157.72,  218.28,   0.85,   128.250,   95.91,  179.48,   1.40
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
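The drop points in the CSV above can also be located mechanically. The sketch below is a hypothetical helper, not part of mixbench: it scans (compute iters, GFLOPS) pairs for one precision and returns the first iteration count whose throughput falls below a chosen fraction of the previous row's. The cut-off fraction is an assumption to tune.

```c
#include <stddef.h>

/* One row of the benchmark CSV for a single data type. */
struct row {
    int compute_iters;
    double gflops;
};

/* Returns the compute-iteration count of the first row whose GFLOPS is
 * below `fraction` times the previous row's value, or -1 if there is
 * none. Hypothetical helper; `fraction` is a tunable assumption. */
int first_drop(const struct row *rows, size_t n, double fraction) {
    for (size_t i = 1; i < n; i++) {
        if (rows[i].gflops < fraction * rows[i - 1].gflops) {
            return rows[i].compute_iters;
        }
    }
    return -1;
}
```

Fed the Adreno 750 rows above with a fraction of 0.5, it returns 192 for single precision (2597.59 to 246.51 GFLOPS) and 80 for half precision (3409.62 to 217.32 GFLOPS), i.e. the first rows after the 128 and 64 thresholds. A 0.5 cut-off does not flag milder steps such as the Adreno 740 half-precision change at 12 iterations (657.36 to 425.39 GFLOPS); a higher fraction would.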

edisonchan, Oct 19 '24 02:10

So this is device dependent. For the OpenCL implementation, the unroll factor could perhaps be exposed as a benchmark parameter to give the user more flexibility, though I'm not sure it's worth it. In addition, as you can see, the number of compute iterations after which performance drops depends not only on the device but also on the data type (128 for single-precision vs. 64 for half-precision floats).
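One possible shape for such a parameter, sketched under assumptions: the host turns the user's choice into a preprocessor definition in the options string passed to clBuildProgram (OpenCL build options accept -D name=definition), and the kernel uses #pragma unroll UNROLL_FACTOR. The macro name UNROLL_FACTOR and the helper below are hypothetical, mixbench does not currently expose this, and whether a macro is expanded inside #pragma unroll depends on the vendor's OpenCL compiler. Since the best factor differs per data type, a separate value per precision may be needed.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical kernel side (OpenCL C), shown only as a comment:
 *
 *   #ifndef UNROLL_FACTOR
 *   #define UNROLL_FACTOR 16
 *   #endif
 *   ...
 *   #pragma unroll UNROLL_FACTOR
 *   for (...) { ... }
 */

/* Writes the build options for a given unroll factor into buf, e.g.
 * "-D UNROLL_FACTOR=128", suitable as the `options` argument of
 * clBuildProgram(). Returns the string length, or -1 if the factor is
 * not positive or buf is too small. */
int unroll_build_options(char *buf, size_t size, int unroll_factor) {
    if (unroll_factor < 1) {
        return -1; /* an unroll factor must be a positive integer */
    }
    int n = snprintf(buf, size, "-D UNROLL_FACTOR=%d", unroll_factor);
    /* snprintf returns the length it needed; n >= size means truncation. */
    if (n < 0 || (size_t)n >= size) {
        return -1;
    }
    return n;
}
```

For instance, unroll_build_options(buf, sizeof buf, 128) writes "-D UNROLL_FACTOR=128" and returns 20.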

ekondis, Oct 24 '24 05:10