
8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF

Open Hamlin-Li opened this issue 10 months ago • 28 comments

Hi, can you help review this patch? This PR builds on the previous work and discussion in PR 16234 and PR 18294.

Compared with the previous PRs, the major change in this PR is to integrate the SLEEF source (for the steps, see src/jdk.incubator.vector/linux/native/libvectormath/README) rather than depend on external SLEEF artifacts (headers or libraries) at build time or run time. The earlier changes are adjusted accordingly, e.g. unnecessary files and changes are removed, especially in the JDK make directory.

Besides the code changes, one important task is to handle the legal process.

Thanks!

Performance

Options

  • +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
  • -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'

Float

data

Benchmark (size) Mode Cnt Error Units Score +intrinsic (UseSVE=1) Score -intrinsic (UseSVE=1) Improvement (UseSVE=1) Score +intrinsic (UseSVE=0) Score -intrinsic (UseSVE=0) Improvement (UseSVE=0)
Float128Vector.ACOS 1024 thrpt 10 0.015 ops/ms 245.439 101.483 2.419 245.733 102.033 2.408
Float128Vector.ASIN 1024 thrpt 10 0.013 ops/ms 296.702 103.559 2.865 296.741 103.18 2.876
Float128Vector.ATAN 1024 thrpt 10 0.004 ops/ms 196.862 49.627 3.967 195.891 49.771 3.936
Float128Vector.ATAN2 1024 thrpt 10 0.021 ops/ms 135.088 32.449 4.163 135.721 32.579 4.166
Float128Vector.CBRT 1024 thrpt 10 0.004 ops/ms 114.547 39.517 2.899 114.756 39.273 2.922
Float128Vector.COS 1024 thrpt 10 0.006 ops/ms 93.226 62.883 1.483 93.195 63.116 1.477
Float128Vector.COSH 1024 thrpt 10 0.005 ops/ms 154.498 76.58 2.017 154.147 77.026 2.001
Float128Vector.EXP 1024 thrpt 10 0.248 ops/ms 483.569 83.614 5.783 502.786 83.424 6.027
Float128Vector.EXPM1 1024 thrpt 10 0.01 ops/ms 156.338 62.091 2.518 157.589 62.008 2.541
Float128Vector.HYPOT 1024 thrpt 10 0.007 ops/ms 191.217 56.834 3.364 191.247 58.624 3.262
Float128Vector.LOG 1024 thrpt 10 0.019 ops/ms 258.223 52.005 4.965 259.642 52.018 4.991
Float128Vector.LOG10 1024 thrpt 10 0.004 ops/ms 238.916 43.311 5.516 240.135 43.352 5.539
Float128Vector.LOG1P 1024 thrpt 10 0.112 ops/ms 246.507 42.227 5.838 246.546 42.24 5.837
Float128Vector.POW 1024 thrpt 10 0.033 ops/ms 73.78 25.17 2.931 73.693 25.113 2.934
Float128Vector.SIN 1024 thrpt 10 0.004 ops/ms 95.509 62.807 1.521 95.792 62.883 1.523
Float128Vector.SINH 1024 thrpt 10 0.011 ops/ms 153.177 77.586 1.974 152.97 77.248 1.98
Float128Vector.TAN 1024 thrpt 10 0.002 ops/ms 74.394 32.662 2.278 74.491 32.639 2.282
Float128Vector.TANH 1024 thrpt 10 0.005 ops/ms 129.308 144.581 0.894 129.319 144.916 0.892
Float256Vector.ACOS 1024 thrpt 10 0.311 ops/ms 378.109 135.118 2.798 122.381 123.502 0.991
Float256Vector.ASIN 1024 thrpt 10 1.039 ops/ms 452.692 135.067 3.352 126.037 123.53 1.02
Float256Vector.ATAN 1024 thrpt 10 0.017 ops/ms 288.785 62.032 4.655 59.783 59.821 0.999
Float256Vector.ATAN2 1024 thrpt 10 0.065 ops/ms 217.573 40.843 5.327 38.337 38.352 1
Float256Vector.CBRT 1024 thrpt 10 0.042 ops/ms 185.721 49.353 3.763 46.273 46.279 1
Float256Vector.COS 1024 thrpt 10 0.036 ops/ms 163.584 78.947 2.072 70.544 70.74 0.997
Float256Vector.COSH 1024 thrpt 10 0.01 ops/ms 211.746 96.885 2.186 84.078 84.366 0.997
Float256Vector.EXP 1024 thrpt 10 0.121 ops/ms 954.69 117.145 8.15 97.97 97.713 1.003
Float256Vector.EXPM1 1024 thrpt 10 0.055 ops/ms 213.462 79.832 2.674 74.292 74.36 0.999
Float256Vector.HYPOT 1024 thrpt 10 0.052 ops/ms 306.511 74.208 4.13 68.856 69.077 0.997
Float256Vector.LOG 1024 thrpt 10 0.216 ops/ms 406.914 65.408 6.221 59.808 59.767 1.001
Float256Vector.LOG10 1024 thrpt 10 0.37 ops/ms 371.385 53.156 6.987 49.334 49.171 1.003
Float256Vector.LOG1P 1024 thrpt 10 1.851 ops/ms 397.247 52.042 7.633 50.181 50.199 1
Float256Vector.POW 1024 thrpt 10 0.048 ops/ms 115.155 27.174 4.238 24.659 24.703 0.998
Float256Vector.SIN 1024 thrpt 10 0.107 ops/ms 154.975 79.103 1.959 70.9 70.615 1.004
Float256Vector.SINH 1024 thrpt 10 0.351 ops/ms 202.683 97.643 2.076 84.587 84.371 1.003
Float256Vector.TAN 1024 thrpt 10 0.005 ops/ms 127.597 37.136 3.436 34.774 34.757 1
Float256Vector.TANH 1024 thrpt 10 1.233 ops/ms 249.084 247.272 1.007 169.903 169.805 1.001
Float512Vector.ACOS 1024 thrpt 10 0.069 ops/ms 148.467 152.264 0.975 150.131 154.717 0.97
Float512Vector.ASIN 1024 thrpt 10 0.287 ops/ms 147.144 158.074 0.931 147.251 148.71 0.99
Float512Vector.ATAN 1024 thrpt 10 0.101 ops/ms 68.498 67.987 1.008 67.968 68.131 0.998
Float512Vector.ATAN2 1024 thrpt 10 0.016 ops/ms 44.189 44.052 1.003 43.898 43.781 1.003
Float512Vector.CBRT 1024 thrpt 10 0.012 ops/ms 53.514 53.672 0.997 53.623 53.635 1
Float512Vector.COS 1024 thrpt 10 0.222 ops/ms 80.566 80.713 0.998 80.672 80.796 0.998
Float512Vector.COSH 1024 thrpt 10 0.104 ops/ms 102.175 102.038 1.001 102.303 102.009 1.003
Float512Vector.EXP 1024 thrpt 10 0.255 ops/ms 118.824 118.942 0.999 118.551 118.976 0.996
Float512Vector.EXPM1 1024 thrpt 10 0.021 ops/ms 87.363 87.153 1.002 87.842 87.387 1.005
Float512Vector.HYPOT 1024 thrpt 10 0.048 ops/ms 86.838 86.439 1.005 86.903 86.709 1.002
Float512Vector.LOG 1024 thrpt 10 0.017 ops/ms 70.794 70.746 1.001 70.469 70.62 0.998
Float512Vector.LOG10 1024 thrpt 10 0.051 ops/ms 55.821 55.85 0.999 55.883 55.773 1.002
Float512Vector.LOG1P 1024 thrpt 10 0.085 ops/ms 57.113 57.582 0.992 56.942 57.245 0.995
Float512Vector.POW 1024 thrpt 10 0.006 ops/ms 26.66 26.656 1 26.651 26.641 1
Float512Vector.SIN 1024 thrpt 10 0.067 ops/ms 80.873 80.806 1.001 80.638 80.456 1.002
Float512Vector.SINH 1024 thrpt 10 0.16 ops/ms 103.818 102.766 1.01 102.669 103.83 0.989
Float512Vector.TAN 1024 thrpt 10 0.148 ops/ms 38.107 37.971 1.004 37.938 37.862 1.002
Float512Vector.TANH 1024 thrpt 10 1.206 ops/ms 237.573 235.876 1.007 236.684 236.724 1
Float64Vector.ACOS 1024 thrpt 10 0.006 ops/ms 123.038 64.939 1.895 123.07 65.556 1.877
Float64Vector.ASIN 1024 thrpt 10 0.006 ops/ms 148.56 65.115 2.282 148.576 66.468 2.235
Float64Vector.ATAN 1024 thrpt 10 0.003 ops/ms 98.512 40.569 2.428 98.458 40.932 2.405
Float64Vector.ATAN2 1024 thrpt 10 0.004 ops/ms 67.706 24.824 2.727 68.214 25.157 2.712
Float64Vector.CBRT 1024 thrpt 10 0.001 ops/ms 57.299 29.725 1.928 57.343 29.279 1.959
Float64Vector.COS 1024 thrpt 10 0.008 ops/ms 46.689 44.153 1.057 46.67 43.683 1.068
Float64Vector.COSH 1024 thrpt 10 0.005 ops/ms 77.552 51.012 1.52 77.66 51.285 1.514
Float64Vector.EXP 1024 thrpt 10 0.257 ops/ms 242.736 54.277 4.472 248.345 54.298 4.574
Float64Vector.EXPM1 1024 thrpt 10 0.003 ops/ms 78.741 45.22 1.741 79.082 45.396 1.742
Float64Vector.HYPOT 1024 thrpt 10 0.002 ops/ms 95.716 36.135 2.649 95.702 36.424 2.627
Float64Vector.LOG 1024 thrpt 10 0.006 ops/ms 130.395 38.954 3.347 130.321 38.99 3.342
Float64Vector.LOG10 1024 thrpt 10 0.003 ops/ms 119.783 33.912 3.532 120.254 33.951 3.542
Float64Vector.LOG1P 1024 thrpt 10 0.006 ops/ms 123.966 34.381 3.606 123.984 34.291 3.616
Float64Vector.POW 1024 thrpt 10 0.003 ops/ms 36.872 21.747 1.695 36.774 21.639 1.699
Float64Vector.SIN 1024 thrpt 10 0.002 ops/ms 48.008 44.076 1.089 48.001 43.989 1.091
Float64Vector.SINH 1024 thrpt 10 0.004 ops/ms 76.711 50.893 1.507 76.936 51.236 1.502
Float64Vector.TAN 1024 thrpt 10 0.006 ops/ms 37.286 26.095 1.429 37.283 26.06 1.431
Float64Vector.TANH 1024 thrpt 10 0.004 ops/ms 64.71 79.799 0.811 64.741 79.924 0.81
FloatMaxVector.ACOS 1024 thrpt 10 0.103 ops/ms 378.138 136.187 2.777 245.725 102.05 2.408
FloatMaxVector.ASIN 1024 thrpt 10 1.013 ops/ms 452.441 135.287 3.344 296.708 103.589 2.864
FloatMaxVector.ATAN 1024 thrpt 10 0.028 ops/ms 288.802 62.021 4.657 196.817 49.824 3.95
FloatMaxVector.ATAN2 1024 thrpt 10 0.037 ops/ms 216.386 40.889 5.292 135.756 32.75 4.145
FloatMaxVector.CBRT 1024 thrpt 10 0.269 ops/ms 187.141 49.382 3.79 114.819 39.203 2.929
FloatMaxVector.COS 1024 thrpt 10 0.014 ops/ms 163.726 78.882 2.076 93.184 63.087 1.477
FloatMaxVector.COSH 1024 thrpt 10 0.006 ops/ms 212.544 97.49 2.18 154.547 77.685 1.989
FloatMaxVector.EXP 1024 thrpt 10 0.048 ops/ms 955.792 117.15 8.159 488.526 83.227 5.87
FloatMaxVector.EXPM1 1024 thrpt 10 0.01 ops/ms 213.435 79.837 2.673 157.618 62.006 2.542
FloatMaxVector.HYPOT 1024 thrpt 10 0.041 ops/ms 308.446 74.165 4.159 191.259 58.628 3.262
FloatMaxVector.LOG 1024 thrpt 10 0.105 ops/ms 405.824 65.604 6.186 257.679 51.992 4.956
FloatMaxVector.LOG10 1024 thrpt 10 0.186 ops/ms 371.417 53.204 6.981 240.117 43.427 5.529
FloatMaxVector.LOG1P 1024 thrpt 10 0.713 ops/ms 395.943 52.002 7.614 246.515 42.196 5.842
FloatMaxVector.POW 1024 thrpt 10 0.079 ops/ms 115.35 27.143 4.25 73.411 25.226 2.91
FloatMaxVector.SIN 1024 thrpt 10 0.04 ops/ms 154.421 79.424 1.944 95.548 62.973 1.517
FloatMaxVector.SINH 1024 thrpt 10 0.04 ops/ms 202.51 97.974 2.067 153.3 77.106 1.988
FloatMaxVector.TAN 1024 thrpt 10 0.013 ops/ms 127.56 36.981 3.449 74.483 32.733 2.275
FloatMaxVector.TANH 1024 thrpt 10 0.792 ops/ms 247.428 247.743 0.999 129.375 144.932 0.893
FloatScalar.ACOS 1024 thrpt 10 0.09 ops/ms 337.034 337.102 1 336.994 337.001 1
FloatScalar.ASIN 1024 thrpt 10 0.096 ops/ms 351.308 351.34 1 351.273 351.293 1
FloatScalar.ATAN 1024 thrpt 10 0.008 ops/ms 91.71 91.657 1.001 91.627 91.403 1.002
FloatScalar.ATAN2 1024 thrpt 10 0.004 ops/ms 58.171 58.206 0.999 58.21 58.184 1
FloatScalar.CBRT 1024 thrpt 10 0.112 ops/ms 67.946 67.961 1 67.97 67.973 1
FloatScalar.COS 1024 thrpt 10 0.144 ops/ms 109.93 109.944 1 109.961 110.002 1
FloatScalar.COSH 1024 thrpt 10 0.008 ops/ms 136.223 136.357 0.999 136.427 136.5 0.999
FloatScalar.EXP 1024 thrpt 10 0.141 ops/ms 176.773 176.585 1.001 176.884 176.818 1
FloatScalar.EXPM1 1024 thrpt 10 0.015 ops/ms 127.417 127.504 0.999 127.536 126.957 1.005
FloatScalar.HYPOT 1024 thrpt 10 0.006 ops/ms 162.621 162.834 0.999 162.766 162.404 1.002
FloatScalar.LOG 1024 thrpt 10 0.029 ops/ms 92.565 92.4 1.002 92.567 92.565 1
FloatScalar.LOG10 1024 thrpt 10 0.005 ops/ms 70.792 70.774 1 70.789 70.799 1
FloatScalar.LOG1P 1024 thrpt 10 0.051 ops/ms 73.908 74.572 0.991 73.898 74.61 0.99
FloatScalar.POW 1024 thrpt 10 0.003 ops/ms 30.554 30.566 1 30.561 30.556 1
FloatScalar.SIN 1024 thrpt 10 0.248 ops/ms 109.954 109.57 1.004 109.873 109.842 1
FloatScalar.SINH 1024 thrpt 10 0.005 ops/ms 139.617 139.616 1 139.432 139.242 1.001
FloatScalar.TAN 1024 thrpt 10 0.007 ops/ms 44.327 44.16 1.004 44.478 44.401 1.002
FloatScalar.TANH 1024 thrpt 10 0.362 ops/ms 545.506 545.688 1 545.744 545.604 1
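
The Improvement columns are not defined in the text, but the rows are consistent with the +intrinsic score divided by the -intrinsic score, rounded to three decimals. The sketch below recomputes two Float128Vector rows under that assumption; the class and method names are hypothetical and are not part of the patch.

```java
// Sketch: recompute the "Improvement" column from two throughput scores (ops/ms).
// Assumption (inferred from the table, not stated by the author):
//   Improvement = score with intrinsics / score without intrinsics, rounded to 3 decimals.
final class ImprovementRatio {

    // Returns the speedup ratio rounded to three decimal places.
    static double ratio(double withIntrinsic, double withoutIntrinsic) {
        if (withoutIntrinsic <= 0.0) {
            throw new IllegalArgumentException("baseline score must be positive");
        }
        return Math.round(withIntrinsic / withoutIntrinsic * 1000.0) / 1000.0;
    }

    public static void main(String[] args) {
        // Scores copied from the UseSVE=1 columns of the Float table.
        System.out.println("Float128Vector.ACOS: " + ratio(245.439, 101.483));
        System.out.println("Float128Vector.TANH: " + ratio(129.308, 144.581));
    }
}
```

Under this reading, a value above 1 means the SLEEF-backed stubs are faster, and a value below 1 (as in the Float128Vector.TANH row) means the run with -XX:+UseVectorStubs was slower than the run without.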

Double

data

Benchmark (size) Mode Cnt Error Units Score +intrinsic (UseSVE=1) Score -intrinsic (UseSVE=1) Improvement (UseSVE=1) Score +intrinsic (UseSVE=0) Score -intrinsic (UseSVE=0) Improvement (UseSVE=0)
Double128Vector.ACOS 1024 thrpt 10 0.005 ops/ms 117.913 67.641 1.743 117.977 67.793 1.74
Double128Vector.ASIN 1024 thrpt 10 0.006 ops/ms 145.789 68.392 2.132 145.518 68.181 2.134
Double128Vector.ATAN 1024 thrpt 10 0.004 ops/ms 87.644 42.752 2.05 87.544 43.136 2.029
Double128Vector.ATAN2 1024 thrpt 10 0.003 ops/ms 60.414 26.235 2.303 60.182 26.313 2.287
Double128Vector.CBRT 1024 thrpt 10 0.001 ops/ms 52.679 30.617 1.721 52.657 30.69 1.716
Double128Vector.COS 1024 thrpt 10 0.004 ops/ms 71.501 47.165 1.516 71.612 47.114 1.52
Double128Vector.COSH 1024 thrpt 10 0.007 ops/ms 82.195 53.846 1.526 82.372 54.144 1.521
Double128Vector.EXP 1024 thrpt 10 0.012 ops/ms 216.471 58.192 3.72 217.261 58.271 3.728
Double128Vector.EXPM1 1024 thrpt 10 0.007 ops/ms 95.372 48.037 1.985 95.799 47.954 1.998
Double128Vector.HYPOT 1024 thrpt 10 0.002 ops/ms 88.137 37.331 2.361 87.856 37.307 2.355
Double128Vector.LOG 1024 thrpt 10 0.038 ops/ms 98.972 41.669 2.375 99.046 41.723 2.374
Double128Vector.LOG10 1024 thrpt 10 0.004 ops/ms 83.921 36.163 2.321 83.844 36.099 2.323
Double128Vector.LOG1P 1024 thrpt 10 0.006 ops/ms 86.526 36.291 2.384 86.592 36.148 2.395
Double128Vector.POW 1024 thrpt 10 0.001 ops/ms 34.439 21.817 1.579 34.373 21.618 1.59
Double128Vector.SIN 1024 thrpt 10 0.007 ops/ms 82.248 47.064 1.748 82.63 47.524 1.739
Double128Vector.SINH 1024 thrpt 10 0.005 ops/ms 80.27 53.565 1.499 80.404 53.438 1.505
Double128Vector.TAN 1024 thrpt 10 0.001 ops/ms 56.221 27.615 2.036 56.516 27.792 2.034
Double128Vector.TANH 1024 thrpt 10 0.011 ops/ms 64.979 83.143 0.782 65.652 82.771 0.793
Double256Vector.ACOS 1024 thrpt 10 0.455 ops/ms 179.103 112.49 1.592 87.833 88.651 0.991
Double256Vector.ASIN 1024 thrpt 10 0.691 ops/ms 212.368 112.884 1.881 88.369 88.365 1
Double256Vector.ATAN 1024 thrpt 10 0.008 ops/ms 120.882 55.861 2.164 49.106 48.979 1.003
Double256Vector.ATAN2 1024 thrpt 10 0.006 ops/ms 98.254 33.362 2.945 30.514 30.556 0.999
Double256Vector.CBRT 1024 thrpt 10 0.016 ops/ms 89.053 43.473 2.048 38.255 37.885 1.01
Double256Vector.COS 1024 thrpt 10 0.03 ops/ms 119.208 65.874 1.81 57.119 57.033 1.002
Double256Vector.COSH 1024 thrpt 10 0.01 ops/ms 124.26 76.188 1.631 63.477 63.002 1.008
Double256Vector.EXP 1024 thrpt 10 0.048 ops/ms 390.922 88.453 4.42 72.249 72.248 1
Double256Vector.EXPM1 1024 thrpt 10 0.017 ops/ms 121.844 66.475 1.833 57.431 57.36 1.001
Double256Vector.HYPOT 1024 thrpt 10 0.034 ops/ms 138.774 60.148 2.307 51.837 51.881 0.999
Double256Vector.LOG 1024 thrpt 10 0.073 ops/ms 165.474 55.445 2.984 48.7 48.571 1.003
Double256Vector.LOG10 1024 thrpt 10 0.015 ops/ms 144.862 44.937 3.224 40.579 40.624 0.999
Double256Vector.LOG1P 1024 thrpt 10 0.21 ops/ms 151.807 46.401 3.272 40.943 41.158 0.995
Double256Vector.POW 1024 thrpt 10 0.003 ops/ms 53.228 25.144 2.117 21.862 21.852 1
Double256Vector.SIN 1024 thrpt 10 0.007 ops/ms 130.875 65.753 1.99 57.42 57.172 1.004
Double256Vector.SINH 1024 thrpt 10 0.004 ops/ms 120.093 76.13 1.577 63.283 62.823 1.007
Double256Vector.TAN 1024 thrpt 10 0.073 ops/ms 79.318 33.242 2.386 30.463 30.322 1.005
Double256Vector.TANH 1024 thrpt 10 1.633 ops/ms 152.914 154.668 0.989 107.585 7.441 14.458
Double512Vector.ACOS 1024 thrpt 10 0.1 ops/ms 122.582 121.073 1.012 123.136 22.485 5.476
Double512Vector.ASIN 1024 thrpt 10 0.099 ops/ms 123.678 122.482 1.01 121.616 22.78 5.339
Double512Vector.ATAN 1024 thrpt 10 0.14 ops/ms 61.939 61.928 1 61.821 62.013 0.997
Double512Vector.ATAN2 1024 thrpt 10 0.014 ops/ms 38.638 38.541 1.003 38.668 38.697 0.999
Double512Vector.CBRT 1024 thrpt 10 0.024 ops/ms 49.685 49.667 1 49.674 49.634 1.001
Double512Vector.COS 1024 thrpt 10 0.046 ops/ms 74.125 73.99 1.002 74.462 72.102 1.033
Double512Vector.COSH 1024 thrpt 10 0.15 ops/ms 86.945 87.2 0.997 87.111 87.187 0.999
Double512Vector.EXP 1024 thrpt 10 0.507 ops/ms 100.955 101.43 0.995 101.213 1.336 75.758
Double512Vector.EXPM1 1024 thrpt 10 0.017 ops/ms 75.648 75.012 1.008 75.632 75.293 1.005
Double512Vector.HYPOT 1024 thrpt 10 0.3 ops/ms 72.42 72.487 0.999 72.457 72.277 1.002
Double512Vector.LOG 1024 thrpt 10 0.021 ops/ms 64.729 64.613 1.002 64.584 64.43 1.002
Double512Vector.LOG10 1024 thrpt 10 0.022 ops/ms 52.042 51.953 1.002 51.958 51.879 1.002
Double512Vector.LOG1P 1024 thrpt 10 0.103 ops/ms 52.239 52.169 1.001 52.161 52.176 1
Double512Vector.POW 1024 thrpt 10 0.008 ops/ms 25.488 25.473 1.001 25.462 25.461 1
Double512Vector.SIN 1024 thrpt 10 0.121 ops/ms 74.514 74.724 0.997 74.655 74.56 1.001
Double512Vector.SINH 1024 thrpt 10 0.216 ops/ms 86.568 86.488 1.001 86.673 86.855 0.998
Double512Vector.TAN 1024 thrpt 10 0.05 ops/ms 36.129 36.199 0.998 36.355 36.113 1.007
Double512Vector.TANH 1024 thrpt 10 0.125 ops/ms 172.425 171.657 1.004 171.701 71.727 2.394
Double64Vector.ACOS 1024 thrpt 10 0.125 ops/ms 29.916 30.242 0.989 30.232 30.135 1.003
Double64Vector.ASIN 1024 thrpt 10 0.008 ops/ms 30.677 30.58 1.003 30.396 30.524 0.996
Double64Vector.ATAN 1024 thrpt 10 0.038 ops/ms 19.561 19.526 1.002 19.446 19.456 0.999
Double64Vector.ATAN2 1024 thrpt 10 0.008 ops/ms 15.376 15.669 0.981 15.412 15.369 1.003
Double64Vector.CBRT 1024 thrpt 10 0.004 ops/ms 13.943 13.943 1 13.873 13.89 0.999
Double64Vector.COS 1024 thrpt 10 0.012 ops/ms 20.677 20.698 0.999 20.632 20.652 0.999
Double64Vector.COSH 1024 thrpt 10 0.036 ops/ms 22.949 23.116 0.993 23.163 23.241 0.997
Double64Vector.EXP 1024 thrpt 10 0.104 ops/ms 23.424 23.521 0.996 23.605 23.622 0.999
Double64Vector.EXPM1 1024 thrpt 10 0.157 ops/ms 22.301 22.353 0.998 21.973 22.166 0.991
Double64Vector.HYPOT 1024 thrpt 10 0.084 ops/ms 21.01 20.835 1.008 20.911 20.819 1.004
Double64Vector.LOG 1024 thrpt 10 0.041 ops/ms 18.265 18.291 0.999 18.192 18.21 0.999
Double64Vector.LOG10 1024 thrpt 10 0.003 ops/ms 16.502 16.441 1.004 16.393 16.433 0.998
Double64Vector.LOG1P 1024 thrpt 10 0.009 ops/ms 16.815 16.862 0.997 16.792 16.833 0.998
Double64Vector.POW 1024 thrpt 10 0.012 ops/ms 11.814 11.82 0.999 11.865 11.877 0.999
Double64Vector.SIN 1024 thrpt 10 0.005 ops/ms 20.557 20.605 0.998 20.57 20.26 1.015
Double64Vector.SINH 1024 thrpt 10 0.074 ops/ms 23.133 23.23 0.996 23.048 23.069 0.999
Double64Vector.TAN 1024 thrpt 10 0.009 ops/ms 14.504 14.553 0.997 14.456 14.518 0.996
Double64Vector.TANH 1024 thrpt 10 0.12 ops/ms 31.304 31.226 1.002 31.4 31.267 1.004
DoubleMaxVector.ACOS 1024 thrpt 10 0.146 ops/ms 179.388 112.342 1.597 118.005 67.768 1.741
DoubleMaxVector.ASIN 1024 thrpt 10 0.169 ops/ms 212.342 114.107 1.861 145.676 68.143 2.138
DoubleMaxVector.ATAN 1024 thrpt 10 0.011 ops/ms 120.925 55.823 2.166 86.676 43.156 2.008
DoubleMaxVector.ATAN2 1024 thrpt 10 0.006 ops/ms 98.345 33.604 2.927 60.45 26.383 2.291
DoubleMaxVector.CBRT 1024 thrpt 10 0.006 ops/ms 88.947 43.447 2.047 52.648 30.665 1.717
DoubleMaxVector.COS 1024 thrpt 10 0.023 ops/ms 119.164 65.718 1.813 71.619 47.145 1.519
DoubleMaxVector.COSH 1024 thrpt 10 0.005 ops/ms 124.342 75.967 1.637 82.447 54.084 1.524
DoubleMaxVector.EXP 1024 thrpt 10 0.042 ops/ms 390.767 87.918 4.445 216.207 58.342 3.706
DoubleMaxVector.EXPM1 1024 thrpt 10 0.018 ops/ms 121.79 66.387 1.835 95.935 48.204 1.99
DoubleMaxVector.HYPOT 1024 thrpt 10 0.011 ops/ms 138.549 61.183 2.265 87.859 37.39 2.35
DoubleMaxVector.LOG 1024 thrpt 10 0.034 ops/ms 164.687 55.44 2.971 98.446 41.873 2.351
DoubleMaxVector.LOG10 1024 thrpt 10 0.026 ops/ms 144.388 44.94 3.213 84.062 36.252 2.319
DoubleMaxVector.LOG1P 1024 thrpt 10 0.218 ops/ms 151.047 46.394 3.256 86.671 36.248 2.391
DoubleMaxVector.POW 1024 thrpt 10 0.004 ops/ms 53.241 25.251 2.108 34.371 21.58 1.593
DoubleMaxVector.SIN 1024 thrpt 10 0.003 ops/ms 130.708 65.451 1.997 83.012 47.547 1.746
DoubleMaxVector.SINH 1024 thrpt 10 0.007 ops/ms 120.654 75.693 1.594 80.603 53.586 1.504
DoubleMaxVector.TAN 1024 thrpt 10 0.062 ops/ms 80.045 33.268 2.406 56.48 27.723 2.037
DoubleMaxVector.TANH 1024 thrpt 10 0.99 ops/ms 154.334 153.197 1.007 65.401 82.937 0.789
DoubleScalar.ACOS 1024 thrpt 10 0.06 ops/ms 342.452 342.471 1 342.471 42.461 8.066
DoubleScalar.ASIN 1024 thrpt 10 0.09 ops/ms 353.739 354.47 0.998 352.211 54.513 6.461
DoubleScalar.ATAN 1024 thrpt 10 0.043 ops/ms 100.797 101.069 0.997 101.089 1.086 93.084
DoubleScalar.ATAN2 1024 thrpt 10 0.025 ops/ms 62.29 62.283 1 62.218 62.227 1
DoubleScalar.CBRT 1024 thrpt 10 0.014 ops/ms 73.922 73.929 1 73.906 73.916 1
DoubleScalar.COS 1024 thrpt 10 0.204 ops/ms 117.948 117.806 1.001 117.856 17.763 6.635
DoubleScalar.COSH 1024 thrpt 10 0.016 ops/ms 141.113 141.083 1 141.749 40.659 3.486
DoubleScalar.EXP 1024 thrpt 10 0.008 ops/ms 189.453 188.923 1.003 189.555 89.348 2.122
DoubleScalar.EXPM1 1024 thrpt 10 0.051 ops/ms 133.617 133.549 1.001 133.224 33.61 3.964
DoubleScalar.HYPOT 1024 thrpt 10 3.613 ops/ms 180.215 175.912 1.024 176.083 81.916 2.15
DoubleScalar.LOG 1024 thrpt 10 0.013 ops/ms 101.791 101.801 1 101.779 1.786 56.987
DoubleScalar.LOG10 1024 thrpt 10 0.099 ops/ms 76.849 76.847 1 76.807 76.757 1.001
DoubleScalar.LOG1P 1024 thrpt 10 0.081 ops/ms 79.261 79.298 1 79.268 79.281 1
DoubleScalar.POW 1024 thrpt 10 0.002 ops/ms 31.915 31.925 1 31.919 31.92 1
DoubleScalar.SIN 1024 thrpt 10 0.167 ops/ms 118.087 117.722 1.003 118.292 18.243 6.484
DoubleScalar.SINH 1024 thrpt 10 0.012 ops/ms 143.901 143.803 1.001 144.228 43.922 3.284
DoubleScalar.TAN 1024 thrpt 10 0.047 ops/ms 46.513 46.584 0.998 46.503 46.778 0.994
DoubleScalar.TANH 1024 thrpt 10 0.204 ops/ms 552.603 561.965 0.983 561.941 61.802 9.093

Backup of previous test summary

NOTE:

  • Src means the implementation in this PR, i.e. without a dependency on external SLEEF.
  • Disabled means intrinsics are disabled with -XX:-UseVectorStubs.
  • system_sleef means the implementation in the previous PR 18294, i.e. the JDK is built and run with a dependency on external SLEEF.

Basically, the perf data below shows that

  • this implementation performs better than the previous version in PR 18294,
  • and both SLEEF versions perform much better than the non-SLEEF version.

Progress

  • [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Issue

  • JDK-8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF (Enhancement - P4)

Contributors

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18605/head:pull/18605
$ git checkout pull/18605

Update a local copy of the PR:
$ git checkout pull/18605
$ git pull https://git.openjdk.org/jdk.git pull/18605/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18605

View PR using the GUI difftool:
$ git pr show -t 18605

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18605.diff

Webrev

Link to Webrev Comment

Hamlin-Li, Apr 03 '24 14:04