jdk
8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF
Hi, can you help review this patch? This PR is based on previous work and discussion in PR 16234 and PR 18294.
Compared with the previous PRs, the major change in this PR is that it integrates the SLEEF source directly (for the steps, please check src/jdk.incubator.vector/linux/native/libvectormath/README), rather than depending on an external SLEEF installation (headers or library) at build or run time.
Besides this change, the previous changes are adjusted accordingly, e.g. some unnecessary files and changes are removed, especially in the make directory of the JDK.
Besides the code changes, one important task is to handle the legal process.
Thanks!
Performance
Options
- +intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs'
- -intrinsic: 'FORK=1;ITER=10;WARMUP_ITER=10;JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs'
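The two option sets above differ only in the sign of the UseVectorStubs flag. The following is a minimal sketch, assuming the semicolon-separated KEY=VALUE layout shown above; the MicroOptions class and its methods are hypothetical helpers, not part of the JDK test framework. It parses both strings and reports which JVM flags differ:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MicroOptions {
    // The two option strings from the list above, verbatim.
    static final String PLUS = "FORK=1;ITER=10;WARMUP_ITER=10;"
            + "JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:+UseVectorStubs";
    static final String MINUS = "FORK=1;ITER=10;WARMUP_ITER=10;"
            + "JAVA_OPTIONS=-XX:+UnlockExperimentalVMOptions -XX:+EnableVectorSupport -XX:-UseVectorStubs";

    // Hypothetical helper: splits "KEY=VALUE;KEY=VALUE" into an insertion-ordered map.
    // Only the first '=' of each entry separates the key from the value.
    static Map<String, String> parse(String options) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : options.split(";")) {
            int eq = entry.indexOf('=');
            if (eq < 0) {
                throw new IllegalArgumentException("missing '=' in: " + entry);
            }
            result.put(entry.substring(0, eq), entry.substring(eq + 1));
        }
        return result;
    }

    // Returns the whitespace-separated JAVA_OPTIONS flags of `a` that do not appear in `b`.
    static List<String> flagsOnlyIn(String a, String b) {
        List<String> flagsA = Arrays.asList(parse(a).getOrDefault("JAVA_OPTIONS", "").trim().split("\\s+"));
        List<String> flagsB = Arrays.asList(parse(b).getOrDefault("JAVA_OPTIONS", "").trim().split("\\s+"));
        List<String> diff = new ArrayList<>();
        for (String flag : flagsA) {
            if (!flagsB.contains(flag)) {
                diff.add(flag);
            }
        }
        return diff;
    }

    public static void main(String[] args) {
        System.out.println(parse(PLUS).keySet());     // [FORK, ITER, WARMUP_ITER, JAVA_OPTIONS]
        System.out.println(flagsOnlyIn(PLUS, MINUS)); // [-XX:+UseVectorStubs]
        System.out.println(flagsOnlyIn(MINUS, PLUS)); // [-XX:-UseVectorStubs]
    }
}
```

Everything else (fork count, iteration counts, the other experimental flags) is shared, so each Improvement value isolates the effect of the vector stubs.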
Float
Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic (UseSVE=1) | Improvement (UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic (UseSVE=0) | Improvement (UseSVE=0) |
---|---|---|---|---|---|---|---|---|---|---|---|
Float128Vector.ACOS | 1024 | thrpt | 10 | 0.015 | ops/ms | 245.439 | 101.483 | 2.419 | 245.733 | 102.033 | 2.408 |
Float128Vector.ASIN | 1024 | thrpt | 10 | 0.013 | ops/ms | 296.702 | 103.559 | 2.865 | 296.741 | 103.18 | 2.876 |
Float128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 196.862 | 49.627 | 3.967 | 195.891 | 49.771 | 3.936 |
Float128Vector.ATAN2 | 1024 | thrpt | 10 | 0.021 | ops/ms | 135.088 | 32.449 | 4.163 | 135.721 | 32.579 | 4.166 |
Float128Vector.CBRT | 1024 | thrpt | 10 | 0.004 | ops/ms | 114.547 | 39.517 | 2.899 | 114.756 | 39.273 | 2.922 |
Float128Vector.COS | 1024 | thrpt | 10 | 0.006 | ops/ms | 93.226 | 62.883 | 1.483 | 93.195 | 63.116 | 1.477 |
Float128Vector.COSH | 1024 | thrpt | 10 | 0.005 | ops/ms | 154.498 | 76.58 | 2.017 | 154.147 | 77.026 | 2.001 |
Float128Vector.EXP | 1024 | thrpt | 10 | 0.248 | ops/ms | 483.569 | 83.614 | 5.783 | 502.786 | 83.424 | 6.027 |
Float128Vector.EXPM1 | 1024 | thrpt | 10 | 0.01 | ops/ms | 156.338 | 62.091 | 2.518 | 157.589 | 62.008 | 2.541 |
Float128Vector.HYPOT | 1024 | thrpt | 10 | 0.007 | ops/ms | 191.217 | 56.834 | 3.364 | 191.247 | 58.624 | 3.262 |
Float128Vector.LOG | 1024 | thrpt | 10 | 0.019 | ops/ms | 258.223 | 52.005 | 4.965 | 259.642 | 52.018 | 4.991 |
Float128Vector.LOG10 | 1024 | thrpt | 10 | 0.004 | ops/ms | 238.916 | 43.311 | 5.516 | 240.135 | 43.352 | 5.539 |
Float128Vector.LOG1P | 1024 | thrpt | 10 | 0.112 | ops/ms | 246.507 | 42.227 | 5.838 | 246.546 | 42.24 | 5.837 |
Float128Vector.POW | 1024 | thrpt | 10 | 0.033 | ops/ms | 73.78 | 25.17 | 2.931 | 73.693 | 25.113 | 2.934 |
Float128Vector.SIN | 1024 | thrpt | 10 | 0.004 | ops/ms | 95.509 | 62.807 | 1.521 | 95.792 | 62.883 | 1.523 |
Float128Vector.SINH | 1024 | thrpt | 10 | 0.011 | ops/ms | 153.177 | 77.586 | 1.974 | 152.97 | 77.248 | 1.98 |
Float128Vector.TAN | 1024 | thrpt | 10 | 0.002 | ops/ms | 74.394 | 32.662 | 2.278 | 74.491 | 32.639 | 2.282 |
Float128Vector.TANH | 1024 | thrpt | 10 | 0.005 | ops/ms | 129.308 | 144.581 | 0.894 | 129.319 | 144.916 | 0.892 |
Float256Vector.ACOS | 1024 | thrpt | 10 | 0.311 | ops/ms | 378.109 | 135.118 | 2.798 | 122.381 | 123.502 | 0.991 |
Float256Vector.ASIN | 1024 | thrpt | 10 | 1.039 | ops/ms | 452.692 | 135.067 | 3.352 | 126.037 | 123.53 | 1.02 |
Float256Vector.ATAN | 1024 | thrpt | 10 | 0.017 | ops/ms | 288.785 | 62.032 | 4.655 | 59.783 | 59.821 | 0.999 |
Float256Vector.ATAN2 | 1024 | thrpt | 10 | 0.065 | ops/ms | 217.573 | 40.843 | 5.327 | 38.337 | 38.352 | 1 |
Float256Vector.CBRT | 1024 | thrpt | 10 | 0.042 | ops/ms | 185.721 | 49.353 | 3.763 | 46.273 | 46.279 | 1 |
Float256Vector.COS | 1024 | thrpt | 10 | 0.036 | ops/ms | 163.584 | 78.947 | 2.072 | 70.544 | 70.74 | 0.997 |
Float256Vector.COSH | 1024 | thrpt | 10 | 0.01 | ops/ms | 211.746 | 96.885 | 2.186 | 84.078 | 84.366 | 0.997 |
Float256Vector.EXP | 1024 | thrpt | 10 | 0.121 | ops/ms | 954.69 | 117.145 | 8.15 | 97.97 | 97.713 | 1.003 |
Float256Vector.EXPM1 | 1024 | thrpt | 10 | 0.055 | ops/ms | 213.462 | 79.832 | 2.674 | 74.292 | 74.36 | 0.999 |
Float256Vector.HYPOT | 1024 | thrpt | 10 | 0.052 | ops/ms | 306.511 | 74.208 | 4.13 | 68.856 | 69.077 | 0.997 |
Float256Vector.LOG | 1024 | thrpt | 10 | 0.216 | ops/ms | 406.914 | 65.408 | 6.221 | 59.808 | 59.767 | 1.001 |
Float256Vector.LOG10 | 1024 | thrpt | 10 | 0.37 | ops/ms | 371.385 | 53.156 | 6.987 | 49.334 | 49.171 | 1.003 |
Float256Vector.LOG1P | 1024 | thrpt | 10 | 1.851 | ops/ms | 397.247 | 52.042 | 7.633 | 50.181 | 50.199 | 1 |
Float256Vector.POW | 1024 | thrpt | 10 | 0.048 | ops/ms | 115.155 | 27.174 | 4.238 | 24.659 | 24.703 | 0.998 |
Float256Vector.SIN | 1024 | thrpt | 10 | 0.107 | ops/ms | 154.975 | 79.103 | 1.959 | 70.9 | 70.615 | 1.004 |
Float256Vector.SINH | 1024 | thrpt | 10 | 0.351 | ops/ms | 202.683 | 97.643 | 2.076 | 84.587 | 84.371 | 1.003 |
Float256Vector.TAN | 1024 | thrpt | 10 | 0.005 | ops/ms | 127.597 | 37.136 | 3.436 | 34.774 | 34.757 | 1 |
Float256Vector.TANH | 1024 | thrpt | 10 | 1.233 | ops/ms | 249.084 | 247.272 | 1.007 | 169.903 | 169.805 | 1.001 |
Float512Vector.ACOS | 1024 | thrpt | 10 | 0.069 | ops/ms | 148.467 | 152.264 | 0.975 | 150.131 | 154.717 | 0.97 |
Float512Vector.ASIN | 1024 | thrpt | 10 | 0.287 | ops/ms | 147.144 | 158.074 | 0.931 | 147.251 | 148.71 | 0.99 |
Float512Vector.ATAN | 1024 | thrpt | 10 | 0.101 | ops/ms | 68.498 | 67.987 | 1.008 | 67.968 | 68.131 | 0.998 |
Float512Vector.ATAN2 | 1024 | thrpt | 10 | 0.016 | ops/ms | 44.189 | 44.052 | 1.003 | 43.898 | 43.781 | 1.003 |
Float512Vector.CBRT | 1024 | thrpt | 10 | 0.012 | ops/ms | 53.514 | 53.672 | 0.997 | 53.623 | 53.635 | 1 |
Float512Vector.COS | 1024 | thrpt | 10 | 0.222 | ops/ms | 80.566 | 80.713 | 0.998 | 80.672 | 80.796 | 0.998 |
Float512Vector.COSH | 1024 | thrpt | 10 | 0.104 | ops/ms | 102.175 | 102.038 | 1.001 | 102.303 | 102.009 | 1.003 |
Float512Vector.EXP | 1024 | thrpt | 10 | 0.255 | ops/ms | 118.824 | 118.942 | 0.999 | 118.551 | 118.976 | 0.996 |
Float512Vector.EXPM1 | 1024 | thrpt | 10 | 0.021 | ops/ms | 87.363 | 87.153 | 1.002 | 87.842 | 87.387 | 1.005 |
Float512Vector.HYPOT | 1024 | thrpt | 10 | 0.048 | ops/ms | 86.838 | 86.439 | 1.005 | 86.903 | 86.709 | 1.002 |
Float512Vector.LOG | 1024 | thrpt | 10 | 0.017 | ops/ms | 70.794 | 70.746 | 1.001 | 70.469 | 70.62 | 0.998 |
Float512Vector.LOG10 | 1024 | thrpt | 10 | 0.051 | ops/ms | 55.821 | 55.85 | 0.999 | 55.883 | 55.773 | 1.002 |
Float512Vector.LOG1P | 1024 | thrpt | 10 | 0.085 | ops/ms | 57.113 | 57.582 | 0.992 | 56.942 | 57.245 | 0.995 |
Float512Vector.POW | 1024 | thrpt | 10 | 0.006 | ops/ms | 26.66 | 26.656 | 1 | 26.651 | 26.641 | 1 |
Float512Vector.SIN | 1024 | thrpt | 10 | 0.067 | ops/ms | 80.873 | 80.806 | 1.001 | 80.638 | 80.456 | 1.002 |
Float512Vector.SINH | 1024 | thrpt | 10 | 0.16 | ops/ms | 103.818 | 102.766 | 1.01 | 102.669 | 103.83 | 0.989 |
Float512Vector.TAN | 1024 | thrpt | 10 | 0.148 | ops/ms | 38.107 | 37.971 | 1.004 | 37.938 | 37.862 | 1.002 |
Float512Vector.TANH | 1024 | thrpt | 10 | 1.206 | ops/ms | 237.573 | 235.876 | 1.007 | 236.684 | 236.724 | 1 |
Float64Vector.ACOS | 1024 | thrpt | 10 | 0.006 | ops/ms | 123.038 | 64.939 | 1.895 | 123.07 | 65.556 | 1.877 |
Float64Vector.ASIN | 1024 | thrpt | 10 | 0.006 | ops/ms | 148.56 | 65.115 | 2.282 | 148.576 | 66.468 | 2.235 |
Float64Vector.ATAN | 1024 | thrpt | 10 | 0.003 | ops/ms | 98.512 | 40.569 | 2.428 | 98.458 | 40.932 | 2.405 |
Float64Vector.ATAN2 | 1024 | thrpt | 10 | 0.004 | ops/ms | 67.706 | 24.824 | 2.727 | 68.214 | 25.157 | 2.712 |
Float64Vector.CBRT | 1024 | thrpt | 10 | 0.001 | ops/ms | 57.299 | 29.725 | 1.928 | 57.343 | 29.279 | 1.959 |
Float64Vector.COS | 1024 | thrpt | 10 | 0.008 | ops/ms | 46.689 | 44.153 | 1.057 | 46.67 | 43.683 | 1.068 |
Float64Vector.COSH | 1024 | thrpt | 10 | 0.005 | ops/ms | 77.552 | 51.012 | 1.52 | 77.66 | 51.285 | 1.514 |
Float64Vector.EXP | 1024 | thrpt | 10 | 0.257 | ops/ms | 242.736 | 54.277 | 4.472 | 248.345 | 54.298 | 4.574 |
Float64Vector.EXPM1 | 1024 | thrpt | 10 | 0.003 | ops/ms | 78.741 | 45.22 | 1.741 | 79.082 | 45.396 | 1.742 |
Float64Vector.HYPOT | 1024 | thrpt | 10 | 0.002 | ops/ms | 95.716 | 36.135 | 2.649 | 95.702 | 36.424 | 2.627 |
Float64Vector.LOG | 1024 | thrpt | 10 | 0.006 | ops/ms | 130.395 | 38.954 | 3.347 | 130.321 | 38.99 | 3.342 |
Float64Vector.LOG10 | 1024 | thrpt | 10 | 0.003 | ops/ms | 119.783 | 33.912 | 3.532 | 120.254 | 33.951 | 3.542 |
Float64Vector.LOG1P | 1024 | thrpt | 10 | 0.006 | ops/ms | 123.966 | 34.381 | 3.606 | 123.984 | 34.291 | 3.616 |
Float64Vector.POW | 1024 | thrpt | 10 | 0.003 | ops/ms | 36.872 | 21.747 | 1.695 | 36.774 | 21.639 | 1.699 |
Float64Vector.SIN | 1024 | thrpt | 10 | 0.002 | ops/ms | 48.008 | 44.076 | 1.089 | 48.001 | 43.989 | 1.091 |
Float64Vector.SINH | 1024 | thrpt | 10 | 0.004 | ops/ms | 76.711 | 50.893 | 1.507 | 76.936 | 51.236 | 1.502 |
Float64Vector.TAN | 1024 | thrpt | 10 | 0.006 | ops/ms | 37.286 | 26.095 | 1.429 | 37.283 | 26.06 | 1.431 |
Float64Vector.TANH | 1024 | thrpt | 10 | 0.004 | ops/ms | 64.71 | 79.799 | 0.811 | 64.741 | 79.924 | 0.81 |
FloatMaxVector.ACOS | 1024 | thrpt | 10 | 0.103 | ops/ms | 378.138 | 136.187 | 2.777 | 245.725 | 102.05 | 2.408 |
FloatMaxVector.ASIN | 1024 | thrpt | 10 | 1.013 | ops/ms | 452.441 | 135.287 | 3.344 | 296.708 | 103.589 | 2.864 |
FloatMaxVector.ATAN | 1024 | thrpt | 10 | 0.028 | ops/ms | 288.802 | 62.021 | 4.657 | 196.817 | 49.824 | 3.95 |
FloatMaxVector.ATAN2 | 1024 | thrpt | 10 | 0.037 | ops/ms | 216.386 | 40.889 | 5.292 | 135.756 | 32.75 | 4.145 |
FloatMaxVector.CBRT | 1024 | thrpt | 10 | 0.269 | ops/ms | 187.141 | 49.382 | 3.79 | 114.819 | 39.203 | 2.929 |
FloatMaxVector.COS | 1024 | thrpt | 10 | 0.014 | ops/ms | 163.726 | 78.882 | 2.076 | 93.184 | 63.087 | 1.477 |
FloatMaxVector.COSH | 1024 | thrpt | 10 | 0.006 | ops/ms | 212.544 | 97.49 | 2.18 | 154.547 | 77.685 | 1.989 |
FloatMaxVector.EXP | 1024 | thrpt | 10 | 0.048 | ops/ms | 955.792 | 117.15 | 8.159 | 488.526 | 83.227 | 5.87 |
FloatMaxVector.EXPM1 | 1024 | thrpt | 10 | 0.01 | ops/ms | 213.435 | 79.837 | 2.673 | 157.618 | 62.006 | 2.542 |
FloatMaxVector.HYPOT | 1024 | thrpt | 10 | 0.041 | ops/ms | 308.446 | 74.165 | 4.159 | 191.259 | 58.628 | 3.262 |
FloatMaxVector.LOG | 1024 | thrpt | 10 | 0.105 | ops/ms | 405.824 | 65.604 | 6.186 | 257.679 | 51.992 | 4.956 |
FloatMaxVector.LOG10 | 1024 | thrpt | 10 | 0.186 | ops/ms | 371.417 | 53.204 | 6.981 | 240.117 | 43.427 | 5.529 |
FloatMaxVector.LOG1P | 1024 | thrpt | 10 | 0.713 | ops/ms | 395.943 | 52.002 | 7.614 | 246.515 | 42.196 | 5.842 |
FloatMaxVector.POW | 1024 | thrpt | 10 | 0.079 | ops/ms | 115.35 | 27.143 | 4.25 | 73.411 | 25.226 | 2.91 |
FloatMaxVector.SIN | 1024 | thrpt | 10 | 0.04 | ops/ms | 154.421 | 79.424 | 1.944 | 95.548 | 62.973 | 1.517 |
FloatMaxVector.SINH | 1024 | thrpt | 10 | 0.04 | ops/ms | 202.51 | 97.974 | 2.067 | 153.3 | 77.106 | 1.988 |
FloatMaxVector.TAN | 1024 | thrpt | 10 | 0.013 | ops/ms | 127.56 | 36.981 | 3.449 | 74.483 | 32.733 | 2.275 |
FloatMaxVector.TANH | 1024 | thrpt | 10 | 0.792 | ops/ms | 247.428 | 247.743 | 0.999 | 129.375 | 144.932 | 0.893 |
FloatScalar.ACOS | 1024 | thrpt | 10 | 0.09 | ops/ms | 337.034 | 337.102 | 1 | 336.994 | 337.001 | 1 |
FloatScalar.ASIN | 1024 | thrpt | 10 | 0.096 | ops/ms | 351.308 | 351.34 | 1 | 351.273 | 351.293 | 1 |
FloatScalar.ATAN | 1024 | thrpt | 10 | 0.008 | ops/ms | 91.71 | 91.657 | 1.001 | 91.627 | 91.403 | 1.002 |
FloatScalar.ATAN2 | 1024 | thrpt | 10 | 0.004 | ops/ms | 58.171 | 58.206 | 0.999 | 58.21 | 58.184 | 1 |
FloatScalar.CBRT | 1024 | thrpt | 10 | 0.112 | ops/ms | 67.946 | 67.961 | 1 | 67.97 | 67.973 | 1 |
FloatScalar.COS | 1024 | thrpt | 10 | 0.144 | ops/ms | 109.93 | 109.944 | 1 | 109.961 | 110.002 | 1 |
FloatScalar.COSH | 1024 | thrpt | 10 | 0.008 | ops/ms | 136.223 | 136.357 | 0.999 | 136.427 | 136.5 | 0.999 |
FloatScalar.EXP | 1024 | thrpt | 10 | 0.141 | ops/ms | 176.773 | 176.585 | 1.001 | 176.884 | 176.818 | 1 |
FloatScalar.EXPM1 | 1024 | thrpt | 10 | 0.015 | ops/ms | 127.417 | 127.504 | 0.999 | 127.536 | 126.957 | 1.005 |
FloatScalar.HYPOT | 1024 | thrpt | 10 | 0.006 | ops/ms | 162.621 | 162.834 | 0.999 | 162.766 | 162.404 | 1.002 |
FloatScalar.LOG | 1024 | thrpt | 10 | 0.029 | ops/ms | 92.565 | 92.4 | 1.002 | 92.567 | 92.565 | 1 |
FloatScalar.LOG10 | 1024 | thrpt | 10 | 0.005 | ops/ms | 70.792 | 70.774 | 1 | 70.789 | 70.799 | 1 |
FloatScalar.LOG1P | 1024 | thrpt | 10 | 0.051 | ops/ms | 73.908 | 74.572 | 0.991 | 73.898 | 74.61 | 0.99 |
FloatScalar.POW | 1024 | thrpt | 10 | 0.003 | ops/ms | 30.554 | 30.566 | 1 | 30.561 | 30.556 | 1 |
FloatScalar.SIN | 1024 | thrpt | 10 | 0.248 | ops/ms | 109.954 | 109.57 | 1.004 | 109.873 | 109.842 | 1 |
FloatScalar.SINH | 1024 | thrpt | 10 | 0.005 | ops/ms | 139.617 | 139.616 | 1 | 139.432 | 139.242 | 1.001 |
FloatScalar.TAN | 1024 | thrpt | 10 | 0.007 | ops/ms | 44.327 | 44.16 | 1.004 | 44.478 | 44.401 | 1.002 |
FloatScalar.TANH | 1024 | thrpt | 10 | 0.362 | ops/ms | 545.506 | 545.688 | 1 | 545.744 | 545.604 | 1 |
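Each Improvement value appears to be the +intrinsic score divided by the matching -intrinsic score, rounded to three decimals (for example 245.439 / 101.483 gives 2.419 for Float128Vector.ACOS with UseSVE=1); values below 1 mean the intrinsic version is slower. A small sketch of that calculation, where the Improvement class and method are hypothetical names used only for illustration:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class Improvement {
    // Ratio of the +intrinsic score to the -intrinsic score. Both are
    // throughput in ops/ms (higher is better); the result is rounded
    // half-up to three decimal places, matching the tables' precision.
    static BigDecimal improvement(String withIntrinsic, String withoutIntrinsic) {
        BigDecimal plus = new BigDecimal(withIntrinsic);
        BigDecimal minus = new BigDecimal(withoutIntrinsic);
        return plus.divide(minus, 3, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        // Float128Vector.ACOS, UseSVE=1 columns of the Float table.
        System.out.println(improvement("245.439", "101.483")); // 2.419
        // Float128Vector.TANH, UseSVE=1: below 1, i.e. a slowdown.
        System.out.println(improvement("129.308", "144.581")); // 0.894
    }
}
```

BigDecimal is used so the rounding step is exact on the decimal scores as printed, rather than on binary floating-point approximations of them.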
Double
Benchmark | (size) | Mode | Cnt | Error | Units | Score +intrinsic (UseSVE=1) | Score -intrinsic (UseSVE=1) | Improvement (UseSVE=1) | Score +intrinsic (UseSVE=0) | Score -intrinsic (UseSVE=0) | Improvement (UseSVE=0) |
---|---|---|---|---|---|---|---|---|---|---|---|
Double128Vector.ACOS | 1024 | thrpt | 10 | 0.005 | ops/ms | 117.913 | 67.641 | 1.743 | 117.977 | 67.793 | 1.74 |
Double128Vector.ASIN | 1024 | thrpt | 10 | 0.006 | ops/ms | 145.789 | 68.392 | 2.132 | 145.518 | 68.181 | 2.134 |
Double128Vector.ATAN | 1024 | thrpt | 10 | 0.004 | ops/ms | 87.644 | 42.752 | 2.05 | 87.544 | 43.136 | 2.029 |
Double128Vector.ATAN2 | 1024 | thrpt | 10 | 0.003 | ops/ms | 60.414 | 26.235 | 2.303 | 60.182 | 26.313 | 2.287 |
Double128Vector.CBRT | 1024 | thrpt | 10 | 0.001 | ops/ms | 52.679 | 30.617 | 1.721 | 52.657 | 30.69 | 1.716 |
Double128Vector.COS | 1024 | thrpt | 10 | 0.004 | ops/ms | 71.501 | 47.165 | 1.516 | 71.612 | 47.114 | 1.52 |
Double128Vector.COSH | 1024 | thrpt | 10 | 0.007 | ops/ms | 82.195 | 53.846 | 1.526 | 82.372 | 54.144 | 1.521 |
Double128Vector.EXP | 1024 | thrpt | 10 | 0.012 | ops/ms | 216.471 | 58.192 | 3.72 | 217.261 | 58.271 | 3.728 |
Double128Vector.EXPM1 | 1024 | thrpt | 10 | 0.007 | ops/ms | 95.372 | 48.037 | 1.985 | 95.799 | 47.954 | 1.998 |
Double128Vector.HYPOT | 1024 | thrpt | 10 | 0.002 | ops/ms | 88.137 | 37.331 | 2.361 | 87.856 | 37.307 | 2.355 |
Double128Vector.LOG | 1024 | thrpt | 10 | 0.038 | ops/ms | 98.972 | 41.669 | 2.375 | 99.046 | 41.723 | 2.374 |
Double128Vector.LOG10 | 1024 | thrpt | 10 | 0.004 | ops/ms | 83.921 | 36.163 | 2.321 | 83.844 | 36.099 | 2.323 |
Double128Vector.LOG1P | 1024 | thrpt | 10 | 0.006 | ops/ms | 86.526 | 36.291 | 2.384 | 86.592 | 36.148 | 2.395 |
Double128Vector.POW | 1024 | thrpt | 10 | 0.001 | ops/ms | 34.439 | 21.817 | 1.579 | 34.373 | 21.618 | 1.59 |
Double128Vector.SIN | 1024 | thrpt | 10 | 0.007 | ops/ms | 82.248 | 47.064 | 1.748 | 82.63 | 47.524 | 1.739 |
Double128Vector.SINH | 1024 | thrpt | 10 | 0.005 | ops/ms | 80.27 | 53.565 | 1.499 | 80.404 | 53.438 | 1.505 |
Double128Vector.TAN | 1024 | thrpt | 10 | 0.001 | ops/ms | 56.221 | 27.615 | 2.036 | 56.516 | 27.792 | 2.034 |
Double128Vector.TANH | 1024 | thrpt | 10 | 0.011 | ops/ms | 64.979 | 83.143 | 0.782 | 65.652 | 82.771 | 0.793 |
Double256Vector.ACOS | 1024 | thrpt | 10 | 0.455 | ops/ms | 179.103 | 112.49 | 1.592 | 87.833 | 88.651 | 0.991 |
Double256Vector.ASIN | 1024 | thrpt | 10 | 0.691 | ops/ms | 212.368 | 112.884 | 1.881 | 88.369 | 88.365 | 1 |
Double256Vector.ATAN | 1024 | thrpt | 10 | 0.008 | ops/ms | 120.882 | 55.861 | 2.164 | 49.106 | 48.979 | 1.003 |
Double256Vector.ATAN2 | 1024 | thrpt | 10 | 0.006 | ops/ms | 98.254 | 33.362 | 2.945 | 30.514 | 30.556 | 0.999 |
Double256Vector.CBRT | 1024 | thrpt | 10 | 0.016 | ops/ms | 89.053 | 43.473 | 2.048 | 38.255 | 37.885 | 1.01 |
Double256Vector.COS | 1024 | thrpt | 10 | 0.03 | ops/ms | 119.208 | 65.874 | 1.81 | 57.119 | 57.033 | 1.002 |
Double256Vector.COSH | 1024 | thrpt | 10 | 0.01 | ops/ms | 124.26 | 76.188 | 1.631 | 63.477 | 63.002 | 1.008 |
Double256Vector.EXP | 1024 | thrpt | 10 | 0.048 | ops/ms | 390.922 | 88.453 | 4.42 | 72.249 | 72.248 | 1 |
Double256Vector.EXPM1 | 1024 | thrpt | 10 | 0.017 | ops/ms | 121.844 | 66.475 | 1.833 | 57.431 | 57.36 | 1.001 |
Double256Vector.HYPOT | 1024 | thrpt | 10 | 0.034 | ops/ms | 138.774 | 60.148 | 2.307 | 51.837 | 51.881 | 0.999 |
Double256Vector.LOG | 1024 | thrpt | 10 | 0.073 | ops/ms | 165.474 | 55.445 | 2.984 | 48.7 | 48.571 | 1.003 |
Double256Vector.LOG10 | 1024 | thrpt | 10 | 0.015 | ops/ms | 144.862 | 44.937 | 3.224 | 40.579 | 40.624 | 0.999 |
Double256Vector.LOG1P | 1024 | thrpt | 10 | 0.21 | ops/ms | 151.807 | 46.401 | 3.272 | 40.943 | 41.158 | 0.995 |
Double256Vector.POW | 1024 | thrpt | 10 | 0.003 | ops/ms | 53.228 | 25.144 | 2.117 | 21.862 | 21.852 | 1 |
Double256Vector.SIN | 1024 | thrpt | 10 | 0.007 | ops/ms | 130.875 | 65.753 | 1.99 | 57.42 | 57.172 | 1.004 |
Double256Vector.SINH | 1024 | thrpt | 10 | 0.004 | ops/ms | 120.093 | 76.13 | 1.577 | 63.283 | 62.823 | 1.007 |
Double256Vector.TAN | 1024 | thrpt | 10 | 0.073 | ops/ms | 79.318 | 33.242 | 2.386 | 30.463 | 30.322 | 1.005 |
Double256Vector.TANH | 1024 | thrpt | 10 | 1.633 | ops/ms | 152.914 | 154.668 | 0.989 | 107.585 | 7.441 | 14.458 |
Double512Vector.ACOS | 1024 | thrpt | 10 | 0.1 | ops/ms | 122.582 | 121.073 | 1.012 | 123.136 | 22.485 | 5.476 |
Double512Vector.ASIN | 1024 | thrpt | 10 | 0.099 | ops/ms | 123.678 | 122.482 | 1.01 | 121.616 | 22.78 | 5.339 |
Double512Vector.ATAN | 1024 | thrpt | 10 | 0.14 | ops/ms | 61.939 | 61.928 | 1 | 61.821 | 62.013 | 0.997 |
Double512Vector.ATAN2 | 1024 | thrpt | 10 | 0.014 | ops/ms | 38.638 | 38.541 | 1.003 | 38.668 | 38.697 | 0.999 |
Double512Vector.CBRT | 1024 | thrpt | 10 | 0.024 | ops/ms | 49.685 | 49.667 | 1 | 49.674 | 49.634 | 1.001 |
Double512Vector.COS | 1024 | thrpt | 10 | 0.046 | ops/ms | 74.125 | 73.99 | 1.002 | 74.462 | 72.102 | 1.033 |
Double512Vector.COSH | 1024 | thrpt | 10 | 0.15 | ops/ms | 86.945 | 87.2 | 0.997 | 87.111 | 87.187 | 0.999 |
Double512Vector.EXP | 1024 | thrpt | 10 | 0.507 | ops/ms | 100.955 | 101.43 | 0.995 | 101.213 | 1.336 | 75.758 |
Double512Vector.EXPM1 | 1024 | thrpt | 10 | 0.017 | ops/ms | 75.648 | 75.012 | 1.008 | 75.632 | 75.293 | 1.005 |
Double512Vector.HYPOT | 1024 | thrpt | 10 | 0.3 | ops/ms | 72.42 | 72.487 | 0.999 | 72.457 | 72.277 | 1.002 |
Double512Vector.LOG | 1024 | thrpt | 10 | 0.021 | ops/ms | 64.729 | 64.613 | 1.002 | 64.584 | 64.43 | 1.002 |
Double512Vector.LOG10 | 1024 | thrpt | 10 | 0.022 | ops/ms | 52.042 | 51.953 | 1.002 | 51.958 | 51.879 | 1.002 |
Double512Vector.LOG1P | 1024 | thrpt | 10 | 0.103 | ops/ms | 52.239 | 52.169 | 1.001 | 52.161 | 52.176 | 1 |
Double512Vector.POW | 1024 | thrpt | 10 | 0.008 | ops/ms | 25.488 | 25.473 | 1.001 | 25.462 | 25.461 | 1 |
Double512Vector.SIN | 1024 | thrpt | 10 | 0.121 | ops/ms | 74.514 | 74.724 | 0.997 | 74.655 | 74.56 | 1.001 |
Double512Vector.SINH | 1024 | thrpt | 10 | 0.216 | ops/ms | 86.568 | 86.488 | 1.001 | 86.673 | 86.855 | 0.998 |
Double512Vector.TAN | 1024 | thrpt | 10 | 0.05 | ops/ms | 36.129 | 36.199 | 0.998 | 36.355 | 36.113 | 1.007 |
Double512Vector.TANH | 1024 | thrpt | 10 | 0.125 | ops/ms | 172.425 | 171.657 | 1.004 | 171.701 | 71.727 | 2.394 |
Double64Vector.ACOS | 1024 | thrpt | 10 | 0.125 | ops/ms | 29.916 | 30.242 | 0.989 | 30.232 | 30.135 | 1.003 |
Double64Vector.ASIN | 1024 | thrpt | 10 | 0.008 | ops/ms | 30.677 | 30.58 | 1.003 | 30.396 | 30.524 | 0.996 |
Double64Vector.ATAN | 1024 | thrpt | 10 | 0.038 | ops/ms | 19.561 | 19.526 | 1.002 | 19.446 | 19.456 | 0.999 |
Double64Vector.ATAN2 | 1024 | thrpt | 10 | 0.008 | ops/ms | 15.376 | 15.669 | 0.981 | 15.412 | 15.369 | 1.003 |
Double64Vector.CBRT | 1024 | thrpt | 10 | 0.004 | ops/ms | 13.943 | 13.943 | 1 | 13.873 | 13.89 | 0.999 |
Double64Vector.COS | 1024 | thrpt | 10 | 0.012 | ops/ms | 20.677 | 20.698 | 0.999 | 20.632 | 20.652 | 0.999 |
Double64Vector.COSH | 1024 | thrpt | 10 | 0.036 | ops/ms | 22.949 | 23.116 | 0.993 | 23.163 | 23.241 | 0.997 |
Double64Vector.EXP | 1024 | thrpt | 10 | 0.104 | ops/ms | 23.424 | 23.521 | 0.996 | 23.605 | 23.622 | 0.999 |
Double64Vector.EXPM1 | 1024 | thrpt | 10 | 0.157 | ops/ms | 22.301 | 22.353 | 0.998 | 21.973 | 22.166 | 0.991 |
Double64Vector.HYPOT | 1024 | thrpt | 10 | 0.084 | ops/ms | 21.01 | 20.835 | 1.008 | 20.911 | 20.819 | 1.004 |
Double64Vector.LOG | 1024 | thrpt | 10 | 0.041 | ops/ms | 18.265 | 18.291 | 0.999 | 18.192 | 18.21 | 0.999 |
Double64Vector.LOG10 | 1024 | thrpt | 10 | 0.003 | ops/ms | 16.502 | 16.441 | 1.004 | 16.393 | 16.433 | 0.998 |
Double64Vector.LOG1P | 1024 | thrpt | 10 | 0.009 | ops/ms | 16.815 | 16.862 | 0.997 | 16.792 | 16.833 | 0.998 |
Double64Vector.POW | 1024 | thrpt | 10 | 0.012 | ops/ms | 11.814 | 11.82 | 0.999 | 11.865 | 11.877 | 0.999 |
Double64Vector.SIN | 1024 | thrpt | 10 | 0.005 | ops/ms | 20.557 | 20.605 | 0.998 | 20.57 | 20.26 | 1.015 |
Double64Vector.SINH | 1024 | thrpt | 10 | 0.074 | ops/ms | 23.133 | 23.23 | 0.996 | 23.048 | 23.069 | 0.999 |
Double64Vector.TAN | 1024 | thrpt | 10 | 0.009 | ops/ms | 14.504 | 14.553 | 0.997 | 14.456 | 14.518 | 0.996 |
Double64Vector.TANH | 1024 | thrpt | 10 | 0.12 | ops/ms | 31.304 | 31.226 | 1.002 | 31.4 | 31.267 | 1.004 |
DoubleMaxVector.ACOS | 1024 | thrpt | 10 | 0.146 | ops/ms | 179.388 | 112.342 | 1.597 | 118.005 | 67.768 | 1.741 |
DoubleMaxVector.ASIN | 1024 | thrpt | 10 | 0.169 | ops/ms | 212.342 | 114.107 | 1.861 | 145.676 | 68.143 | 2.138 |
DoubleMaxVector.ATAN | 1024 | thrpt | 10 | 0.011 | ops/ms | 120.925 | 55.823 | 2.166 | 86.676 | 43.156 | 2.008 |
DoubleMaxVector.ATAN2 | 1024 | thrpt | 10 | 0.006 | ops/ms | 98.345 | 33.604 | 2.927 | 60.45 | 26.383 | 2.291 |
DoubleMaxVector.CBRT | 1024 | thrpt | 10 | 0.006 | ops/ms | 88.947 | 43.447 | 2.047 | 52.648 | 30.665 | 1.717 |
DoubleMaxVector.COS | 1024 | thrpt | 10 | 0.023 | ops/ms | 119.164 | 65.718 | 1.813 | 71.619 | 47.145 | 1.519 |
DoubleMaxVector.COSH | 1024 | thrpt | 10 | 0.005 | ops/ms | 124.342 | 75.967 | 1.637 | 82.447 | 54.084 | 1.524 |
DoubleMaxVector.EXP | 1024 | thrpt | 10 | 0.042 | ops/ms | 390.767 | 87.918 | 4.445 | 216.207 | 58.342 | 3.706 |
DoubleMaxVector.EXPM1 | 1024 | thrpt | 10 | 0.018 | ops/ms | 121.79 | 66.387 | 1.835 | 95.935 | 48.204 | 1.99 |
DoubleMaxVector.HYPOT | 1024 | thrpt | 10 | 0.011 | ops/ms | 138.549 | 61.183 | 2.265 | 87.859 | 37.39 | 2.35 |
DoubleMaxVector.LOG | 1024 | thrpt | 10 | 0.034 | ops/ms | 164.687 | 55.44 | 2.971 | 98.446 | 41.873 | 2.351 |
DoubleMaxVector.LOG10 | 1024 | thrpt | 10 | 0.026 | ops/ms | 144.388 | 44.94 | 3.213 | 84.062 | 36.252 | 2.319 |
DoubleMaxVector.LOG1P | 1024 | thrpt | 10 | 0.218 | ops/ms | 151.047 | 46.394 | 3.256 | 86.671 | 36.248 | 2.391 |
DoubleMaxVector.POW | 1024 | thrpt | 10 | 0.004 | ops/ms | 53.241 | 25.251 | 2.108 | 34.371 | 21.58 | 1.593 |
DoubleMaxVector.SIN | 1024 | thrpt | 10 | 0.003 | ops/ms | 130.708 | 65.451 | 1.997 | 83.012 | 47.547 | 1.746 |
DoubleMaxVector.SINH | 1024 | thrpt | 10 | 0.007 | ops/ms | 120.654 | 75.693 | 1.594 | 80.603 | 53.586 | 1.504 |
DoubleMaxVector.TAN | 1024 | thrpt | 10 | 0.062 | ops/ms | 80.045 | 33.268 | 2.406 | 56.48 | 27.723 | 2.037 |
DoubleMaxVector.TANH | 1024 | thrpt | 10 | 0.99 | ops/ms | 154.334 | 153.197 | 1.007 | 65.401 | 82.937 | 0.789 |
DoubleScalar.ACOS | 1024 | thrpt | 10 | 0.06 | ops/ms | 342.452 | 342.471 | 1 | 342.471 | 42.461 | 8.066 |
DoubleScalar.ASIN | 1024 | thrpt | 10 | 0.09 | ops/ms | 353.739 | 354.47 | 0.998 | 352.211 | 54.513 | 6.461 |
DoubleScalar.ATAN | 1024 | thrpt | 10 | 0.043 | ops/ms | 100.797 | 101.069 | 0.997 | 101.089 | 1.086 | 93.084 |
DoubleScalar.ATAN2 | 1024 | thrpt | 10 | 0.025 | ops/ms | 62.29 | 62.283 | 1 | 62.218 | 62.227 | 1 |
DoubleScalar.CBRT | 1024 | thrpt | 10 | 0.014 | ops/ms | 73.922 | 73.929 | 1 | 73.906 | 73.916 | 1 |
DoubleScalar.COS | 1024 | thrpt | 10 | 0.204 | ops/ms | 117.948 | 117.806 | 1.001 | 117.856 | 17.763 | 6.635 |
DoubleScalar.COSH | 1024 | thrpt | 10 | 0.016 | ops/ms | 141.113 | 141.083 | 1 | 141.749 | 40.659 | 3.486 |
DoubleScalar.EXP | 1024 | thrpt | 10 | 0.008 | ops/ms | 189.453 | 188.923 | 1.003 | 189.555 | 89.348 | 2.122 |
DoubleScalar.EXPM1 | 1024 | thrpt | 10 | 0.051 | ops/ms | 133.617 | 133.549 | 1.001 | 133.224 | 33.61 | 3.964 |
DoubleScalar.HYPOT | 1024 | thrpt | 10 | 3.613 | ops/ms | 180.215 | 175.912 | 1.024 | 176.083 | 81.916 | 2.15 |
DoubleScalar.LOG | 1024 | thrpt | 10 | 0.013 | ops/ms | 101.791 | 101.801 | 1 | 101.779 | 1.786 | 56.987 |
DoubleScalar.LOG10 | 1024 | thrpt | 10 | 0.099 | ops/ms | 76.849 | 76.847 | 1 | 76.807 | 76.757 | 1.001 |
DoubleScalar.LOG1P | 1024 | thrpt | 10 | 0.081 | ops/ms | 79.261 | 79.298 | 1 | 79.268 | 79.281 | 1 |
DoubleScalar.POW | 1024 | thrpt | 10 | 0.002 | ops/ms | 31.915 | 31.925 | 1 | 31.919 | 31.92 | 1 |
DoubleScalar.SIN | 1024 | thrpt | 10 | 0.167 | ops/ms | 118.087 | 117.722 | 1.003 | 118.292 | 18.243 | 6.484 |
DoubleScalar.SINH | 1024 | thrpt | 10 | 0.012 | ops/ms | 143.901 | 143.803 | 1.001 | 144.228 | 43.922 | 3.284 |
DoubleScalar.TAN | 1024 | thrpt | 10 | 0.047 | ops/ms | 46.513 | 46.584 | 0.998 | 46.503 | 46.778 | 0.994 |
DoubleScalar.TANH | 1024 | thrpt | 10 | 0.204 | ops/ms | 552.603 | 561.965 | 0.983 | 561.941 | 61.802 | 9.093 |
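A few rows show an Improvement well below 1, for example Double128Vector.TANH. As a sketch for picking such rows out of the tables, the following scans pipe-separated rows and reports every benchmark/configuration whose Improvement falls below a cutoff. The Regressions class, the column constants, and the 0.95 cutoff are all hypothetical choices for illustration; the sample rows are copied from the Double table.

```java
import java.util.ArrayList;
import java.util.List;

public class Regressions {
    // Column indices in the pipe-separated rows of the tables above.
    static final int NAME = 0;
    static final int IMPROVEMENT_SVE1 = 8;
    static final int IMPROVEMENT_SVE0 = 11;

    // Returns "Benchmark (UseSVE=n)" for every row whose Improvement
    // is below the given threshold in either configuration.
    static List<String> slowerThan(List<String> rows, double threshold) {
        List<String> hits = new ArrayList<>();
        for (String row : rows) {
            // split() drops the trailing empty field after the final '|'.
            String[] cols = row.split("\\|");
            for (int i = 0; i < cols.length; i++) {
                cols[i] = cols[i].trim();
            }
            if (Double.parseDouble(cols[IMPROVEMENT_SVE1]) < threshold) {
                hits.add(cols[NAME] + " (UseSVE=1)");
            }
            if (Double.parseDouble(cols[IMPROVEMENT_SVE0]) < threshold) {
                hits.add(cols[NAME] + " (UseSVE=0)");
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
            "Double128Vector.EXP | 1024 | thrpt | 10 | 0.012 | ops/ms | 216.471 | 58.192 | 3.72 | 217.261 | 58.271 | 3.728 |",
            "Double128Vector.TANH | 1024 | thrpt | 10 | 0.011 | ops/ms | 64.979 | 83.143 | 0.782 | 65.652 | 82.771 | 0.793 |",
            "DoubleMaxVector.TANH | 1024 | thrpt | 10 | 0.99 | ops/ms | 154.334 | 153.197 | 1.007 | 65.401 | 82.937 | 0.789 |");
        // [Double128Vector.TANH (UseSVE=1), Double128Vector.TANH (UseSVE=0), DoubleMaxVector.TANH (UseSVE=0)]
        System.out.println(slowerThan(rows, 0.95));
    }
}
```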
Backup of previous test summary
NOTE:
- `Src` means the implementation in this PR, i.e. without a dependency on external SLEEF.
- `Disabled` means the intrinsics are disabled with `-XX:-UseVectorStubs`.
- `system_sleef` means the implementation in the previous PR 18294, i.e. building and running the JDK with a dependency on external SLEEF.
Basically, the perf data in this summary shows that:
- this implementation performs better than the previous version in PR 18294, and
- both SLEEF-based versions perform much better than the non-SLEEF version.
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF (Enhancement - P4)
Contributors
- Xiaohong Gong
Reviewing
Using git
Check out this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18605/head:pull/18605
$ git checkout pull/18605
Update a local copy of the PR:
$ git checkout pull/18605
$ git pull https://git.openjdk.org/jdk.git pull/18605/head
Using Skara CLI tools
Check out this PR locally:
$ git pr checkout 18605
View PR using the GUI difftool:
$ git pr show -t 18605
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18605.diff