panama-vector 8365967: C2 compiler support for HalffloatVector operations supported by auto-vectorization flow


Hi All,

This patch extends the Vector API inline expanders to infer Float16 vector IR based on the newly passed operType argument. We intend to leverage the existing IR and backend implementation of auto-vectorized Float16 operations. Various HalffloatVector operators, namely ADD, SUB, MUL, DIV, MAX, MIN, and FMA, now emit FP16 instructions on x86 targets that support the AVX512-FP16 feature and on AArch64 SVE targets.

Best Regards, Jatin
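
For readers unfamiliar with FP16 lanes, the per-lane behaviour of a few of the operators named in the description can be approximated in scalar Java. This is only a sketch and is not code from the patch: the class and method names are hypothetical, FP16 values are carried as raw bit patterns in a short, and it relies on Float.floatToFloat16 / Float.float16ToFloat, which assumes JDK 20 or later. The FMA variant computes in double and then narrows, so it may differ from a truly fused FP16 result in rare rounding corner cases.

```java
// Hypothetical scalar reference for lane-wise FP16 operations (not part of the patch).
final class Fp16LaneSketch {

    // ADD: widen both FP16 operands to float, add, round back to FP16.
    static short add(short a, short b) {
        return Float.floatToFloat16(Float.float16ToFloat(a) + Float.float16ToFloat(b));
    }

    // MUL: the product of two FP16 values is exact in float, so only one rounding occurs.
    static short mul(short a, short b) {
        return Float.floatToFloat16(Float.float16ToFloat(a) * Float.float16ToFloat(b));
    }

    // MAX: Math.max follows Java semantics for NaN and signed zeros.
    static short max(short a, short b) {
        return Float.floatToFloat16(Math.max(Float.float16ToFloat(a), Float.float16ToFloat(b)));
    }

    // FMA: a * b + c evaluated in double, then narrowed to FP16.
    // Assumption: this approximates fused semantics; it is not guaranteed bit-exact.
    static short fma(short a, short b, short c) {
        double r = Math.fma((double) Float.float16ToFloat(a),
                            (double) Float.float16ToFloat(b),
                            (double) Float.float16ToFloat(c));
        return Float.floatToFloat16((float) r);
    }

    // Applies ADD to every lane of two equally sized arrays of FP16 bit patterns.
    static short[] addLanes(short[] a, short[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("lane counts differ");
        }
        short[] out = new short[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = add(a[i], b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        short x = Float.floatToFloat16(1.5f);
        short y = Float.floatToFloat16(2.25f);
        System.out.println(Float.float16ToFloat(add(x, y)));
    }
}
```

A scalar reference of this kind is mainly useful as an oracle when checking vectorized results lane by lane.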

Progress

[x] Change must not contain extraneous whitespace
[x] Commit message must refer to an issue
[ ] Change must be properly reviewed (1 review required, with at least 1 Committer)

Issue

JDK-8365967: C2 compiler support for HalffloatVector operations supported by auto-vectorization flow (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/panama-vector.git pull/231/head:pull/231
$ git checkout pull/231

Update a local copy of the PR:
$ git checkout pull/231
$ git pull https://git.openjdk.org/panama-vector.git pull/231/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 231

View PR using the GUI difftool:
$ git pr show -t 231

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/panama-vector/pull/231.diff

Using Webrev

Link to Webrev Comment

Aug 22 '25 17:08 jatin-bhateja

:wave: Welcome back jbhateja! A progress list of the required criteria for merging this PR into vectorIntrinsics will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

Aug 22 '25 17:08 bridgekeeper[bot]

❗ This change is not yet ready to be integrated. See the Progress checklist in the description for automated requirements.

Aug 22 '25 17:08 openjdk[bot]

Performance of the FMA benchmark on Intel Xeon Emerald Rapids: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.30GHz

Aug 25 '25 13:08 jatin-bhateja

⚠️ @jatin-bhateja This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request is integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

Aug 29 '25 12:08 openjdk[bot]

Webrevs

Aug 29 '25 12:08 mlbridge[bot]

What is remaining?

Functional validation
Performance validation
New IR framework-based tests.
Microbenchmark for FP16-based dotproduct.

Aug 29 '25 12:08 jatin-bhateja

@jatin-bhateja This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

Sep 30 '25 22:09 bridgekeeper[bot]

@jatin-bhateja Unknown command keeplive - for a list of valid commands use /help.

Oct 01 '25 22:10 openjdk[bot]

/keepalive

Oct 01 '25 22:10 jatin-bhateja

@jatin-bhateja The pull request is being re-evaluated and the inactivity timeout has been reset.

Oct 01 '25 22:10 openjdk[bot]

Performance of JMH micros

System: Model name: INTEL(R) XEON(R) PLATINUM 8581C CPU @ 2.10GHz

Baseline:
Benchmark                      (size)   Mode  Cnt      Score   Error   Units
Halffloat256Vector.ABS           1024  thrpt    2    366.995          ops/ms
Halffloat256Vector.ABSMasked     1024  thrpt    2    345.584          ops/ms
Halffloat256Vector.ACOS          1024  thrpt    2     61.402          ops/ms
Halffloat256Vector.ADD           1024  thrpt    2    259.029          ops/ms
Halffloat256Vector.ADDMasked     1024  thrpt    2    251.257          ops/ms
Halffloat256Vector.ASIN          1024  thrpt    2     61.191          ops/ms
Halffloat256Vector.ATAN          1024  thrpt    2     40.815          ops/ms
Halffloat256Vector.ATAN2         1024  thrpt    2     28.224          ops/ms
Halffloat256Vector.CBRT          1024  thrpt    2     43.547          ops/ms
Halffloat256Vector.COS           1024  thrpt    2     37.414          ops/ms
Halffloat256Vector.COSH          1024  thrpt    2     46.365          ops/ms
Halffloat256Vector.DIV           1024  thrpt    2    221.924          ops/ms
Halffloat256Vector.DIVMasked     1024  thrpt    2    240.560          ops/ms
Halffloat256Vector.EXP           1024  thrpt    2     52.344          ops/ms
Halffloat256Vector.EXPM1         1024  thrpt    2     48.346          ops/ms
Halffloat256Vector.FMA           1024  thrpt    2    206.324          ops/ms
Halffloat256Vector.FMAMasked     1024  thrpt    2    184.678          ops/ms
Halffloat256Vector.HYPOT         1024  thrpt    2     34.096          ops/ms
Halffloat256Vector.LOG           1024  thrpt    2     40.300          ops/ms
Halffloat256Vector.LOG10         1024  thrpt    2     38.886          ops/ms
Halffloat256Vector.LOG1P         1024  thrpt    2     36.438          ops/ms
Halffloat256Vector.MAX           1024  thrpt    2    266.337          ops/ms
Halffloat256Vector.MAXMasked     1024  thrpt    2    245.518          ops/ms
Halffloat256Vector.MIN           1024  thrpt    2    268.963          ops/ms
Halffloat256Vector.MINMasked     1024  thrpt    2    243.136          ops/ms
Halffloat256Vector.MUL           1024  thrpt    2    264.127          ops/ms
Halffloat256Vector.MULMasked     1024  thrpt    2    251.600          ops/ms
Halffloat256Vector.NEG           1024  thrpt    2    365.486          ops/ms
Halffloat256Vector.NEGMasked     1024  thrpt    2    357.070          ops/ms
Halffloat256Vector.POW           1024  thrpt    2     26.809          ops/ms
Halffloat256Vector.SIN           1024  thrpt    2     34.555          ops/ms
Halffloat256Vector.SINH          1024  thrpt    2     53.779          ops/ms
Halffloat256Vector.SQRT          1024  thrpt    2    130.811          ops/ms
Halffloat256Vector.SQRTMasked    1024  thrpt    2    192.628          ops/ms
Halffloat256Vector.SUB           1024  thrpt    2    262.521          ops/ms
Halffloat256Vector.SUBMasked     1024  thrpt    2    254.578          ops/ms
Halffloat256Vector.TAN           1024  thrpt    2     30.002          ops/ms
Halffloat256Vector.TANH          1024  thrpt    2     55.562          ops/ms
Halffloat256Vector.blend         1024  thrpt    2  28002.356          ops/ms

With opt:
Benchmark                      (size)   Mode  Cnt      Score   Error   Units
Halffloat256Vector.ABS           1024  thrpt    2  24048.638          ops/ms
Halffloat256Vector.ABSMasked     1024  thrpt    2  45085.707          ops/ms
Halffloat256Vector.ACOS          1024  thrpt    2     56.116          ops/ms
Halffloat256Vector.ADD           1024  thrpt    2  19623.250          ops/ms
Halffloat256Vector.ADDMasked     1024  thrpt    2  27462.171          ops/ms
Halffloat256Vector.ASIN          1024  thrpt    2     62.081          ops/ms
Halffloat256Vector.ATAN          1024  thrpt    2     41.352          ops/ms
Halffloat256Vector.ATAN2         1024  thrpt    2     29.173          ops/ms
Halffloat256Vector.CBRT          1024  thrpt    2     39.926          ops/ms
Halffloat256Vector.COS           1024  thrpt    2     37.151          ops/ms
Halffloat256Vector.COSH          1024  thrpt    2     48.309          ops/ms
Halffloat256Vector.DIV           1024  thrpt    2   2805.701          ops/ms
Halffloat256Vector.DIVMasked     1024  thrpt    2   2795.544          ops/ms
Halffloat256Vector.EXP           1024  thrpt    2     55.055          ops/ms
Halffloat256Vector.EXPM1         1024  thrpt    2     50.483          ops/ms
Halffloat256Vector.FMA           1024  thrpt    2  23280.064          ops/ms
Halffloat256Vector.FMAMasked     1024  thrpt    2  21828.932          ops/ms
Halffloat256Vector.HYPOT         1024  thrpt    2     34.266          ops/ms
Halffloat256Vector.LOG           1024  thrpt    2     42.158          ops/ms
Halffloat256Vector.LOG10         1024  thrpt    2     41.335          ops/ms
Halffloat256Vector.LOG1P         1024  thrpt    2     36.291          ops/ms
Halffloat256Vector.MAX           1024  thrpt    2  14960.348          ops/ms
Halffloat256Vector.MAXMasked     1024  thrpt    2  12585.642          ops/ms
Halffloat256Vector.MIN           1024  thrpt    2  14662.769          ops/ms
Halffloat256Vector.MINMasked     1024  thrpt    2  12327.769          ops/ms
Halffloat256Vector.MUL           1024  thrpt    2  27156.965          ops/ms
Halffloat256Vector.MULMasked     1024  thrpt    2  21349.555          ops/ms
Halffloat256Vector.NEG           1024  thrpt    2  24093.711          ops/ms
Halffloat256Vector.NEGMasked     1024  thrpt    2  26889.264          ops/ms
Halffloat256Vector.POW           1024  thrpt    2     27.028          ops/ms
Halffloat256Vector.SIN           1024  thrpt    2     34.280          ops/ms
Halffloat256Vector.SINH          1024  thrpt    2     55.049          ops/ms
Halffloat256Vector.SQRT          1024  thrpt    2   2491.596          ops/ms
Halffloat256Vector.SQRTMasked    1024  thrpt    2   2493.591          ops/ms
Halffloat256Vector.SUB           1024  thrpt    2  29664.499          ops/ms
Halffloat256Vector.SUBMasked     1024  thrpt    2  25384.305          ops/ms
Halffloat256Vector.TAN           1024  thrpt    2     29.754          ops/ms
Halffloat256Vector.TANH          1024  thrpt    2     55.933          ops/ms
Halffloat256Vector.blend         1024  thrpt    2  22681.727          ops/ms
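
The two tables can be compared by dividing each optimized score by its baseline. The sketch below does this for a handful of rows copied from the tables above; the class and method names are hypothetical, and because each benchmark was run with Cnt = 2 and no reported error, the ratios should be read as indicative only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: throughput ratios for selected rows of the tables above.
final class SpeedupSketch {

    // Ratio of optimized throughput to baseline throughput (both in ops/ms).
    static double speedup(double baseline, double optimized) {
        return optimized / baseline;
    }

    // Scores copied from the "Baseline" and "With opt" tables, in that order.
    static Map<String, Double> speedups() {
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("ADD",   new double[] { 259.029,   19623.250 });
        scores.put("DIV",   new double[] { 221.924,   2805.701 });
        scores.put("FMA",   new double[] { 206.324,   23280.064 });
        scores.put("MAX",   new double[] { 266.337,   14960.348 });
        scores.put("SIN",   new double[] { 34.555,    34.280 });
        scores.put("blend", new double[] { 28002.356, 22681.727 });

        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            result.put(e.getKey(), speedup(e.getValue()[0], e.getValue()[1]));
        }
        return result;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Double> e : speedups().entrySet()) {
            System.out.printf("%-6s %.2fx%n", e.getKey(), e.getValue());
        }
    }
}
```

On these numbers the operators covered by the patch (for example ADD and FMA) improve by well over an order of magnitude, the transcendental operations such as SIN are essentially unchanged, and blend comes out below 1x, which may be worth a second look.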

What is remaining?

Functional validation
Thorough performance validation
New IR framework-based tests.
Microbenchmark for FP16-based dotproduct.

Oct 02 '25 04:10 jatin-bhateja

Integrating this PR; the remaining work will be part of the JDK mainline PR pull/28002.

Nov 07 '25 10:11 jatin-bhateja

@jatin-bhateja This pull request has not yet been marked as ready for integration.

Nov 07 '25 10:11 openjdk[bot]
