8330021: AArch64: Add backend support for FP16 add operation
This commit [1] added initial support for FP16 operations and backend support for the FP16 add operation on x86. This task adds backend support for the scalar and vector FP16 add operation on AArch64.
[1] https://github.com/openjdk/valhalla/commit/f03fb4e4ee4d59ed692d0c26ddce260511f544e7#diff-a799ce8da7f3062bb3699beb65aae504840c649942032e325c2a50f88a2869ad
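For context, here is a minimal sketch (not taken from the PR) of an FP16 add expressed with the standard half-precision conversion methods available since JDK 20. Without dedicated backend support, C2 lowers this to an fp16-to-fp32 convert, a single-precision add, and a convert back; with matcher rules like the ones added here, the middle step can instead map to a single half-precision `fadd` on hardware that supports it.

```java
// Sketch only: FP16 add via the JDK 20+ conversion methods.
// Without dedicated backend support this compiles to
// convert / float add / convert; with half-precision matcher
// rules it can become one FP16 FADD instruction.
public class Fp16AddSketch {
    static short fp16Add(short a, short b) {
        float sum = Float.float16ToFloat(a) + Float.float16ToFloat(b);
        return Float.floatToFloat16(sum);
    }

    public static void main(String[] args) {
        short one = Float.floatToFloat16(1.0f);  // bit pattern 0x3C00
        short two = Float.floatToFloat16(2.0f);  // bit pattern 0x4000
        System.out.println(Float.float16ToFloat(fp16Add(one, two)));  // prints 3.0
    }
}
```

Note that the `short` values hold raw FP16 bit patterns; the Valhalla branch instead exposes these as a `java.lang.Float16` class.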
Progress
- [x] Change must not contain extraneous whitespace
Issue
- JDK-8330021: AArch64: Add backend support for FP16 add operation (Enhancement - P4)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/valhalla.git pull/1096/head:pull/1096
$ git checkout pull/1096
Update a local copy of the PR:
$ git checkout pull/1096
$ git pull https://git.openjdk.org/valhalla.git pull/1096/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 1096
View PR using the GUI difftool:
$ git pr show -t 1096
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/valhalla/pull/1096.diff
Webrev
:wave: Welcome back bkilambi! A progress list of the required criteria for merging this PR into lworld+fp16 will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@Bhavana-Kilambi This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8330021: AArch64: Add backend support for FP16 add operation
Reviewed-by: jbhateja
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been no new commits pushed to the lworld+fp16 branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.
As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@jatin-bhateja) but any other Committer may sponsor as well.
➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).
Webrevs
- 05: Full - Incremental (bd0e41b4)
- 04: Full - Incremental (6ec5a084)
- 03: Full (c867114e)
- 02: Full (9fd5c5ea)
- 01: Full - Incremental (afdd81c6)
- 00: Full (1d09ba44)
Can I please ask for some reviews for this PR? Thank you in advance!
Hi @jatin-bhateja @TobiHartmann Can I get some reviews please? At least for the c2-specific part. Thank you!
@Bhavana-Kilambi this pull request can not be integrated into lworld+fp16 due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:
git checkout JDK-8330021
git fetch https://git.openjdk.org/valhalla.git lworld+fp16
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge lworld+fp16"
git push
Hello @jatin-bhateja , I noticed that the vector test test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorSum.java fails on aarch64/x86 machines that support the Float16 feature. Have you noticed this on any of your avx512_fp16-supporting machines too? I suspect this failure does not show up in GitHub Actions for this PR or others because the runners are likely non-FP16 hardware.
I dug a bit deeper and found that the Float16 add operation was not getting unrolled in the first place, which is why it never gets vectorized. I have an internal implementation of other binary FP16 operations (subtract, multiply, divide, min and max) and none of those get vectorized either. After some debugging, I noticed that the inner loop body (_body at this point in the code - https://github.com/openjdk/valhalla/blob/404a4fae6fc099cd93cc73d57e29b08b5418adb9/src/hotspot/share/opto/loopTransform.cpp#L1069) contains 188 nodes for the half-float add, compared to 29 nodes for Float32 add and 31 nodes for FP16 add before the merge on June 9th. With that many nodes in the loop, I think the unrolling policy decides not to unroll it.
I printed the extra nodes that the loop (for the FP16 add operation) contains and I can see nodes like DecodeNKlass, DecodeN, LoadNKlass, LoadN, CmpP, StoreP, StoreCM, MergeMem, EncodeP, CastPP and other pointer-related IR nodes, possibly related to decoding compressed class pointers. None of these exist in the FP32 code or in the FP16 code before the merge. Do you have any idea about this failure or the extra nodes in the loop?
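For readers following along, the failing pattern is a half-float reduction loop of roughly this shape. This is only an illustrative sketch on a standard JDK: the actual TestFloat16VectorSum test operates on java.lang.Float16 values from the Valhalla lworld+fp16 branch, which is where the extra pointer nodes come from, so this version shows the loop structure but not the node-count problem itself.

```java
public class Fp16SumSketch {
    // Illustrative only: the real test sums java.lang.Float16 values
    // (Valhalla branch). With a non-flat Float16 layout, each iteration
    // drags in pointer nodes (LoadN, DecodeN, CmpP, ...) that inflate
    // the loop body past C2's unrolling limit. This standard-JDK version
    // stores raw FP16 bit patterns in a short[] instead.
    static float sum(short[] halfBits) {
        float acc = 0.0f;
        for (int i = 0; i < halfBits.length; i++) {
            acc += Float.float16ToFloat(halfBits[i]);
        }
        return acc;
    }

    public static void main(String[] args) {
        short[] a = new short[8];
        java.util.Arrays.fill(a, Float.floatToFloat16(1.0f));
        System.out.println(sum(a));  // prints 8.0
    }
}
```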
Hi @Bhavana-Kilambi ,
Flat array layout is currently broken; it used to work when Float16 was a primitive value class. Please refer to the following link for more details: https://github.com/openjdk/valhalla/pull/1117#issuecomment-2161269886
I have already created an RFE for it (https://bugs.openjdk.org/browse/JDK-8333852) and am working on it.
Best Regards, Jatin
Thanks @jatin-bhateja. I seem to have missed this RFE. I get it now.
Hi @jatin-bhateja , I have rebased my patch to include your latest merge regarding the flat array layout for FP16 operations. Could you please help review this patch? I can then post the patch that implements support for some more FP16 binary operations, which is stalled until this one is merged. Thanks!
Non-target-specific part looks ok to me.
Hi @jatin-bhateja , I have uploaded a new patch addressing your comments. Can you please review? Thanks!
Hi @jatin-bhateja , I have addressed your review comments. Please take a look. Thanks!
/integrate
@jatin-bhateja thank you. Hope it's ok for me to integrate? The aarch64 part should be fine as it has already gone through internal code review at Arm.
@Bhavana-Kilambi Your change (at version bd0e41b46049b10f5cb812c54162268e3e64698d) is now ready to be sponsored by a Committer.
/sponsor
Going to push as commit 9005a9a309e615780e87e49673ca9f9ea4831137.
@jatin-bhateja @Bhavana-Kilambi Pushed as commit 9005a9a309e615780e87e49673ca9f9ea4831137.
:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.