8330021: AArch64: Add backend support for FP16 add operation
This commit [1] added initial support for FP16 operations and backend support for the FP16 add operation on x86. This task adds backend support for the scalar and vector FP16 add operation on AArch64.
[1] https://github.com/openjdk/valhalla/commit/f03fb4e4ee4d59ed692d0c26ddce260511f544e7#diff-a799ce8da7f3062bb3699beb65aae504840c649942032e325c2a50f88a2869ad
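For context, here is a minimal sketch (not taken from the PR) of an FP16 add expressed with the standard half-precision conversion methods available since JDK 20. Without dedicated backend support, C2 lowers this to an fp16-to-fp32 convert, a single-precision add, and a convert back; with matcher rules like the ones added here, the middle step can instead map to a single half-precision `fadd` on hardware that supports it.

```java
// Sketch only: FP16 add via the JDK 20+ conversion methods.
// Without dedicated backend support this compiles to
// convert / float add / convert; with half-precision matcher
// rules it can become one FP16 FADD instruction.
public class Fp16AddSketch {
    static short fp16Add(short a, short b) {
        float sum = Float.float16ToFloat(a) + Float.float16ToFloat(b);
        return Float.floatToFloat16(sum);
    }

    public static void main(String[] args) {
        short one = Float.floatToFloat16(1.0f);  // bit pattern 0x3C00
        short two = Float.floatToFloat16(2.0f);  // bit pattern 0x4000
        System.out.println(Float.float16ToFloat(fp16Add(one, two)));  // prints 3.0
    }
}
```

Note that the `short` values hold raw FP16 bit patterns; the Valhalla branch instead exposes these as a `java.lang.Float16` class.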
Progress
- [x] Change must not contain extraneous whitespace
Issue
- JDK-8330021: AArch64: Add backend support for FP16 add operation (Enhancement - P4)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/valhalla.git pull/1096/head:pull/1096
$ git checkout pull/1096
Update a local copy of the PR:
$ git checkout pull/1096
$ git pull https://git.openjdk.org/valhalla.git pull/1096/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 1096
View PR using the GUI difftool:
$ git pr show -t 1096
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/valhalla/pull/1096.diff
Webrev
:wave: Welcome back bkilambi! A progress list of the required criteria for merging this PR into lworld+fp16 will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@Bhavana-Kilambi This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8330021: AArch64: Add backend support for FP16 add operation
Reviewed-by: jbhateja
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been no new commits pushed to the lworld+fp16 branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.
As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@jatin-bhateja) but any other Committer may sponsor as well.
➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).
Webrevs
- 05: Full - Incremental (bd0e41b4)
- 04: Full - Incremental (6ec5a084)
- 03: Full (c867114e)
- 02: Full (9fd5c5ea)
- 01: Full - Incremental (afdd81c6)
- 00: Full (1d09ba44)
Can I please ask for some reviews for this PR? Thank you in advance!
Hi @jatin-bhateja @TobiHartmann Can I get some reviews please? At least for the c2-specific part. Thank you!
@Bhavana-Kilambi this pull request can not be integrated into lworld+fp16 due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:
git checkout JDK-8330021
git fetch https://git.openjdk.org/valhalla.git lworld+fp16
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge lworld+fp16"
git push
Hello @jatin-bhateja , I noticed that the vector test test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorSum.java fails on aarch64/x86 machines that support the Float16 feature. Have you noticed this on any of your avx512_fp16-supporting machines too? I suspect this failure does not show up in GitHub Actions for this PR or others because the runners are likely non-FP16 hardware.
I dug a bit deeper and found that the Float16 add operation was not getting unrolled in the first place, which is why it never gets vectorized. I have an internal implementation of other binary FP16 operations (subtract, multiply, divide, min and max) and none of those get vectorized either. After some debugging, I noticed that the inner loop body (_body at this point in the code - https://github.com/openjdk/valhalla/blob/404a4fae6fc099cd93cc73d57e29b08b5418adb9/src/hotspot/share/opto/loopTransform.cpp#L1069) contains 188 nodes for the half-float add, compared to 29 nodes for Float32 add and 31 nodes for FP16 add before the merge on June 9th. With that many nodes in the loop, I think the unrolling policy decides not to unroll it.
I printed the extra nodes that the loop (for the FP16 add operation) contains and I can see nodes like DecodeNKlass, DecodeN, LoadNKlass, LoadN, CmpP, StoreP, StoreCM, MergeMem, EncodeP, CastPP and other pointer-related IR nodes, possibly related to decoding compressed class pointers. None of these exist in the FP32 code or in the FP16 code before the merge. Do you have any idea about this failure or the extra nodes in the loop?
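For readers following along, the failing pattern is a half-float reduction loop of roughly this shape. This is only an illustrative sketch on a standard JDK: the actual TestFloat16VectorSum test operates on java.lang.Float16 values from the Valhalla lworld+fp16 branch, which is where the extra pointer nodes come from, so this version shows the loop structure but not the node-count problem itself.

```java
public class Fp16SumSketch {
    // Illustrative only: the real test sums java.lang.Float16 values
    // (Valhalla branch). With a non-flat Float16 layout, each iteration
    // drags in pointer nodes (LoadN, DecodeN, CmpP, ...) that inflate
    // the loop body past C2's unrolling limit. This standard-JDK version
    // stores raw FP16 bit patterns in a short[] instead.
    static float sum(short[] halfBits) {
        float acc = 0.0f;
        for (int i = 0; i < halfBits.length; i++) {
            acc += Float.float16ToFloat(halfBits[i]);
        }
        return acc;
    }

    public static void main(String[] args) {
        short[] a = new short[8];
        java.util.Arrays.fill(a, Float.floatToFloat16(1.0f));
        System.out.println(sum(a));  // prints 8.0
    }
}
```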
Hi @Bhavana-Kilambi ,
Flat array layout is currently broken; it used to work when Float16 was a primitive value class. Please refer to the following link for more details: https://github.com/openjdk/valhalla/pull/1117#issuecomment-2161269886
I have already created an RFE for it (https://bugs.openjdk.org/browse/JDK-8333852) and am working on it.
Best Regards, Jatin
Thanks @jatin-bhateja. I seem to have missed this RFE. I get it now.
Hi @jatin-bhateja , I have rebased my patch to include your latest merge regarding the flat array layout for FP16 operations. Could you please help review this patch? I can then post the patch that implements support for some more FP16 binary operations, which is stalled until this one is merged. Thanks!
Non-target-specific part looks ok to me.
Hi @jatin-bhateja , I have uploaded a new patch addressing your comments. Can you please review? Thanks!
Hi @jatin-bhateja , I have addressed your review comments. Please take a look. Thanks!
/integrate
@jatin-bhateja thank you. Hope it's ok for me to integrate? The aarch64 part should be fine as it has already gone through internal code review at Arm.
@Bhavana-Kilambi Your change (at version bd0e41b46049b10f5cb812c54162268e3e64698d) is now ready to be sponsored by a Committer.
/sponsor
Going to push as commit 9005a9a309e615780e87e49673ca9f9ea4831137.
@jatin-bhateja @Bhavana-Kilambi Pushed as commit 9005a9a309e615780e87e49673ca9f9ea4831137.
:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.