jdk
jdk copied to clipboard
8325155: C2 SuperWord: remove alignment boundaries
I have tried for a very long time to get rid of all the alignment(n)
code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it.
I was able to remove lots of VM code, about 300 lines. And the removed code is I think much more complicated than the new code.
This is what I did in this PR:
- Removal of
_node_info
: used to have many fields, which I refactored out to theVLoopAnalyzer
modules.alignment
is the last component, which I now remove. - Changed the implementation of
SuperWord::find_adjacent_refs
, nowSuperWord::find_adjacent_memop_pairs
, completely:- It used to be an algorithm that would scan over all
memops
repeatedly, try to find somemem_ref
and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse. - I now add all
memops
into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset. This allows me to split off the groups much more efficiently, and also the sorting by offset allows me finding adjacent pairs much more efficiently. In the most cases this reduces the cost toO(n log n)
for sort, and a linear scan for finding adjacent memops.
- It used to be an algorithm that would scan over all
- I removed the "alignment boundaries" created in
SuperWord::memory_alignment
byint off_rem = offset % vw;
.- This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g. we have nodes with offsets
31, 32
, which are adjacent in theory, but if we have avw = 32
, then the modulo-offsets are31, 0
, and they are not detected as adjacent). - These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code.
- This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g. we have nodes with offsets
- The
alignment
used to have another important task: Ensuring compatibility of the input-size of a use node, with the output-size of the def-node.- This was done by giving all nodes an
alignment
, even the non-memop nodes. Thisalignment
was then scaled up and down at type casts (e.g. int0, 4, 8, 12
-> long0, 8, 16, 24
). If the output-size of the def-node did not match the input-size of the use-node, then thealignment
would not match up, and we would not pack. - This is why we used to have checks like
alignment(s1) + data_size(s1) == alignment(s2)
ands2_align == align + data_size(s1)
, and why we didset_alignment(s2, align + data_size(s1));
insideSuperWord::set_alignment(Node* s1, Node* s2, int align)
. - I decided to NOT check if use/def type sizes match during packing, but only much later in
SuperWord::profitable
(bad name, it has always been more about checking consistency than profitability, but I will rename that in a Future RFE). The relevant code is inSuperWord::is_velt_basic_type_compatible_use_def
.
- This was done by giving all nodes an
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-8325155: C2 SuperWord: remove alignment boundaries (Enhancement - P4)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18822/head:pull/18822
$ git checkout pull/18822
Update a local copy of the PR:
$ git checkout pull/18822
$ git pull https://git.openjdk.org/jdk.git pull/18822/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 18822
View PR using the GUI difftool:
$ git pr show -t 18822
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18822.diff
Webrev
:wave: Welcome back epeter! A progress list of the required criteria for merging this PR into master
will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@eme64 This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8325155: C2 SuperWord: remove alignment boundaries
Reviewed-by: chagedorn, kvn
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 165 new commits pushed to the master
branch:
- 2a37764e7428d579a3080e62681f1c9c9f816c1e: 8333743: Change .jcheck/conf branches property to match valid branches
- 75dc2f8518d0adea30f7065d6732b807c0220756: 8330182: Start of release updates for JDK 24
- 054362abe040938b87eb1a1cab8a0a94540e0667: 8332550: [macos] Voice Over: java.awt.IllegalComponentStateException: component must be showing on the screen to determine its location
- 9b436d048ec92f74ec6812ae20fde21751927d4b: 8333674: Disable CollectorPolicy.young_min_ergo_vm for PPC64
- 487c4771818999749bfd507ab85777795bba0832: 8333647: C2 SuperWord: some additional PopulateIndex tests
- d02cb742f79e88c6438ca58a6357fe432fb286cb: 8333270: HandlersOnComplexResetUpdate and HandlersOnComplexUpdate tests fail with "Unexpected reference" if timeoutFactor is less than 1/3
- 02f240415cbda5f67a91af50d5974fb001104170: 8333560: -Xlint:restricted does not work with --release
- 606df441410a69034b4c113e85ce21937d1a0808: 8332670: C1 clone intrinsic needs memory barriers
- 33fd6ae98638d2a4b33d18cc4acee4f0daaa9b35: 8333622: ubsan: relocInfo_x86.cpp:101:56: runtime error: pointer index expression with base (-1) overflowed
- 8de5d2014a87d58d389eb8400f619d1b1fa3abe7: 8332865: ubsan: os::attempt_reserve_memory_between reports overflow
- ... and 155 more: https://git.openjdk.org/jdk/compare/da6aa2a86c86ba5fce747b36dcb2d6001cfcc44e...master
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
➡️ To integrate this PR with the above commit message to the master
branch, type /integrate in a new comment.
@eme64 The following label will be automatically applied to this pull request:
-
hotspot-compiler
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Webrevs
- 05: Full - Incremental (d18219b5)
- 04: Full - Incremental (cd65ca05)
- 03: Full (41fc1eb3)
- 02: Full - Incremental (d48bafa9)
- 01: Full (82c9a77a)
- 00: Full (69396ac8)
⚠️ @eme64 This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request in integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch>
where <project>
is the name of another project in the OpenJDK organization (for example Merge jdk:master
).
Thanks @chhagedorn for the review! I think I addressed all your points. Except for this:
You should mention here that this method finally adds a new pair to the _pairset. On a separate note, find_adjacent_memop_pairs_in_group() suggests that we find something but we actually "find and add" something without returning anything from the method. Should we make this more clear in the method name?
I have not yet found a better name. I think find
is ok here. I guess we could have it be find_and_add
but this seems unnecessarily long to me.
FYI: I ran performance benchmarking, and there was no significant difference.
Thank you for running performance testing.
@vnkozlov thanks for the review! I will integrate as soon as the JDK24 fork happens ;)
Thanks @chhagedorn @vnkozlov for the reviews! /integrate
Going to push as commit 944aeb81b16e3e7a3019cafdefe67b797fa6be96.
Since your change was applied there have been 167 commits pushed to the master
branch:
- d8af58941b5dedb9774c0971895c4924e57ac28b: 8026127: Deflater/Inflater documentation incomplete/misleading
- 6238bc8da2abe7a1f0cdd98c0af01e9ba1869ec3: 8333456: CompactNumberFormat integer parsing fails when string has no suffix
- 2a37764e7428d579a3080e62681f1c9c9f816c1e: 8333743: Change .jcheck/conf branches property to match valid branches
- 75dc2f8518d0adea30f7065d6732b807c0220756: 8330182: Start of release updates for JDK 24
- 054362abe040938b87eb1a1cab8a0a94540e0667: 8332550: [macos] Voice Over: java.awt.IllegalComponentStateException: component must be showing on the screen to determine its location
- 9b436d048ec92f74ec6812ae20fde21751927d4b: 8333674: Disable CollectorPolicy.young_min_ergo_vm for PPC64
- 487c4771818999749bfd507ab85777795bba0832: 8333647: C2 SuperWord: some additional PopulateIndex tests
- d02cb742f79e88c6438ca58a6357fe432fb286cb: 8333270: HandlersOnComplexResetUpdate and HandlersOnComplexUpdate tests fail with "Unexpected reference" if timeoutFactor is less than 1/3
- 02f240415cbda5f67a91af50d5974fb001104170: 8333560: -Xlint:restricted does not work with --release
- 606df441410a69034b4c113e85ce21937d1a0808: 8332670: C1 clone intrinsic needs memory barriers
- ... and 157 more: https://git.openjdk.org/jdk/compare/da6aa2a86c86ba5fce747b36dcb2d6001cfcc44e...master
Your commit was automatically rebased without conflicts.
@eme64 Pushed as commit 944aeb81b16e3e7a3019cafdefe67b797fa6be96.
:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.