jdk 8325155: C2 SuperWord: remove alignment boundaries

I have tried for a very long time to get rid of all the alignment(n) code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it.

I was able to remove lots of VM code, about 300 lines. And the removed code is I think much more complicated than the new code.

This is what I did in this PR:

Removal of _node_info: used to have many fields, which I refactored out to the VLoopAnalyzer modules. alignment is the last component, which I now remove.
Changed the implementation of SuperWord::find_adjacent_refs, now SuperWord::find_adjacent_memop_pairs, completely:
- It used to be an algorithm that would scan over all memops repeatedly, try to find some mem_ref and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse.
- I now add all memops into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset. This allows me to split off the groups much more efficiently, and also the sorting by offset allows me finding adjacent pairs much more efficiently. In the most cases this reduces the cost to O(n log n) for sort, and a linear scan for finding adjacent memops.
I removed the "alignment boundaries" created in SuperWord::memory_alignment by int off_rem = offset % vw;.
- This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g. we have nodes with offsets 31, 32, which are adjacent in theory, but if we have a vw = 32, then the modulo-offsets are 31, 0, and they are not detected as adjacent).
- These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code.
The alignment used to have another important task: Ensuring compatibility of the input-size of a use node, with the output-size of the def-node.
- This was done by giving all nodes an alignment, even the non-memop nodes. This alignment was then scaled up and down at type casts (e.g. int 0, 4, 8, 12 -> long 0, 8, 16, 24). If the output-size of the def-node did not match the input-size of the use-node, then the alignment would not match up, and we would not pack.
- This is why we used to have checks like alignment(s1) + data_size(s1) == alignment(s2) and s2_align == align + data_size(s1), and why we did set_alignment(s2, align + data_size(s1)); inside SuperWord::set_alignment(Node* s1, Node* s2, int align).
- I decided to NOT check if use/def type sizes match during packing, but only much later in SuperWord::profitable (bad name, it has always been more about checking consistency than profitability, but I will rename that in a Future RFE). The relevant code is in SuperWord::is_velt_basic_type_compatible_use_def.

Progress

[ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
[x] Change must not contain extraneous whitespace
[x] Commit message must refer to an issue

Issue

JDK-8325155: C2 SuperWord: remove alignment boundaries (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18822/head:pull/18822
$ git checkout pull/18822

Update a local copy of the PR:
$ git checkout pull/18822
$ git pull https://git.openjdk.org/jdk.git pull/18822/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18822

View PR using the GUI difftool:
$ git pr show -t 18822

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18822.diff

Webrev

Link to Webrev Comment

Apr 17 '24 17:04 eme64

:wave: Welcome back epeter! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

Apr 17 '24 17:04 bridgekeeper[bot]

@eme64 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8325155: C2 SuperWord: remove alignment boundaries

Reviewed-by: chagedorn, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 165 new commits pushed to the master branch:

2a37764e7428d579a3080e62681f1c9c9f816c1e: 8333743: Change .jcheck/conf branches property to match valid branches
75dc2f8518d0adea30f7065d6732b807c0220756: 8330182: Start of release updates for JDK 24
054362abe040938b87eb1a1cab8a0a94540e0667: 8332550: [macos] Voice Over: java.awt.IllegalComponentStateException: component must be showing on the screen to determine its location
9b436d048ec92f74ec6812ae20fde21751927d4b: 8333674: Disable CollectorPolicy.young_min_ergo_vm for PPC64
487c4771818999749bfd507ab85777795bba0832: 8333647: C2 SuperWord: some additional PopulateIndex tests
d02cb742f79e88c6438ca58a6357fe432fb286cb: 8333270: HandlersOnComplexResetUpdate and HandlersOnComplexUpdate tests fail with "Unexpected reference" if timeoutFactor is less than 1/3
02f240415cbda5f67a91af50d5974fb001104170: 8333560: -Xlint:restricted does not work with --release
606df441410a69034b4c113e85ce21937d1a0808: 8332670: C1 clone intrinsic needs memory barriers
33fd6ae98638d2a4b33d18cc4acee4f0daaa9b35: 8333622: ubsan: relocInfo_x86.cpp:101:56: runtime error: pointer index expression with base (-1) overflowed
8de5d2014a87d58d389eb8400f619d1b1fa3abe7: 8332865: ubsan: os::attempt_reserve_memory_between reports overflow
... and 155 more: https://git.openjdk.org/jdk/compare/da6aa2a86c86ba5fce747b36dcb2d6001cfcc44e...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

Apr 17 '24 17:04 openjdk[bot]

@eme64 The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

Apr 17 '24 18:04 openjdk[bot]

Webrevs

May 13 '24 08:05 mlbridge[bot]

⚠️ @eme64 This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request in integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

May 28 '24 17:05 openjdk[bot]

Thanks @chhagedorn for the review! I think I addressed all your points. Except for this:

You should mention here that this method finally adds a new pair to the _pairset. On a separate note, find_adjacent_memop_pairs_in_group() suggests that we find something but we actually "find and add" something without returning anything from the method. Should we make this more clear in the method name?

I have not yet found a better name. I think find is ok here. I guess we could have it be find_and_add but this seems unnecessarily long to me.

May 28 '24 17:05 eme64

FYI: I ran performance benchmarking, and there was no significant difference.

May 30 '24 06:05 eme64

Thank you for running performance testing.

Jun 04 '24 15:06 vnkozlov

@vnkozlov thanks for the review! I will integrate as soon as the JDK24 fork happens ;)

Jun 04 '24 15:06 eme64

Thanks @chhagedorn @vnkozlov for the reviews! /integrate

Jun 07 '24 05:06 eme64

Going to push as commit 944aeb81b16e3e7a3019cafdefe67b797fa6be96. Since your change was applied there have been 167 commits pushed to the master branch:

d8af58941b5dedb9774c0971895c4924e57ac28b: 8026127: Deflater/Inflater documentation incomplete/misleading
6238bc8da2abe7a1f0cdd98c0af01e9ba1869ec3: 8333456: CompactNumberFormat integer parsing fails when string has no suffix
2a37764e7428d579a3080e62681f1c9c9f816c1e: 8333743: Change .jcheck/conf branches property to match valid branches
75dc2f8518d0adea30f7065d6732b807c0220756: 8330182: Start of release updates for JDK 24
054362abe040938b87eb1a1cab8a0a94540e0667: 8332550: [macos] Voice Over: java.awt.IllegalComponentStateException: component must be showing on the screen to determine its location
9b436d048ec92f74ec6812ae20fde21751927d4b: 8333674: Disable CollectorPolicy.young_min_ergo_vm for PPC64
487c4771818999749bfd507ab85777795bba0832: 8333647: C2 SuperWord: some additional PopulateIndex tests
d02cb742f79e88c6438ca58a6357fe432fb286cb: 8333270: HandlersOnComplexResetUpdate and HandlersOnComplexUpdate tests fail with "Unexpected reference" if timeoutFactor is less than 1/3
02f240415cbda5f67a91af50d5974fb001104170: 8333560: -Xlint:restricted does not work with --release
606df441410a69034b4c113e85ce21937d1a0808: 8332670: C1 clone intrinsic needs memory barriers
... and 157 more: https://git.openjdk.org/jdk/compare/da6aa2a86c86ba5fce747b36dcb2d6001cfcc44e...master

Your commit was automatically rebased without conflicts.

Jun 07 '24 05:06 openjdk[bot]

@eme64 Pushed as commit 944aeb81b16e3e7a3019cafdefe67b797fa6be96.

:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Jun 07 '24 05:06 openjdk[bot]

jdk jdk copied to clipboard

8325155: C2 SuperWord: remove alignment boundaries

Progress

Issue

Reviewing

Webrev

Webrevs

jdk
jdk copied to clipboard