8325520: Vector loads with offsets incorrectly compiled
Issue
When loading multiple vectors using offsets or masks (e.g. LongVector::fromArray(LongVector.SPECIES_256, storage, 0, offsets, 0) or LongVector::fromArray(LongVector.SPECIES_256, storage, 0, longMask)) there is an error in the C2-compiled code that makes different vectors be treated as equal even though they are not.
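A minimal sketch of the kind of pattern that goes wrong (hedged: the class, species, array names and values are illustrative assumptions, not the original reproducer from the bug report; it needs --add-modules jdk.incubator.vector):
import jdk.incubator.vector.*;

public class Repro {
    static final VectorSpecies<Long> SPECIES = LongVector.SPECIES_256;

    // A plain store followed by a gather load and a masked load of the same memory.
    // With the bug, C2 could replace the gather/masked loads with the plain stored
    // vector, so the three results would wrongly be treated as equal.
    static void test(long[] storage, int[] indexMap, boolean[] maskBits, long[] out) {
        VectorMask<Long> mask = VectorMask.fromArray(SPECIES, maskBits, 0);
        LongVector full = LongVector.fromArray(SPECIES, storage, 0);
        full.intoArray(storage, 0);                                                   // plain StoreVector
        LongVector gathered = LongVector.fromArray(SPECIES, storage, 0, indexMap, 0); // LoadVectorGather
        LongVector masked = LongVector.fromArray(SPECIES, storage, 0, mask);          // LoadVectorMasked
        gathered.intoArray(out, 0);
        masked.intoArray(out, SPECIES.length());
    }
}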
Causes
On vector-capable platforms, vector loads with masks and offsets (for Long, Integer, Float and Double) create specific nodes in the ideal graph (i.e. LoadVectorGather, LoadVectorMasked, LoadVectorGatherMasked). Vector loads without masks or offsets are mapped to LoadVector nodes instead. The same is true for StoreVectors.
When running GVN loops we can get to the situation where we check if a Load node is preceded by a Store to the same address, so that the Load can be replaced with the input of the Store (in LoadNode::Identity). Here we call
https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1258
where we do an extra type check when dealing with vectors, but we do not check whether masks or offsets interfere.
Similarly, in StoreNode::Identity we first check for a Load followed by a Store of the same address:
https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515
but we do not make sure that there are no masks or offsets. A few lines below, we check whether there are two stores of the same value in a row. We need to check for masks and offsets here too, but in this case we can still allow the fold if the masks and offsets of the two vector stores are equivalent.
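As a hedged Java-level illustration of the Load-followed-by-Store pattern (the enclosing class, imports and parameter names are assumed): a plain load followed by a scatter store of the same value to the same base address is not a no-op, because the index map rearranges the lanes.
static void loadThenScatter(int[] a, int[] indexMap) {
    IntVector v = IntVector.fromArray(IntVector.SPECIES_256, a, 0);
    // Lane i is written to a[indexMap[i]], so removing this store just because its
    // value was loaded from the same address would be wrong.
    v.intoArray(a, 0, indexMap, 0);
}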
Solution
To avoid folding Load- and StoreVectors with masks and offsets we add a specific store_Opcode method to LoadVectorGatherNode, LoadVectorMaskedNode and LoadVectorGatherMaskedNode that does not return a store opcode but instead returns the node's own opcode (so that it can never match a store node). In this way, the checks in MemNode::can_see_stored_value
https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L1164-L1166
and StoreNode::Identity
https://github.com/openjdk/jdk/blob/87e864bf21d71daae4e001ec4edbb4ef1f60c36d/src/hotspot/share/opto/memnode.cpp#L3509-L3515
will fail if masks or offsets are used. For two stores of the same value we instead check for mask and offset equality.
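As a hedged Java-level illustration of the store-store case that stays foldable (enclosing class and parameter names assumed): two back-to-back vector stores with the same value, address and mask make the first one redundant, which is why the stores' masks and offsets are compared for equality rather than rejecting these cases outright.
static void redundantMaskedStore(int[] a, IntVector v, VectorMask<Integer> m) {
    v.intoArray(a, 0, m);   // made redundant by ...
    v.intoArray(a, 0, m);   // ... this identical masked store (same value, address and mask)
}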
Regression tests for all versions of Load/StoreVectorGather/Masked have been added too.
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-8325520: Vector loads with offsets incorrectly compiled (Bug - P3)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18347/head:pull/18347
$ git checkout pull/18347
Update a local copy of the PR:
$ git checkout pull/18347
$ git pull https://git.openjdk.org/jdk.git pull/18347/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 18347
View PR using the GUI difftool:
$ git pr show -t 18347
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18347.diff
Webrev
👋 Welcome back dfenacci! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@dafedafe This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8325520: Vector loads and stores with indices and masks incorrectly compiled
Reviewed-by: epeter, thartmann
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 359 new commits pushed to the master branch:
- 612ae9289a130b8701f74253fe5499358a2e2b5b: 8332735: [JVMCI] Add extra JVMCI events for exception translation
- 1ea76d338b99900089277b7a2da82c24382a6ce1: 8332675: test/hotspot/jtreg/gc/testlibrary/Helpers.java compileClass javadoc does not match after 8321812
- 94af3c23ea09ef2869cdc666d8170a655a0b3602: 8329203: Parallel: Investigate Mark-Compact for Full GC to decrease memory usage
- 1e5a2780d9cc8e73ce65bdccb98c1808aadd0784: 8332676: Remove unused BarrierSetAssembler::incr_allocated_bytes
- c2180d141ccca0e396ee9a0cd3044c4428b963d5: 8315767: InetAddress: constructing objects from BSD literal addresses
- 2a11e0da026066191e4d4f30b9daca986c484630: 8332743: Update comment related to JDK-8320522
- 6829d9ac67fb131462d3ef1c4bdfaa07df5d6be6: 8332122: [nmt] Totals for malloc should show total peak
- 9d332e6591334a71335da65a4dd7b2ed0482b6cb: 8307193: Several Swing jtreg tests use class.forName on L&F classes
- 98f6a80852383dcbdad7292b7d269a8547d54d45: 8332490: JMH org.openjdk.bench.java.util.zip.InflaterInputStreams.inflaterInputStreamRead OOM
- 3d4185a9ce482cc655a4c67f39cb2682b02ae4fe: 8332739: Problemlist compiler/codecache/CheckLargePages until JDK-8332654 is fixed
- ... and 349 more: https://git.openjdk.org/jdk/compare/174d62652c69e811cf44ab64db575b13a848a728...master
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.
@dafedafe The following label will be automatically applied to this pull request:
- hotspot-compiler
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
I see that LoadVectorNode has no additional slots, but then LoadVectorMaskedNode and LoadVectorGatherMaskedNode simply do add_req. Why not just compare these extra slots (as many as there are) in LoadVectorNode::Identity, as well as checking that their Opcode is the same? Would that not be simpler and accomplish the same?
Can you please fix the whitespace issues, it is a bit difficult to read now on GitHub ;)
Sorry, I let them slip in. Fixed.
It seems that your code would now disallow such a case, because you always check that the Ideal node you get back is of the same type as the this node. Am I right about that? Is that intended?
If I am right that you would have made such an optimization impossible, that probably means that our tests don't have an IR test that covers this case. You would definitely need to add such IR tests, otherwise we don't know if we are getting regressions. You will probably also have to run this patch through performance testing eventually.
I've tried to use the IR framework to check for folded/non folded nodes but couldn't make it reliable enough (in the end it didn't test more than the current test). So I decided to go back to the actual "regression" test which reproduces the original issue.
Do you know what is the issue with reliability for the IR rules? Why did it not always work?
I think your checks should also not be done after we already create a new node, but ideally before we create the new node. That is the normal pattern I see everywhere. So then you would probably have to dig into MemNode::can_see_stored_value and other places, to see where you would need to do your additional checks. There are already some checks for vector types there, so that would be one intuitive starting point.
I've removed the Identity overrides and changed the checks in MemNode::can_see_stored_value as you suggested (the Identity methods were also a bit too restrictive). I also had to add checks to StoreNode::Identity here:
https://github.com/openjdk/jdk/blob/7eb78e332080df3890b66ca04338a4ba69af45eb/src/hotspot/share/opto/memnode.cpp#L2799-L2814
If I am right that you would have made such an optimization impossible, that probably means that our tests don't have an IR test that covers this case. You would definitely need to add such IR tests, otherwise we don't know if we are getting regressions. You will probably also have to run this patch through performance testing eventually.
I've transformed the tests to add IR tests as well. The issue with them seems to be related to JDK-8302459 (basically there are some missing cleanups when performing late inlining). So, for now, the tests force a cleanup at every step (-XX:+IncrementalInlineForceCleanup).
@dafedafe Nice, I think this already looks much better. Let me know if/when you want me to look at it again ;)
Thanks a lot @eme64!
Webrevs
- 14: Full - Incremental (0c015fa1)
- 13: Full - Incremental (50a75ffe)
- 12: Full - Incremental (8f341cd6)
- 11: Full - Incremental (df3a49ae)
- 10: Full - Incremental (e676bcb1)
- 09: Full - Incremental (777bf562)
- 08: Full - Incremental (9b742109)
- 07: Full - Incremental (a2cb6a58)
- 06: Full - Incremental (72bf6ca3)
- 05: Full - Incremental (85bb4bef)
- 04: Full - Incremental (524ff888)
- 03: Full - Incremental (d25bcacf)
- 02: Full (b7e5fe02)
- 01: Full - Incremental (2c83876d)
- 00: Full (e331b5c5)
I've randomly thought about this change again (while trying to sleep yesterday...). And started worrying about this kind of case:
import jdk.incubator.vector.*;
public class Test {
private static final VectorSpecies<Integer> I_SPECIES = IntVector.SPECIES_MAX;
public static void main(String[] args) {
// create mask
boolean[] intMask = new boolean[I_SPECIES.length()];
for (int i = 0; i < intMask.length; i++) {
intMask[i] = (i % 2 == 0);
}
int[] intArray1 = new int[I_SPECIES.length()];
int[] intArray2 = new int[I_SPECIES.length()];
int[] intArray3 = new int[I_SPECIES.length()];
for (int i = 0; i < intArray1.length; i++) {
intArray1[i] = i;
intArray2[i] = -2 * i;
intArray3[i] = 0;
}
for (int i = 0; i < 10_000; i++) {
test1(intMask, intArray1, intArray2, intArray3);
}
for (int i = 0; i < intArray1.length; i++) {
System.out.println("i: " + i + " " + intArray1[i] + " " +intArray2[i] + " " + intArray3[i]);
}
}
static void test1(boolean[] intMask, int[] intArray1, int[] intArray2, int[] intArray3) {
VectorMask<Integer> intVectorMask = VectorMask.fromArray(I_SPECIES, intMask, 0);
// Load values: 0 1 2 3 4 5 ...
IntVector a = IntVector.fromArray(I_SPECIES, intArray1, 0);
// Store, but only every second value:
// Store: 0 x 2 x 4 x ...
// Already there: 0 -2 -4 -6 -8 -10 ...
// Result: 0 -2 2 -6 4 -10 ...
a.intoArray(intArray2, 0, intVectorMask);
// Load, but we cannot just take the value from vector a, since we mask it, and where the mask is
// off, we must have zero.
// Load: 0 0 2 0 4 0 ...
IntVector b = IntVector.fromArray(I_SPECIES, intArray2, 0, intVectorMask);
b.intoArray(intArray3, 0);
}
}
This is not a bug yet, I think, but it could be if you do the change I suggested with store_Opcode().
Currently, if we have:
2035 StoreVectorMasked === 934 7 2031 1979 1956 |893 [[ 2034 2041 2066 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[0] *, idx=7; mismatched Memory: @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):exact[0] *, idx=7; !jvms: IntVector::intoArray0Template @ bci:50 (line 3556) IntMaxVector::intoArray0 @ bci:9 (line 911) IntVector::intoArray @ bci:53 (line 3269) Test::test1 @ bci:26 (line 34)
2041 LoadVectorMasked === 934 2035 2031 1956 |893 [[ 1767 2066 1750 ]] @int[int:>=0] (java/lang/Cloneable,java/io/Serializable):NotNull:exact[0] *, idx=7; mismatched #vectorz[16]:{int} !jvms: IntVector::fromArray0Template @ bci:53 (line 3455) IntMaxVector::fromArray0 @ bci:11 (line 874) IntVector::fromArray @ bci:32 (line 2986) Test::test1 @ bci:36 (line 35)
The load does not succeed in MemNode::can_see_stored_value, because the load's store_Opcode() == Op_StoreVector, and not Op_StoreVectorMasked. But if you were to implement LoadVectorMasked::store_Opcode() const { return Op_StoreVectorMasked; }, then you have to be careful:
A masked load does not necessarily return the same as the masked store's input value. That input value is not yet masked, but the loaded value needs to be masked.
But it seems to me that you can actually never have a successful MemNode::can_see_stored_value case for masked operations, with your current code. It would always fail the store_Opcode() == st->Opcode() check. And that gives the correct result, but it is still a bit strange that we don't override the store_Opcode for the masked/offset vector stores.
I don't know which way you want to go now. I see these options:
- Keep disallowing masked load/store "look-throughs".
  - Do that by having the "incorrect" store_Opcode as now. The downside is that the "offset only" case does not manage to do the look-through, even though that would be correct.
  - OR: have the correct store_Opcode, which allows the look-through for the "offset only" case. But then explicitly check for the masked cases, and disallow those.
- Implement a special look-through, where you apply the mask with some blend/select/masked operation on the store input value, which simulates the masked load (i.e. you need to put zeros where the mask is off). A Java-level sketch of this is shown below.
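A hedged Java-level sketch of what the special look-through in the last option amounts to, assuming the load's mask equals the store's mask m, v is the store's (unmasked) input value, and I_SPECIES is the species from the example above:
// The masked load that follows the masked store yields v where m is set and zero
// where it is unset, i.e. a blend of v into a zero vector.
IntVector simulated = IntVector.zero(I_SPECIES).blend(v, m);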
Not sure if this is all very clear, feel free to ask.
@dafedafe this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:
git checkout JDK-8325520
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
⚠️ @dafedafe This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request is integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).
Yep it is actually very clear, thanks a lot @eme64! Actually your example is with masks, but there is a similar problem with offsets as well. Think of what happens if the offset array is shorter than the length of the array, or if there are duplicate indices in the offset array: a load-store or a store-load sequence does not reproduce the original array or vector (the duplicate-index case is sketched below).
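A hedged sketch of the duplicate-index case (enclosing class, species, arrays and index values are assumed): scattering a vector and gathering it back through the same index map does not reproduce the original vector.
static void duplicateIndices(int[] src, int[] dst) {
    VectorSpecies<Integer> species = IntVector.SPECIES_256;          // 8 int lanes
    int[] indexMap = {0, 0, 1, 2, 3, 4, 5, 6};                       // lanes 0 and 1 both map to dst[0]
    IntVector v = IntVector.fromArray(species, src, 0);
    v.intoArray(dst, 0, indexMap, 0);                                // scatter: dst[0] is written twice
    IntVector w = IntVector.fromArray(species, dst, 0, indexMap, 0); // gather
    // w.lane(0) == w.lane(1), so in general w != v: the gather must not be folded
    // into the value that was scattered.
}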
So actually, to avoid folding in those 2 cases (load->store and store->load), for now I made specific store_Opcode implementations that are never the same as a Store opcode, so that store_Opcode() == st->Opcode() and val->as_Load()->store_Opcode() == Opcode() are always false (I've used the opcode of the LoadNode itself, but possibly there is a better solution).
As you hint, one could also think of a solution involving substituting load/store masked vector nodes with "simple" vector mask nodes and load/store gather/scatter vector nodes with "simple" vector shuffle nodes (if these exist), but I'm not sure how much this would bring, and anyway for this issue I would keep the fix simple. What do you think?
Nice, ah you are right, there can be issues with mask-only cases as well!
It would be great if you had tests that exactly exercise these "bad" examples, where it looks like we might optimize, but it would be wrong.
I'll look at your store_Opcode changes now...
About storing an IntVector to memory, and then loading it as a FloatVector: you can use a MemorySegment:
// Wrap an array into a MemorySegment
MemorySegment ms = MemorySegment.ofArray(new byte[10_000]);
// Create your favourite int vector
IntVector intVector = ...;
// Store that int vector to the memory segment (internally, it does checkIndex and unsafe store to the byte array)
intVector.intoMemorySegment(ms, offset, ByteOrder.nativeOrder());
// Load a float vector from the memory segment (internally, it does checkIndex and unsafe load from the byte array);
// fromMemorySegment needs a species argument, SPECIES_PREFERRED is just an example
FloatVector floatVector = FloatVector.fromMemorySegment(FloatVector.SPECIES_PREFERRED, ms, offset, ByteOrder.nativeOrder());
I did not test this, but I think something like this should work.
Right! I totally missed those from-/intoMemorySegment methods! I guess that's probably one reason why vector nodes are not typed.
I've added checks to StoreNode::Identity as well.
It would be great if you had tests that exactly exercise these "bad" examples, where it looks like we might optimize, but it would be wrong.
Yep, good idea. I've added a few tests to check for those cases (load-store with duplicate offsets and store-load with masks). Thanks @eme64!
@dafedafe I also scanned quickly over the regression tests. I see at least two aspects missing:
- No mixed type test for load-store: use MemorySegment from/intoMemorySegment. Try something like storing an int-vector and loading a float-vector.
- Mismatched vector length: store a vector of length 4, and load one of length 8 (a minimal sketch follows below).
I think all of these are currently correctly handled by your vect_type checks in the VM code, but it would be good to see that they are covered by regression tests, in case someone messes this up in the future.
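For the mismatched-length case, a hedged sketch (enclosing class and array a of sufficient length are assumed): the 8-lane load that follows the 4-lane store of the same memory must not be folded into it.
static void mismatchedLength(int[] a) {
    IntVector four = IntVector.fromArray(IntVector.SPECIES_128, a, 0);
    four.intoArray(a, 0);                                                // store 4 ints
    IntVector eight = IntVector.fromArray(IntVector.SPECIES_256, a, 0); // load 8 ints from the same address
    eight.intoArray(a, 0);
}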
Ah, some more missing cases:
- Do some store-store and store-load cases where the first and second are different loads/stores, i.e. one with and one without mask/offsets. E.g. StoreVectorMasked and StoreVectorScatter in a store-store test. Doing the total cross-product is probably too much, but a few examples would be a good start.
- No mixed type test for load-store: use MemorySegment from/intoMemorySegment. Try something like storing an int-vector and loading a float-vector.
It looks as if load/stores that use from/intoMemorySegment with different types apparently don't create LoadVector nodes. It seems that fromMemorySegment tries to inline the VectorSupport::load intrinsic, but fails as the type of the vector and the inferred type of the underlying memory segment differ:
https://github.com/openjdk/jdk/blob/9b742109b196d79cbf712ffd3f64edd1d6497114/src/hotspot/share/opto/vectorIntrinsics.cpp#L1055-L1064
🤔 I'm just wondering if we could get rid of the type check then...
- Mismatched vector length: store a vector of length 4, and load one of length 8.
I've added tests that store and load with different species (SPECIES_64).
- Do some store-store and store-load cases where the first and second are different loads/stores, i.e. one with and one without mask/offsets. E.g. StoreVectorMasked and StoreVectorScatter in a store-store test. Doing the total cross-product is probably too much, but a few examples would be a good start.
You're right, there were just very few of them. Added many more.
It looks as if load/stores that use from/intoMemorySegment with different types apparently don’t create LoadVector nodes. It seems that fromMemorySegment tries to inline the VectorSupport::load intrinsic, but fails as the type of the vector and the inferred type of the underlying memory segment differ:
Ha, that seems to be a bit of an arbitrary (maybe just conservative) restriction. I hope we can lift that in the future. We do not have such restrictions for scalar array load/store. And we can also Auto-Vectorize those. Would be interesting to ask some VectorAPI folks about that.
Because we could allow those to use vectors in the future: I would leave the type checks in for now.
Looking at the tests now!
Nice, the tests look good, thanks for adding them!
Because we could allow those to use vectors in the future: I would leave the type checks in for now.
👍 better safe than sorry
This part in the PR description could be updated: now we return -1 for those that we think are not "comparable".
You're right. Fixed. Thanks a lot for the review @eme64!!
@dafedafe I recommend that you update the Bug title. It names "offsets", but really you probably want to name "indices and mask", or just "masked and gather/scatter vector operations". And it also is not just about "loads", but also "stores", correct?
Yep. I changed the title... and adapted the description as well (offsets -> indices). Thanks @eme64
/integrate
Going to push as commit 0c934ff4e2fb53a72ad25a080d956745a5649f9b.
Since your change was applied there have been 382 commits pushed to the master branch:
- c099f14f07260713229cffbe7d23aa8305415a67: 8305457: Implement java.io.IO
- 6a35311468222f9335b43d548df2ecb80746b389: 8241550: [macOS] SSLSocketImpl/ReuseAddr.java failed due to "BindException: Address already in use"
- f16265d69b09640b972b7494ad57158dbdc426bb: 8332226: "Invalid package name:" from source launcher
- 5a2ba952b120394d7cc0d0890619780c1c27a078: 8325841: Remove unused references to vmSymbols.hpp
- 239c1b33b47de43369673f33d9449e1904477ce0: 8332807: Parallel: Make some APIs in ParMarkBitMap private
- 9b61a7608efff13fc3685488f3f54a810ec0ac22: 8332615: RISC-V: Support vector unsigned comparison instructions for machines with RVV
- a71b40478510db3c69696df608fd1b32f41c57f3: 8331398: G1: G1HeapRegionPrinter reclamation events should print the original region type
- af056c1676dab3b0b35666a8259db60f9bbf824e: 8332106: VerifyError when using switch pattern in this(...) or super(...)
- da3001daf79bf943d6194d9fd60250d519b9680d: 8331975: Enable case-insensitive check in ccache and keytab entry lookup
- 424eb60dedb332237b8ec97e9da6bd95442c0083: 8331683: Clean up GetCarrierThread
- ... and 372 more: https://git.openjdk.org/jdk/compare/174d62652c69e811cf44ab64db575b13a848a728...master
Your commit was automatically rebased without conflicts.