jdk 8328528: C2 should optimize long-typed parallel iv in an int counted loop

Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized.

Additionally, this ticket contributes to the resolution of JDK-8275913. Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR.

Progress

[ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
[x] Change must not contain extraneous whitespace
[x] Commit message must refer to an issue

Issue

JDK-8328528: C2 should optimize long-typed parallel iv in an int counted loop (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489
$ git checkout pull/18489

Update a local copy of the PR:
$ git checkout pull/18489
$ git pull https://git.openjdk.org/jdk.git pull/18489/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18489

View PR using the GUI difftool:
$ git pr show -t 18489

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18489.diff

Webrev

Link to Webrev Comment

Mar 26 '24 14:03 tabjy

:wave: Welcome back kxu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

Mar 26 '24 14:03 bridgekeeper[bot]

@tabjy This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8328528: C2 should optimize long-typed parallel iv in an int counted loop

Reviewed-by: roland, chagedorn, thartmann

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 366 new commits pushed to the master branch:

330f2b5a9cad02b8e6882fc6eee996d7792d3de1: 8342295: compiler/jvmci/TestJVMCISavedProperties.java fails due to garbage in output
1f3574855e79221739d8800235583b7c47ebae97: 8342102: ZGC: Optimize copy constructors in ZPhysicalMemory
66ddaaa3591851cc420ec9e0ffe460c78a9a51f5: 8340241: RISC-V: Returns mispredicted
07f550b85a3910edd28d8761e2adfb8d6a1352f6: 8340818: Add a new jtreg test root to test the generated documentation
27ef6c9df47326508ee9b2b29f2ff4cec6e38377: 8341470: BigDecimal.stripTrailingZeros() optimization
5d5d88ab9a862ab11bdd622aff07c688e6d96210: 8339570: Add Tidy build support for JDK tests
239d84a82a1e6f4ebbd5c5abb320e39cfd5bc330: 8342578: GHA: RISC-V: Bootstrap using Debian snapshot is still failing
aa060f22d302789c4f80dd1ebaa233a97b6b0073: 8342334: CDS: Scratch mirrors should not point to dead klasses
680dc5d896f4f7b01b3cf800d548e32bb2ef8c81: 8342496: C2/Shenandoah: SEGV in compiled code when running jcstress
8f2b23bb53e81e3f9d8d84720719d129aea82a78: 8341407: C2: assert(main_limit == cl->limit() || get_ctrl(main_limit) == new_limit_ctrl) failed: wrong control for added limit
... and 356 more: https://git.openjdk.org/jdk/compare/fb703258774ca14a6a239fc6d47a37e021e6036a...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@rwestrel, @eme64, @TobiHartmann, @chhagedorn) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

Mar 26 '24 14:03 openjdk[bot]

@tabjy The following label will be automatically applied to this pull request:

hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

Mar 26 '24 14:03 openjdk[bot]

Webrevs

Mar 27 '24 00:03 mlbridge[bot]

Do we also handle the reverse, int-typed parallel iv in a long counted loop?

On a related topic, I noticed that checks for is_range_check_if seem to require the type to match the loop type, but I wonder if that could be relaxed.

Apr 05 '24 19:04 dean-long

@dean-long

No, it doesn't at this stage. C2 never supported parallel iv of any type in long counted loops. However, I'm working on a patch adding supports for long and int iv in long counted loops. Since there are code dependencies, I'd prefer to have this pr merged first.

Apr 10 '24 20:04 tabjy

@tabjy This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

Jun 12 '24 21:06 bridgekeeper[bot]

@tabjy This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

Jul 11 '24 21:07 bridgekeeper[bot]

/open

Jul 22 '24 13:07 tabjy

@tabjy This pull request is now open

Jul 22 '24 13:07 openjdk[bot]

@tabjy Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

Jul 22 '24 13:07 openjdk[bot]

I'm observing a weird test failure exposed by GHA that only happens on Linux x86. While there are no actual loop nodes, LoopLimit nodes are matched by the same regex. I'm currently looking into this:

LoopLimit nodes are somehow not eliminated together with counted loops on 32bit Linux
the same binary bundle downloaded from GHA running on my local machine produces only one failure (instead of two) on testIntCountedLoopWithIntIVWithRandomStrides(int)
LoopLimit and /(\d+(\s){2}(Loop.*)+(\s){2}===.*)/ being another case of regex mis-match?

2024-07-22T18:16:53.5547051Z One or more @IR rules failed:
2024-07-22T18:16:53.5547411Z 
2024-07-22T18:16:53.5547613Z Failed IR Rules (2) of Methods (2)
2024-07-22T18:16:53.5548249Z ----------------------------------
2024-07-22T18:16:53.5550118Z 1) Method "private static int compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.testIntCountedLoopWithIntIVWithRandomStrides(int)" - [Failed IR rules: 1]:
2024-07-22T18:16:53.5553610Z    * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={}, failOn={"_#LOOP#_", "_#COUNTED_LOOP#_"}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
2024-07-22T18:16:53.5556263Z      > Phase "PrintIdeal":
2024-07-22T18:16:53.5557150Z        - failOn: Graph contains forbidden nodes:
2024-07-22T18:16:53.5557912Z          * Constraint 1: "(\d+(\s){2}(Loop.*)+(\s){2}===.*)"
2024-07-22T18:16:53.5558668Z            - Matched forbidden node:
2024-07-22T18:16:53.5559308Z              * 127  LoopLimit  === _ 22 10 83  [[ 128 ]] 
2024-07-22T18:16:53.5559791Z 
2024-07-22T18:16:53.5561343Z 2) Method "private static long compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.testIntCountedLoopWithLongIVWithRandomStrides(int)" - [Failed IR rules: 1]:
2024-07-22T18:16:53.5565173Z    * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={}, failOn={"_#LOOP#_", "_#COUNTED_LOOP#_"}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
2024-07-22T18:16:53.5567518Z      > Phase "PrintIdeal":
2024-07-22T18:16:53.5568180Z        - failOn: Graph contains forbidden nodes:
2024-07-22T18:16:53.5568932Z          * Constraint 1: "(\d+(\s){2}(Loop.*)+(\s){2}===.*)"
2024-07-22T18:16:53.5569677Z            - Matched forbidden node:
2024-07-22T18:16:53.5570302Z              * 132  LoopLimit  === _ 23 10 86  [[ 133 ]] 
2024-07-22T18:16:53.5570781Z 
2024-07-22T18:16:53.5571114Z >>> Check stdout for compilation output of the failed methods

Jul 22 '24 20:07 tabjy

@tabjy Yes, looks like the 32-bit VM behaves different. Do you know what makes the difference here? Yes, looks like regex-matching is also confused here, we should probably address that. If the difference of the 32-bit vs 64-bit turns out to be expected: you can just do this: test/hotspot/jtreg/compiler/loopopts/superword/RedTest_long.java: applyIfPlatform = {"64-bit", "true"},

Jul 23 '24 09:07 eme64

@tabjy This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

Aug 20 '24 11:08 bridgekeeper[bot]

~Currently blocked by the same problem as #18198. Solution proposed over there. Will update once the approach is approved.~

~See https://github.com/openjdk/jdk/pull/18198#issuecomment-2310711029~

Actually, with x86 being phased out and GHA testing disabled last week, this is no longer a problem. As for the mismatching regex, it should be addressed with JDK-8338809.

Aug 26 '24 18:08 tabjy

Hi @eme64! Could you kindly give this a quick look if you have the time. Thanks!

Sep 12 '24 14:09 tabjy

FYI going on vacation, so feel free to ask others to review!

Sep 24 '24 16:09 eme64

Random tests were unstable. The latest commit addressed wrong expected values being computed if the loop variable overflows (which the formula used for computing expected values are not accounted for). Stricter upper and lower bounds for RNG were added to avoid such an overflow.

Sep 26 '24 16:09 tabjy

/integrate

Sep 30 '24 14:09 tabjy

@tabjy Your change (at version 6cad8c19663c6023a7b5afea5a5b831e760c2acc) is now ready to be sponsored by a Committer.

Sep 30 '24 14:09 openjdk[bot]

/reviewers 2 reviewer

Please do not sponsor this yet. We might need a second pair of eyes for the review. Any reviewers please feel free to do so, and thanks very much!

Sep 30 '24 14:09 tabjy

@tabjy The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 2 Reviewers).

Sep 30 '24 14:09 openjdk[bot]

compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java times out in our testing both with -XX:StressLongCountedLoop=200000000 and with -XX:+UnlockExperimentalVMOptions -XX:PerMethodSpecTrapLimit=0 -XX:PerMethodTrapLimit=0:

"main" #1 [2771172] prio=5 os_prio=0 cpu=500187.70ms elapsed=503.08s allocated=6554K defined_classes=227 tid=0x0000ffff9002d550 nid=2771172 runnable  [0x0000ffff972bf000]
   java.lang.Thread.State: RUNNABLE
Thread: 0x0000ffff9002d550  [0x2a48e4] State: _at_safepoint _at_poll_safepoint 1
   JavaThread state: _thread_blocked
	at compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.testIntCountedLoopWithIntIVLeq(TestParallelIvInIntCountedLoop.java:93)
	at compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.runTestIntCountedLoopWithIntIVLeq(TestParallelIvInIntCountedLoop.java:103)
	at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base@24-internal/DirectMethodHandle$Holder)
	at java.lang.invoke.LambdaForm$MH/0x0000ffff58460870.invoke(java.base@24-internal/LambdaForm$MH)
	at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@24-internal/Invokers$Holder)
	at jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(java.base@24-internal/DirectMethodHandleAccessor.java:154)
	at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(java.base@24-internal/DirectMethodHandleAccessor.java:104)
	at java.lang.reflect.Method.invoke(java.base@24-internal/Method.java:573)
	at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:159)
	at compiler.lib.ir_framework.test.AbstractTest.run(AbstractTest.java:98)
	at compiler.lib.ir_framework.test.CustomRunTest.run(CustomRunTest.java:89)
	at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:861)
	at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:252)
	at compiler.lib.ir_framework.test.TestVM.main(TestVM.java:165)

Oct 07 '24 07:10 TobiHartmann

@TobiHartmann. Thanks for the feedback! I did some investigation, reasons for timeouts comes three folds:

Tests with i <= stop is not a counted loop in the first place and should be removed:

Now I remember why I originally didn't test for it. Consider for (int i = 0; i <= stop; i++); when stop = Integer.MAX_VALUE. Overflow in Java is well-defined, which means the code must loop indefinitely and optimizations of any kind can't break this. Therefore, <= are not counted loops to begin with. @IR(failOn = {IRNode.COUNTED_LOOP}) doesn't fail either. I removed these test cases.
It is normal to timeout with -XX:StressLongCountedLoop=200000000 for all test cases:

An value other than 0 for this flag will forcefully convert int counted loops to long counted loops, which C2 doesn't do parallel IV at this point. This is same issue as JDK-8294839. Loops are still loops. For a large random stop value, this will take a long time to loop through.
It is normal to timeout with -XX:PerMethodTrapLimit=0 for test cases with stride other than 1:

Take for (int i = 0; i < stop; i += 2) for an example. Since there is a chance for increment to i go beyond stop (and eventually overflows), there must be some sort of runtime check for stop. Normally, a loop_limit_check trap is compiled to take the slow path (deoptimization). However, the zero trap limit forces C2 to loop and check i < stop on every iteration. For a large random stop value, this will take a long time.

For the latter two reasons, I added runWithFlags() to essentially disable the flags in questions.

https://github.com/openjdk/jdk/blob/845e34cc7a82ef5cb69620a12f487adaca9d2613/test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java#L47-L51

Oct 09 '24 18:10 tabjy

Thanks for the detailed investigation and feedback. The changes look good to me, I'll re-run testing and report back.

Oct 15 '24 12:10 TobiHartmann

@tabjy We are still seeing timeouts with -XX:+UnlockDiagnosticVMOptions -XX:TieredStopAtLevel=3 -XX:+StressLoopInvariantCodeMotion -XX:+StressRangeCheckElimination -XX:+StressLinearScan. Maybe the test should be enabled only if C2 is available.

Oct 16 '24 05:10 TobiHartmann

@TobiHartmann Yes -XX:TieredStopAtLevel=3 will cause timeouts. I added the @requires vm.compiler2.enabled option to the test header.

Oct 16 '24 15:10 tabjy

/integrate

Oct 18 '24 14:10 tabjy

@tabjy Your change (at version 04b2c6adcacaeb5e372055419606b71d6fc84e49) is now ready to be sponsored by a Committer.

Oct 18 '24 14:10 openjdk[bot]

/sponsor

Oct 21 '24 14:10 rwestrel

jdk jdk copied to clipboard

8328528: C2 should optimize long-typed parallel iv in an int counted loop

Progress

Issue

Reviewing

Webrev

Webrevs

jdk
jdk copied to clipboard