jdk
jdk copied to clipboard
8328528: C2 should optimize long-typed parallel iv in an int counted loop
Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized.
Additionally, this ticket contributes to the resolution of JDK-8275913. Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR.
Progress
- [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
- [x] Change must not contain extraneous whitespace
- [x] Commit message must refer to an issue
Issue
- JDK-8328528: C2 should optimize long-typed parallel iv in an int counted loop (Enhancement - P4)
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489
$ git checkout pull/18489
Update a local copy of the PR:
$ git checkout pull/18489
$ git pull https://git.openjdk.org/jdk.git pull/18489/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 18489
View PR using the GUI difftool:
$ git pr show -t 18489
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18489.diff
Webrev
:wave: Welcome back kxu! A progress list of the required criteria for merging this PR into master
will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.
@tabjy This change now passes all automated pre-integration checks.
ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.
After integration, the commit message for the final commit will be:
8328528: C2 should optimize long-typed parallel iv in an int counted loop
Reviewed-by: roland, chagedorn, thartmann
You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.
At the time when this comment was updated there had been 366 new commits pushed to the master
branch:
- 330f2b5a9cad02b8e6882fc6eee996d7792d3de1: 8342295: compiler/jvmci/TestJVMCISavedProperties.java fails due to garbage in output
- 1f3574855e79221739d8800235583b7c47ebae97: 8342102: ZGC: Optimize copy constructors in ZPhysicalMemory
- 66ddaaa3591851cc420ec9e0ffe460c78a9a51f5: 8340241: RISC-V: Returns mispredicted
- 07f550b85a3910edd28d8761e2adfb8d6a1352f6: 8340818: Add a new jtreg test root to test the generated documentation
- 27ef6c9df47326508ee9b2b29f2ff4cec6e38377: 8341470: BigDecimal.stripTrailingZeros() optimization
- 5d5d88ab9a862ab11bdd622aff07c688e6d96210: 8339570: Add Tidy build support for JDK tests
- 239d84a82a1e6f4ebbd5c5abb320e39cfd5bc330: 8342578: GHA: RISC-V: Bootstrap using Debian snapshot is still failing
- aa060f22d302789c4f80dd1ebaa233a97b6b0073: 8342334: CDS: Scratch mirrors should not point to dead klasses
- 680dc5d896f4f7b01b3cf800d548e32bb2ef8c81: 8342496: C2/Shenandoah: SEGV in compiled code when running jcstress
- 8f2b23bb53e81e3f9d8d84720719d129aea82a78: 8341407: C2: assert(main_limit == cl->limit() || get_ctrl(main_limit) == new_limit_ctrl) failed: wrong control for added limit
- ... and 356 more: https://git.openjdk.org/jdk/compare/fb703258774ca14a6a239fc6d47a37e021e6036a...master
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.
As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@rwestrel, @eme64, @TobiHartmann, @chhagedorn) but any other Committer may sponsor as well.
➡️ To flag this PR as ready for integration with the above commit message, type /integrate
in a new comment. (Afterwards, your sponsor types /sponsor
in a new comment to perform the integration).
@tabjy The following label will be automatically applied to this pull request:
-
hotspot-compiler
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
Webrevs
- 25: Full - Incremental (81bce8ff)
- 24: Full - Incremental (4314d739)
- 23: Full - Incremental (c37484ab)
- 22: Full - Incremental (04b2c6ad)
- 21: Full - Incremental (845e34cc)
- 20: Full - Incremental (32bedd00)
- 19: Full - Incremental (4e2735ae)
- 18: Full - Incremental (6cad8c19)
- 17: Full - Incremental (5d1ee27a)
- 16: Full - Incremental (de6b7e20)
- 15: Full - Incremental (66b04622)
- 14: Full (5925761f)
- 13: Full - Incremental (4094231e)
- 12: Full - Incremental (2f053c5a)
- 11: Full - Incremental (64bf036e)
- 10: Full (e1e112da)
- 09: Full - Incremental (63ff69ea)
- 08: Full - Incremental (20bdc791)
- 07: Full - Incremental (596dbf9a)
- 06: Full - Incremental (6ae66d4a)
- 05: Full - Incremental (981b9600)
- 04: Full - Incremental (be0f596d)
- 03: Full - Incremental (85820dee)
- 02: Full - Incremental (dcd55681)
- 01: Full - Incremental (efc270ec)
- 00: Full (2230c7a6)
Do we also handle the reverse, int-typed parallel iv in a long counted loop?
On a related topic, I noticed that checks for is_range_check_if seem to require the type to match the loop type, but I wonder if that could be relaxed.
@dean-long
No, it doesn't at this stage. C2 never supported parallel iv of any type in long counted loops. However, I'm working on a patch adding supports for long and int iv in long counted loops. Since there are code dependencies, I'd prefer to have this pr merged first.
@tabjy This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@tabjy This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open
pull request command.
/open
@tabjy This pull request is now open
@tabjy Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.
I'm observing a weird test failure exposed by GHA that only happens on Linux x86. While there are no actual loop nodes, LoopLimit
nodes are matched by the same regex. I'm currently looking into this:
-
LoopLimit
nodes are somehow not eliminated together with counted loops on 32bit Linux - the same binary bundle downloaded from GHA running on my local machine produces only one failure (instead of two) on
testIntCountedLoopWithIntIVWithRandomStrides(int)
-
LoopLimit
and/(\d+(\s){2}(Loop.*)+(\s){2}===.*)/
being another case of regex mis-match?
2024-07-22T18:16:53.5547051Z One or more @IR rules failed:
2024-07-22T18:16:53.5547411Z
2024-07-22T18:16:53.5547613Z Failed IR Rules (2) of Methods (2)
2024-07-22T18:16:53.5548249Z ----------------------------------
2024-07-22T18:16:53.5550118Z 1) Method "private static int compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.testIntCountedLoopWithIntIVWithRandomStrides(int)" - [Failed IR rules: 1]:
2024-07-22T18:16:53.5553610Z * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={}, failOn={"_#LOOP#_", "_#COUNTED_LOOP#_"}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
2024-07-22T18:16:53.5556263Z > Phase "PrintIdeal":
2024-07-22T18:16:53.5557150Z - failOn: Graph contains forbidden nodes:
2024-07-22T18:16:53.5557912Z * Constraint 1: "(\d+(\s){2}(Loop.*)+(\s){2}===.*)"
2024-07-22T18:16:53.5558668Z - Matched forbidden node:
2024-07-22T18:16:53.5559308Z * 127 LoopLimit === _ 22 10 83 [[ 128 ]]
2024-07-22T18:16:53.5559791Z
2024-07-22T18:16:53.5561343Z 2) Method "private static long compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.testIntCountedLoopWithLongIVWithRandomStrides(int)" - [Failed IR rules: 1]:
2024-07-22T18:16:53.5565173Z * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={}, failOn={"_#LOOP#_", "_#COUNTED_LOOP#_"}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
2024-07-22T18:16:53.5567518Z > Phase "PrintIdeal":
2024-07-22T18:16:53.5568180Z - failOn: Graph contains forbidden nodes:
2024-07-22T18:16:53.5568932Z * Constraint 1: "(\d+(\s){2}(Loop.*)+(\s){2}===.*)"
2024-07-22T18:16:53.5569677Z - Matched forbidden node:
2024-07-22T18:16:53.5570302Z * 132 LoopLimit === _ 23 10 86 [[ 133 ]]
2024-07-22T18:16:53.5570781Z
2024-07-22T18:16:53.5571114Z >>> Check stdout for compilation output of the failed methods
@tabjy Yes, looks like the 32-bit VM behaves different. Do you know what makes the difference here?
Yes, looks like regex-matching is also confused here, we should probably address that.
If the difference of the 32-bit vs 64-bit turns out to be expected: you can just do this:
test/hotspot/jtreg/compiler/loopopts/superword/RedTest_long.java: applyIfPlatform = {"64-bit", "true"},
@tabjy This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
~Currently blocked by the same problem as #18198. Solution proposed over there. Will update once the approach is approved.~
~See https://github.com/openjdk/jdk/pull/18198#issuecomment-2310711029~
Actually, with x86 being phased out and GHA testing disabled last week, this is no longer a problem. As for the mismatching regex, it should be addressed with JDK-8338809.
Hi @eme64! Could you kindly give this a quick look if you have the time. Thanks!
FYI going on vacation, so feel free to ask others to review!
Random tests were unstable. The latest commit addressed wrong expected values being computed if the loop variable overflows (which the formula used for computing expected values are not accounted for). Stricter upper and lower bounds for RNG were added to avoid such an overflow.
/integrate
@tabjy Your change (at version 6cad8c19663c6023a7b5afea5a5b831e760c2acc) is now ready to be sponsored by a Committer.
/reviewers 2 reviewer
Please do not sponsor this yet. We might need a second pair of eyes for the review. Any reviewers please feel free to do so, and thanks very much!
@tabjy The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 2 Reviewers).
compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java
times out in our testing both with -XX:StressLongCountedLoop=200000000
and with -XX:+UnlockExperimentalVMOptions -XX:PerMethodSpecTrapLimit=0 -XX:PerMethodTrapLimit=0
:
"main" #1 [2771172] prio=5 os_prio=0 cpu=500187.70ms elapsed=503.08s allocated=6554K defined_classes=227 tid=0x0000ffff9002d550 nid=2771172 runnable [0x0000ffff972bf000]
java.lang.Thread.State: RUNNABLE
Thread: 0x0000ffff9002d550 [0x2a48e4] State: _at_safepoint _at_poll_safepoint 1
JavaThread state: _thread_blocked
at compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.testIntCountedLoopWithIntIVLeq(TestParallelIvInIntCountedLoop.java:93)
at compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.runTestIntCountedLoopWithIntIVLeq(TestParallelIvInIntCountedLoop.java:103)
at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base@24-internal/DirectMethodHandle$Holder)
at java.lang.invoke.LambdaForm$MH/0x0000ffff58460870.invoke(java.base@24-internal/LambdaForm$MH)
at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@24-internal/Invokers$Holder)
at jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(java.base@24-internal/DirectMethodHandleAccessor.java:154)
at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(java.base@24-internal/DirectMethodHandleAccessor.java:104)
at java.lang.reflect.Method.invoke(java.base@24-internal/Method.java:573)
at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:159)
at compiler.lib.ir_framework.test.AbstractTest.run(AbstractTest.java:98)
at compiler.lib.ir_framework.test.CustomRunTest.run(CustomRunTest.java:89)
at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:861)
at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:252)
at compiler.lib.ir_framework.test.TestVM.main(TestVM.java:165)
@TobiHartmann. Thanks for the feedback! I did some investigation, reasons for timeouts comes three folds:
-
Tests with
i <= stop
is not a counted loop in the first place and should be removed:Now I remember why I originally didn't test for it. Consider
for (int i = 0; i <= stop; i++);
whenstop = Integer.MAX_VALUE
. Overflow in Java is well-defined, which means the code must loop indefinitely and optimizations of any kind can't break this. Therefore,<=
are not counted loops to begin with.@IR(failOn = {IRNode.COUNTED_LOOP})
doesn't fail either. I removed these test cases. -
It is normal to timeout with
-XX:StressLongCountedLoop=200000000
for all test cases:An value other than
0
for this flag will forcefully convert int counted loops to long counted loops, which C2 doesn't do parallel IV at this point. This is same issue as JDK-8294839. Loops are still loops. For a large randomstop
value, this will take a long time to loop through. -
It is normal to timeout with
-XX:PerMethodTrapLimit=0
for test cases with stride other than1
:Take
for (int i = 0; i < stop; i += 2)
for an example. Since there is a chance for increment toi
go beyondstop
(and eventually overflows), there must be some sort of runtime check forstop
. Normally, aloop_limit_check
trap is compiled to take the slow path (deoptimization). However, the zero trap limit forces C2 to loop and checki < stop
on every iteration. For a large randomstop
value, this will take a long time.
For the latter two reasons, I added runWithFlags()
to essentially disable the flags in questions.
https://github.com/openjdk/jdk/blob/845e34cc7a82ef5cb69620a12f487adaca9d2613/test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java#L47-L51
Thanks for the detailed investigation and feedback. The changes look good to me, I'll re-run testing and report back.
@tabjy We are still seeing timeouts with -XX:+UnlockDiagnosticVMOptions -XX:TieredStopAtLevel=3 -XX:+StressLoopInvariantCodeMotion -XX:+StressRangeCheckElimination -XX:+StressLinearScan
. Maybe the test should be enabled only if C2 is available.
@TobiHartmann Yes -XX:TieredStopAtLevel=3
will cause timeouts. I added the @requires vm.compiler2.enabled
option to the test header.
/integrate
@tabjy Your change (at version 04b2c6adcacaeb5e372055419606b71d6fc84e49) is now ready to be sponsored by a Committer.
/sponsor