jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8328528: C2 should optimize long-typed parallel iv in an int counted loop

Open tabjy opened this issue 11 months ago • 6 comments

Currently, parallel iv optimization only happens in an int counted loop with int-typed parallel iv's. This PR adds support for long-typed iv to be optimized.

Additionally, this ticket contributes to the resolution of JDK-8275913. Meanwhile, I'm working on adding support for parallel IV replacement for long counted loops which will depend on this PR.


Progress

  • [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Issue

  • JDK-8328528: C2 should optimize long-typed parallel iv in an int counted loop (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18489/head:pull/18489
$ git checkout pull/18489

Update a local copy of the PR:
$ git checkout pull/18489
$ git pull https://git.openjdk.org/jdk.git pull/18489/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18489

View PR using the GUI difftool:
$ git pr show -t 18489

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18489.diff

Webrev

Link to Webrev Comment

tabjy avatar Mar 26 '24 14:03 tabjy

:wave: Welcome back kxu! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Mar 26 '24 14:03 bridgekeeper[bot]

@tabjy This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8328528: C2 should optimize long-typed parallel iv in an int counted loop

Reviewed-by: roland, chagedorn, thartmann

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 366 new commits pushed to the master branch:

  • 330f2b5a9cad02b8e6882fc6eee996d7792d3de1: 8342295: compiler/jvmci/TestJVMCISavedProperties.java fails due to garbage in output
  • 1f3574855e79221739d8800235583b7c47ebae97: 8342102: ZGC: Optimize copy constructors in ZPhysicalMemory
  • 66ddaaa3591851cc420ec9e0ffe460c78a9a51f5: 8340241: RISC-V: Returns mispredicted
  • 07f550b85a3910edd28d8761e2adfb8d6a1352f6: 8340818: Add a new jtreg test root to test the generated documentation
  • 27ef6c9df47326508ee9b2b29f2ff4cec6e38377: 8341470: BigDecimal.stripTrailingZeros() optimization
  • 5d5d88ab9a862ab11bdd622aff07c688e6d96210: 8339570: Add Tidy build support for JDK tests
  • 239d84a82a1e6f4ebbd5c5abb320e39cfd5bc330: 8342578: GHA: RISC-V: Bootstrap using Debian snapshot is still failing
  • aa060f22d302789c4f80dd1ebaa233a97b6b0073: 8342334: CDS: Scratch mirrors should not point to dead klasses
  • 680dc5d896f4f7b01b3cf800d548e32bb2ef8c81: 8342496: C2/Shenandoah: SEGV in compiled code when running jcstress
  • 8f2b23bb53e81e3f9d8d84720719d129aea82a78: 8341407: C2: assert(main_limit == cl->limit() || get_ctrl(main_limit) == new_limit_ctrl) failed: wrong control for added limit
  • ... and 356 more: https://git.openjdk.org/jdk/compare/fb703258774ca14a6a239fc6d47a37e021e6036a...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@rwestrel, @eme64, @TobiHartmann, @chhagedorn) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

openjdk[bot] avatar Mar 26 '24 14:03 openjdk[bot]

@tabjy The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Mar 26 '24 14:03 openjdk[bot]

Do we also handle the reverse, int-typed parallel iv in a long counted loop?

On a related topic, I noticed that checks for is_range_check_if seem to require the type to match the loop type, but I wonder if that could be relaxed.

dean-long avatar Apr 05 '24 19:04 dean-long

@dean-long

No, it doesn't at this stage. C2 never supported parallel iv of any type in long counted loops. However, I'm working on a patch adding supports for long and int iv in long counted loops. Since there are code dependencies, I'd prefer to have this pr merged first.

tabjy avatar Apr 10 '24 20:04 tabjy

@tabjy This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Jun 12 '24 21:06 bridgekeeper[bot]

@tabjy This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

bridgekeeper[bot] avatar Jul 11 '24 21:07 bridgekeeper[bot]

/open

tabjy avatar Jul 22 '24 13:07 tabjy

@tabjy This pull request is now open

openjdk[bot] avatar Jul 22 '24 13:07 openjdk[bot]

@tabjy Please do not rebase or force-push to an active PR as it invalidates existing review comments. Note for future reference, the bots always squash all changes into a single commit automatically as part of the integration. See OpenJDK Developers’ Guide for more information.

openjdk[bot] avatar Jul 22 '24 13:07 openjdk[bot]

I'm observing a weird test failure exposed by GHA that only happens on Linux x86. While there are no actual loop nodes, LoopLimit nodes are matched by the same regex. I'm currently looking into this:

  • LoopLimit nodes are somehow not eliminated together with counted loops on 32bit Linux
  • the same binary bundle downloaded from GHA running on my local machine produces only one failure (instead of two) on testIntCountedLoopWithIntIVWithRandomStrides(int)
  • LoopLimit and /(\d+(\s){2}(Loop.*)+(\s){2}===.*)/ being another case of regex mis-match?
2024-07-22T18:16:53.5547051Z One or more @IR rules failed:
2024-07-22T18:16:53.5547411Z 
2024-07-22T18:16:53.5547613Z Failed IR Rules (2) of Methods (2)
2024-07-22T18:16:53.5548249Z ----------------------------------
2024-07-22T18:16:53.5550118Z 1) Method "private static int compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.testIntCountedLoopWithIntIVWithRandomStrides(int)" - [Failed IR rules: 1]:
2024-07-22T18:16:53.5553610Z    * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={}, failOn={"_#LOOP#_", "_#COUNTED_LOOP#_"}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
2024-07-22T18:16:53.5556263Z      > Phase "PrintIdeal":
2024-07-22T18:16:53.5557150Z        - failOn: Graph contains forbidden nodes:
2024-07-22T18:16:53.5557912Z          * Constraint 1: "(\d+(\s){2}(Loop.*)+(\s){2}===.*)"
2024-07-22T18:16:53.5558668Z            - Matched forbidden node:
2024-07-22T18:16:53.5559308Z              * 127  LoopLimit  === _ 22 10 83  [[ 128 ]] 
2024-07-22T18:16:53.5559791Z 
2024-07-22T18:16:53.5561343Z 2) Method "private static long compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.testIntCountedLoopWithLongIVWithRandomStrides(int)" - [Failed IR rules: 1]:
2024-07-22T18:16:53.5565173Z    * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={}, failOn={"_#LOOP#_", "_#COUNTED_LOOP#_"}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
2024-07-22T18:16:53.5567518Z      > Phase "PrintIdeal":
2024-07-22T18:16:53.5568180Z        - failOn: Graph contains forbidden nodes:
2024-07-22T18:16:53.5568932Z          * Constraint 1: "(\d+(\s){2}(Loop.*)+(\s){2}===.*)"
2024-07-22T18:16:53.5569677Z            - Matched forbidden node:
2024-07-22T18:16:53.5570302Z              * 132  LoopLimit  === _ 23 10 86  [[ 133 ]] 
2024-07-22T18:16:53.5570781Z 
2024-07-22T18:16:53.5571114Z >>> Check stdout for compilation output of the failed methods

tabjy avatar Jul 22 '24 20:07 tabjy

@tabjy Yes, looks like the 32-bit VM behaves different. Do you know what makes the difference here? Yes, looks like regex-matching is also confused here, we should probably address that. If the difference of the 32-bit vs 64-bit turns out to be expected: you can just do this: test/hotspot/jtreg/compiler/loopopts/superword/RedTest_long.java: applyIfPlatform = {"64-bit", "true"},

eme64 avatar Jul 23 '24 09:07 eme64

@tabjy This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Aug 20 '24 11:08 bridgekeeper[bot]

~Currently blocked by the same problem as #18198. Solution proposed over there. Will update once the approach is approved.~

~See https://github.com/openjdk/jdk/pull/18198#issuecomment-2310711029~


Actually, with x86 being phased out and GHA testing disabled last week, this is no longer a problem. As for the mismatching regex, it should be addressed with JDK-8338809.

tabjy avatar Aug 26 '24 18:08 tabjy

Hi @eme64! Could you kindly give this a quick look if you have the time. Thanks!

tabjy avatar Sep 12 '24 14:09 tabjy

FYI going on vacation, so feel free to ask others to review!

eme64 avatar Sep 24 '24 16:09 eme64

Random tests were unstable. The latest commit addressed wrong expected values being computed if the loop variable overflows (which the formula used for computing expected values are not accounted for). Stricter upper and lower bounds for RNG were added to avoid such an overflow.

tabjy avatar Sep 26 '24 16:09 tabjy

/integrate

tabjy avatar Sep 30 '24 14:09 tabjy

@tabjy Your change (at version 6cad8c19663c6023a7b5afea5a5b831e760c2acc) is now ready to be sponsored by a Committer.

openjdk[bot] avatar Sep 30 '24 14:09 openjdk[bot]

/reviewers 2 reviewer

Please do not sponsor this yet. We might need a second pair of eyes for the review. Any reviewers please feel free to do so, and thanks very much!

tabjy avatar Sep 30 '24 14:09 tabjy

@tabjy The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 2 Reviewers).

openjdk[bot] avatar Sep 30 '24 14:09 openjdk[bot]

compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java times out in our testing both with -XX:StressLongCountedLoop=200000000 and with -XX:+UnlockExperimentalVMOptions -XX:PerMethodSpecTrapLimit=0 -XX:PerMethodTrapLimit=0:

"main" #1 [2771172] prio=5 os_prio=0 cpu=500187.70ms elapsed=503.08s allocated=6554K defined_classes=227 tid=0x0000ffff9002d550 nid=2771172 runnable  [0x0000ffff972bf000]
   java.lang.Thread.State: RUNNABLE
Thread: 0x0000ffff9002d550  [0x2a48e4] State: _at_safepoint _at_poll_safepoint 1
   JavaThread state: _thread_blocked
	at compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.testIntCountedLoopWithIntIVLeq(TestParallelIvInIntCountedLoop.java:93)
	at compiler.loopopts.parallel_iv.TestParallelIvInIntCountedLoop.runTestIntCountedLoopWithIntIVLeq(TestParallelIvInIntCountedLoop.java:103)
	at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base@24-internal/DirectMethodHandle$Holder)
	at java.lang.invoke.LambdaForm$MH/0x0000ffff58460870.invoke(java.base@24-internal/LambdaForm$MH)
	at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@24-internal/Invokers$Holder)
	at jdk.internal.reflect.DirectMethodHandleAccessor.invokeImpl(java.base@24-internal/DirectMethodHandleAccessor.java:154)
	at jdk.internal.reflect.DirectMethodHandleAccessor.invoke(java.base@24-internal/DirectMethodHandleAccessor.java:104)
	at java.lang.reflect.Method.invoke(java.base@24-internal/Method.java:573)
	at compiler.lib.ir_framework.test.CustomRunTest.invokeTest(CustomRunTest.java:159)
	at compiler.lib.ir_framework.test.AbstractTest.run(AbstractTest.java:98)
	at compiler.lib.ir_framework.test.CustomRunTest.run(CustomRunTest.java:89)
	at compiler.lib.ir_framework.test.TestVM.runTests(TestVM.java:861)
	at compiler.lib.ir_framework.test.TestVM.start(TestVM.java:252)
	at compiler.lib.ir_framework.test.TestVM.main(TestVM.java:165)

TobiHartmann avatar Oct 07 '24 07:10 TobiHartmann

@TobiHartmann. Thanks for the feedback! I did some investigation, reasons for timeouts comes three folds:

  1. Tests with i <= stop is not a counted loop in the first place and should be removed:

    Now I remember why I originally didn't test for it. Consider for (int i = 0; i <= stop; i++); when stop = Integer.MAX_VALUE. Overflow in Java is well-defined, which means the code must loop indefinitely and optimizations of any kind can't break this. Therefore, <= are not counted loops to begin with. @IR(failOn = {IRNode.COUNTED_LOOP}) doesn't fail either. I removed these test cases.

  2. It is normal to timeout with -XX:StressLongCountedLoop=200000000 for all test cases:

    An value other than 0 for this flag will forcefully convert int counted loops to long counted loops, which C2 doesn't do parallel IV at this point. This is same issue as JDK-8294839. Loops are still loops. For a large random stop value, this will take a long time to loop through.

  3. It is normal to timeout with -XX:PerMethodTrapLimit=0 for test cases with stride other than 1:

    Take for (int i = 0; i < stop; i += 2) for an example. Since there is a chance for increment to i go beyond stop (and eventually overflows), there must be some sort of runtime check for stop. Normally, a loop_limit_check trap is compiled to take the slow path (deoptimization). However, the zero trap limit forces C2 to loop and check i < stop on every iteration. For a large random stop value, this will take a long time.

For the latter two reasons, I added runWithFlags() to essentially disable the flags in questions.

https://github.com/openjdk/jdk/blob/845e34cc7a82ef5cb69620a12f487adaca9d2613/test/hotspot/jtreg/compiler/loopopts/parallel_iv/TestParallelIvInIntCountedLoop.java#L47-L51

tabjy avatar Oct 09 '24 18:10 tabjy

Thanks for the detailed investigation and feedback. The changes look good to me, I'll re-run testing and report back.

TobiHartmann avatar Oct 15 '24 12:10 TobiHartmann

@tabjy We are still seeing timeouts with -XX:+UnlockDiagnosticVMOptions -XX:TieredStopAtLevel=3 -XX:+StressLoopInvariantCodeMotion -XX:+StressRangeCheckElimination -XX:+StressLinearScan. Maybe the test should be enabled only if C2 is available.

TobiHartmann avatar Oct 16 '24 05:10 TobiHartmann

@TobiHartmann Yes -XX:TieredStopAtLevel=3 will cause timeouts. I added the @requires vm.compiler2.enabled option to the test header.

tabjy avatar Oct 16 '24 15:10 tabjy

/integrate

tabjy avatar Oct 18 '24 14:10 tabjy

@tabjy Your change (at version 04b2c6adcacaeb5e372055419606b71d6fc84e49) is now ready to be sponsored by a Committer.

openjdk[bot] avatar Oct 18 '24 14:10 openjdk[bot]

/sponsor

rwestrel avatar Oct 21 '24 14:10 rwestrel