jdk icon indicating copy to clipboard operation
jdk copied to clipboard

8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop

Open rwestrel opened this issue 9 months ago • 7 comments

In the test case:

long i;
for (; i > 0; i--) {
    res += 42 / ((int) i);

The long counted loop phi has type [1..100]. As a consequence, the ConvL2I also has type [1..100]. The DivI node that follows can't fault: it is not guarded by a zero check and has no control set.

The ConvL2I is split through phi and so is the DiVI node: PhaseIdealLoop::cannot_split_division() returns true because the value coming from the backedge into the DivI (when it is about to be split thru phi) is the result of the ConvL2I which has type [1..100] so is not zero as far as the compiler can tell.

On the last iteration of the loop, i is 1. Because the DivI was split thru Phi, it computes the value for the following iteration, so for i = 0. This causes a crash when the compiled code runs.

The same problem can't happen with an int counted loop because logic in PhaseIdealLoop::split_thru_phi() prevents a ConvI2L from being split thru phi. I propose to fix this the same way: in the test case, it's not true that once the ConvL2I is split thru phi it keeps type [1..100]. The fix is fairly conservative because it's base on the existing logic for ConvI2L: we would want to not split a ConvL2I only a counted loopd but. I suppose the same is true for the ConvI2L and I thought it would be best to revisit both together.


Progress

  • [x] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Issue

  • JDK-8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/19086/head:pull/19086
$ git checkout pull/19086

Update a local copy of the PR:
$ git checkout pull/19086
$ git pull https://git.openjdk.org/jdk.git pull/19086/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 19086

View PR using the GUI difftool:
$ git pr show -t 19086

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/19086.diff

Webrev

Link to Webrev Comment

rwestrel avatar May 03 '24 12:05 rwestrel

:wave: Welcome back roland! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar May 03 '24 12:05 bridgekeeper[bot]

@rwestrel This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8331575: C2: crash when ConvL2I is split thru phi at LongCountedLoop

Reviewed-by: chagedorn, epeter

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 158 new commits pushed to the master branch:

  • ab8d7b0cedfaae124262325cd1d4b59cef996d85: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs
  • fe8a2aff3129b515c2a0f3ab96f5e3ad6cef7b70: 8307778: com/sun/jdi/cds tests fail with jtreg's Virtual test thread factory
  • 95f79c678737fb8de9ed45c516761d4d818869ef: 8332253: Linux arm32 build fails after 8292591
  • b687aa550837830b38f0f0faa69c353b1e85219c: 8332176: Refactor ClassListParser::parse()
  • 4083255440cfbf39b9683ea88a433d71ec6111e7: 8316138: Add GlobalSign 2 TLS root certificates
  • 43b109b111e77d0f7b302debc0d76e4ac7c9ac56: 8330066: HeapDumpPath and HeapDumpGzipLevel VM options do not mention HeapDumpBeforeFullGC and HeapDumpAfterFullGC
  • 7cff04fc8a8114a297437aa526b18b6185831eac: 8330276: Console methods with explicit Locale
  • 8a4315f833f3700075d65fae6bc566011c837c07: 8331987: Enhance stacktrace clarity for CompletableFuture CancellationException
  • 491b3b45634fffb0101244f7d491a1681e7e8002: 8332256: Shenandoah: Do not visit heap threads during shutdown
  • 9c02c8dd71023df6338cb94997bca6b00768af6f: 8332255: Shenandoah: Remove duplicate definition of init mark closure
  • ... and 148 more: https://git.openjdk.org/jdk/compare/7a41a525deb796396ade1456f1d0a101ac705014...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar May 03 '24 12:05 openjdk[bot]

@rwestrel The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar May 03 '24 12:05 openjdk[bot]

Webrevs

mlbridge[bot] avatar May 03 '24 12:05 mlbridge[bot]

You could also add the regression tests from the duplicated issue JDK-8298851.

I added one of them because it doesn't seem to need StressGCM. Does it really make sense to add all of them?

rwestrel avatar May 13 '24 13:05 rwestrel

I guess the issue is that ConvL2I and ConvI2L are also type nodes, which can restrict their type, just like CastII nodes. And that restricting of the type is only true under a certain if-branch.

That's not entirely true here. The ConvL2I captures the type of its input so not a narrower type. The problem is that the type is that of a Phi for a counted loop and once pushed through phi, the type captured by the ConvI2L becomes incorrect.

rwestrel avatar May 13 '24 13:05 rwestrel

I guess the issue is that ConvL2I and ConvI2L are also type nodes, which can restrict their type, just like CastII nodes. And that restricting of the type is only true under a certain if-branch.

That's not entirely true here. The ConvL2I captures the type of its input so not a narrower type. The problem is that the type is that of a Phi for a counted loop and once pushed through phi, the type captured by the ConvI2L becomes incorrect.

So what exactly is it that guarantees the correctness of the phi range under the counted loop that is not true when you push it back? I mean I would assume the phi can only have values that its inputs actually produce, so its inputs cannot have wildly different ranges, right? At some point, this range must be established by some control flow, at which point we can do the "type restriction".

I would now have to dive into the code and debug if the "type restriction" for counted loop phi happens purely because of the input values, or because of explicitly restrincting the type of the ConvI2L. But I do see that there is some new ConvI2LNode(input, type) cases where we do restrict the type of a ConvI2L.

eme64 avatar May 13 '24 17:05 eme64

Before split if:

long i = 100;
for (; i > 0;) {
  // i here is 1..100
  int j = (int)i; // ConvL2I type is 1..100, same as loop phi
  int k = 42 / j;
   i--;
}

after split if:

long i = 100;
int j = 100;
int k = 0;
for (; i > 0;) {
  // i here is 1..100
   i--;
  // i here is 0..99
  j = (int)i; // ConvL2I type is still 1..100 which is not correct
  k = 42 / j;
}

rwestrel avatar May 14 '24 07:05 rwestrel

@rwestrel which "split_if" optimization was applied in your example? Split the ConvI2L through the phi? If so, the problem seems to be that the ConvI2L floats by the exit-check, right?

after split if:

long i = 100;
int j = 100;
int k = 0;
for (; i > 0;) {
  // i here is 1..100
   i--;
  // i here is 0..99
  exit check
  // i here is 1..99
  j = (int)i; // ConvL2I type is still 1..100 which is not correct
  k = 42 / j;
}

I guess the issue is that the ConvL2I was somehow pinned inside the loop, after the CountedLoop, by the phi. But when the ConvL2I is split into the backedge, it does not stay in the backedge but floats further, passes by the exit-check and goes into the last iteration -> BOOM.

How exactly did we narrow the type to 1...100? I guess that that is some smart logic in the trip count Phi node, right? If instead we had a CastLL for the exit check that narrows the type, then the CastLL would remain after the split-if, and the split ConvL2I could not float from the backedge into the loop body of the last iteration.

So I guess that is really a limitation: a trip count Phi specifically does the narrowing, and so you cannot just split past it. The question is if that is really nice, or if we could do it differently, e.g. via a CastLL/CastII on the exit-check?

eme64 avatar May 14 '24 07:05 eme64

@rwestrel which "split_if" optimization was applied in your example? Split the ConvI2L through the phi? If so, the problem seems to be that the ConvI2L floats by the exit-check, right?

Yes.

So I guess that is really a limitation: a trip count Phi specifically does the narrowing, and so you cannot just split past it. The question is if that is really nice, or if we could do it differently, e.g. via a CastLL/CastII on the exit-check?

The issue involves conv nodes when split thru phi at a counted loop. That's a narrow corner case. I think fixing it by addressing the corner case where it occurs as proposed is simpler than trying a most general fix which can have hard to anticipate consequences.

rwestrel avatar May 14 '24 08:05 rwestrel

@rwestrel Yes, I'm totally fine with the fix. It simply applies the int case to long.

In a future RFE, we could at least restrict the "bailout" to trip-count Phi's, and not all Phi's.

In even further RFE's, we could consider doing the type narrowing not in the trip-count phi, but via casts at the checks. That would be a more unified solution. Generally, I feel like we are struggling way too much with all the different ways one can pin and narrow types: it is all mixed into trip-count phi's, Cast's, Conv's etc. Who really can understand all the complicated interactions? It seem we keep piling on special-case logic, but it is a endless whack-a-mole game. Every fix is "simple" but the sum of all those fixes is far from "simple" ;)

eme64 avatar May 14 '24 08:05 eme64

FTR, I double checked that fuzzer test failures from JDK-8298851 are indeed the same issue and are fixed with this.

rwestrel avatar May 16 '24 08:05 rwestrel

/integrate

rwestrel avatar May 16 '24 08:05 rwestrel

Going to push as commit f398cd225012694a586e528936159b6df7b1586c. Since your change was applied there have been 160 commits pushed to the master branch:

  • 96c5c3fe75103dc45bc1c3ccce0ab36303121a60: 8329998: Remove double initialization for parts of small TypeArrays in ZObjArrayAllocator
  • ee4a9d34827166ff9ac04e2375058fdc08e43194: 8321622: ClassFile.verify(byte[] bytes) throws unexpected ConstantPoolException, IAE
  • ab8d7b0cedfaae124262325cd1d4b59cef996d85: 8324517: C2: crash in compiled code because of dependency on removed range check CastIIs
  • fe8a2aff3129b515c2a0f3ab96f5e3ad6cef7b70: 8307778: com/sun/jdi/cds tests fail with jtreg's Virtual test thread factory
  • 95f79c678737fb8de9ed45c516761d4d818869ef: 8332253: Linux arm32 build fails after 8292591
  • b687aa550837830b38f0f0faa69c353b1e85219c: 8332176: Refactor ClassListParser::parse()
  • 4083255440cfbf39b9683ea88a433d71ec6111e7: 8316138: Add GlobalSign 2 TLS root certificates
  • 43b109b111e77d0f7b302debc0d76e4ac7c9ac56: 8330066: HeapDumpPath and HeapDumpGzipLevel VM options do not mention HeapDumpBeforeFullGC and HeapDumpAfterFullGC
  • 7cff04fc8a8114a297437aa526b18b6185831eac: 8330276: Console methods with explicit Locale
  • 8a4315f833f3700075d65fae6bc566011c837c07: 8331987: Enhance stacktrace clarity for CompletableFuture CancellationException
  • ... and 150 more: https://git.openjdk.org/jdk/compare/7a41a525deb796396ade1456f1d0a101ac705014...master

Your commit was automatically rebased without conflicts.

openjdk[bot] avatar May 16 '24 08:05 openjdk[bot]

@rwestrel Pushed as commit f398cd225012694a586e528936159b6df7b1586c.

:bulb: You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

openjdk[bot] avatar May 16 '24 08:05 openjdk[bot]