8372591: assert(!current->cont_fastpath() || freeze.check_valid_fast_path()) failed

Open pchilano opened this issue 1 month ago • 6 comments

When a frame is OSR and the caller is interpreted, we call push_cont_fastpath to possibly set _cont_fastpath to the sp of the sender. This is needed to keep track of interpreted frames in the stack, so that when unmounting we don't incorrectly execute the freeze fast path, which is only allowed when all frames are compiled.
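The invariant behind the reported assert can be modelled in a few lines. This is a simplified sketch, not HotSpot code: `FrameKind`, `ToyFrame`, `all_frames_compiled` and `assert_condition_holds` are hypothetical names, and it assumes the fast path is considered allowed exactly when no interpreted frame is currently recorded.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins for HotSpot frames (not real HotSpot types).
enum class FrameKind { Compiled, Interpreted };

struct ToyFrame {
  FrameKind kind;
};

// Toy version of the fast-path validity check: every frame that would be
// frozen must be compiled.
inline bool all_frames_compiled(const std::vector<ToyFrame>& frames) {
  for (const ToyFrame& f : frames) {
    if (f.kind != FrameKind::Compiled) return false;
  }
  return true;
}

// Same shape as the failing assert:
//   !current->cont_fastpath() || freeze.check_valid_fast_path()
// fastpath_allowed models cont_fastpath() (assumption: true when no
// interpreted frame is recorded in _cont_fastpath).
inline bool assert_condition_holds(bool fastpath_allowed,
                                   const std::vector<ToyFrame>& frames) {
  return !fastpath_allowed || all_frames_compiled(frames);
}
```

In this model the condition only fails when the fast path is believed to be allowed while an interpreted frame is still on the stack, which is the situation the crash reports.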

The problem is that the OSR frame is created starting from the unextended_sp of the sender, not the sp, i.e. we pop the interpreted frame as in remove_activation. This means that we could set _cont_fastpath to a value that points outside the valid stack (below the sp of the OSR frame). If the gap between these stack addresses is big enough, _cont_fastpath could later be cleared while we still have the interpreted sender on the stack, leading to the reported crash. The simplest case where this happens is when, in a later call to yield, all the extra frames leading to Continuation.doYield fit within this space [1]. It could also be cleared before that if, for example, at some point we returned from an interpreted callee and the sender sp is not below _cont_fastpath [2].

The fix is to change OSR_migration_begin to set _cont_fastpath with the sender's unextended_sp instead.
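To make the failure mode concrete, here is a toy model of the mark. It is a sketch under assumptions, not the HotSpot implementation: `ToyThread` and `mark_survives` are hypothetical, addresses are plain integers, the stack grows toward lower addresses, and the push/pop rules are reduced to the two comparisons described in this PR (push keeps the highest address seen; a return clears the mark unless the sp is below it).

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the per-thread _cont_fastpath mark (hypothetical type,
// not HotSpot's JavaThread).
struct ToyThread {
  std::uintptr_t cont_fastpath = 0;  // 0 means "no interpreted frame recorded"

  // Record an interpreted frame: keep the highest address seen so far.
  void push_cont_fastpath(std::uintptr_t sp) {
    if (sp > cont_fastpath) cont_fastpath = sp;
  }

  // Returning to a frame at sp: clear the mark unless sp is below it.
  void pop_cont_fastpath(std::uintptr_t sp) {
    if (sp >= cont_fastpath) cont_fastpath = 0;
  }
};

// Returns true if a mark recorded at `recorded` is still set after a later
// return to a frame whose sp is `return_sp`.
inline bool mark_survives(std::uintptr_t recorded, std::uintptr_t return_sp) {
  ToyThread t;
  t.push_cont_fastpath(recorded);
  t.pop_cont_fastpath(return_sp);
  return t.cont_fastpath != 0;
}
```

With made-up addresses, take a sender with sp 0x7000 and unextended_sp 0x8000, an OSR frame whose sp is 0x7C00, and a later return to a frame at 0x7800. `mark_survives(0x7000, 0x7800)` is false, so the mark is lost while the interpreted sender is still on the stack, whereas `mark_survives(0x8000, 0x7800)` is true when the unextended_sp is recorded instead.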

The patch includes new test OSRWithManyLocals.java which reliably reproduces the crash. I thought about adding it to the existing OSRTest.java but that was created to exercise a different issue. A better name for that test would be OSRWithManyArgs.java, so I could rename it in this patch if preferred (I also realized that test could be simplified and made easier to read but that's for another PR).

I tested the current patch with the new test and also ran it through mach5 tiers1-7.

Thanks, Patricio


Progress

  • [ ] Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • [x] Change must not contain extraneous whitespace
  • [x] Commit message must refer to an issue

Issue

  • JDK-8372591: assert(!current->cont_fastpath() || freeze.check_valid_fast_path()) failed (Bug - P3)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28830/head:pull/28830
$ git checkout pull/28830

Update a local copy of the PR:
$ git checkout pull/28830
$ git pull https://git.openjdk.org/jdk.git pull/28830/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28830

View PR using the GUI difftool:
$ git pr show -t 28830

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28830.diff

Using Webrev

Link to Webrev Comment

pchilano avatar Dec 15 '25 16:12 pchilano

:wave: Welcome back pchilanomate! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

bridgekeeper[bot] avatar Dec 15 '25 16:12 bridgekeeper[bot]

@pchilano This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8372591: assert(!current->cont_fastpath() || freeze.check_valid_fast_path()) failed

Reviewed-by: dholmes, alanb, rrich, fyang

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 106 new commits pushed to the master branch:

  • 9435d5b89ca08595f0f2f8d029c00bc6d1f30104: 8346154: [XWayland] Some tests fail intermittently in the CI, but not locally
  • 25e87144c20fcf5aca99b92f061a0051096c2605: 8369515: Deadlock between JVMTI and JNI ReleasePrimitiveArrayCritical
  • 8ab7d3b89f656e5c2882e19065f01fcc434161d2: 8374078: C2_MacroAssembler::verify_int_in_range has incorrect early return condition
  • ... and 103 more: https://git.openjdk.org/jdk/compare/99f90befafe9476de17e416d45a9875569171935...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

openjdk[bot] avatar Dec 15 '25 16:12 openjdk[bot]

@pchilano The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

openjdk[bot] avatar Dec 15 '25 16:12 openjdk[bot]

/label remove core-libs

pchilano avatar Dec 15 '25 17:12 pchilano

@pchilano The core-libs label was successfully removed.

openjdk[bot] avatar Dec 15 '25 17:12 openjdk[bot]

Webrevs

mlbridge[bot] avatar Dec 15 '25 18:12 mlbridge[bot]

This seems to be a "day one" bug with virtual threads. Do you have an idea as to why it has not been noticed before?

dholmes-ora avatar Dec 16 '25 04:12 dholmes-ora

> This seems to be a "day one" bug with virtual threads. Do you have an idea as to why it has not been noticed before?

Yes, it's a day one bug. There are a couple of conditions that need to align to trigger it, but the most likely reason it went undetected is the requirement for a big enough difference between the unextended_sp and sp of the interpreted sender.

pchilano avatar Dec 16 '25 20:12 pchilano

> The patch includes new test OSRWithManyLocals.java which reliably reproduces the crash. I thought about adding it to the existing OSRTest.java but that was created to exercise a different issue.

I think what you have is good, but it makes me wonder if there might be other corner cases that need further Continuations tests.

AlanBateman avatar Dec 17 '25 06:12 AlanBateman

> > The patch includes new test OSRWithManyLocals.java which reliably reproduces the crash. I thought about adding it to the existing OSRTest.java but that was created to exercise a different issue.

> I think what you have is good, but it makes me wonder if there might be other corner cases that need further Continuations tests.

I can't think of others looking at the code. But OSRTest.java covers many different OSR scenarios already with methods that have many parameters.

pchilano avatar Dec 17 '25 16:12 pchilano

Thanks for the reviews David and Alan!

pchilano avatar Dec 17 '25 16:12 pchilano

@reinrich @TheRealMDoerr Could you verify if this fix is correct for ppc too? Thanks.

pchilano avatar Dec 17 '25 16:12 pchilano

Thanks for the ping! A quick run of the test has passed on PPC64. We'll run more tests. @reinrich may want to take a look, too.

TheRealMDoerr avatar Dec 17 '25 20:12 TheRealMDoerr

> @reinrich @TheRealMDoerr Could you verify if this fix is correct for ppc too? Thanks.

The fix looks correct. Strangely the reproducer didn't work on ppc. Not even with 3x the locals in the osr method. I'll look a little more into it...

reinrich avatar Dec 18 '25 00:12 reinrich

> > @reinrich @TheRealMDoerr Could you verify if this fix is correct for ppc too? Thanks.

> The fix looks correct. Strangely the reproducer didn't work on ppc. Not even with 3x the locals in the osr method. I'll look a little more into it...

On ppc we reach OSR_migration_begin with _cont_fastpath that's higher than sender.sp() so push_cont_fastpath(sender.sp()) has no effect. I'm not sure why this is. It might be due to the trimming with i2i calls where the sender is trimmed to a parent frame (the top frame always has room for max_stack). The sender is a lambda. Maybe it has done an i2c call as top frame (i.e. untrimmed) which set _cont_fastpath?

If max_stack of the sender is very large then, due to trimming, unextended_sp < sp is possible and the assertion could also fail.

Maybe the maximum of unextended_sp and sp could be used?

Note that on ppc frame::id() returns the fp. Maybe this should be used as _cont_fastpath. Needs more investigation...

reinrich avatar Dec 18 '25 19:12 reinrich

> > > @reinrich @TheRealMDoerr Could you verify if this fix is correct for ppc too? Thanks.

> > The fix looks correct. Strangely the reproducer didn't work on ppc. Not even with 3x the locals in the osr method. I'll look a little more into it...

> On ppc we reach OSR_migration_begin with _cont_fastpath that's higher than sender.sp() so push_cont_fastpath(sender.sp()) has no effect. I'm not sure why this is. It might be due to the trimming with i2i calls where the sender is trimmed to a parent frame (the top frame always has room for max_stack). The sender is a lambda. Maybe it has done an i2c call as top frame (i.e. untrimmed) which set _cont_fastpath?

Maybe, but I don't see which compiled method it could be, since methods before foo should be interpreted.

> If max_stack of the sender is very large then, due to trimming, unextended_sp < sp is possible and the assertion could also fail.

> Maybe the maximum of unextended_sp and sp could be used?

On x64 and AArch64 we build the OSR frame starting from the sender's unextended_sp (modulo alignment). Is that not the case for ppc?

pchilano avatar Dec 18 '25 20:12 pchilano

> > If max_stack of the sender is very large then, due to trimming, unextended_sp < sp is possible and the assertion could also fail. Maybe the maximum of unextended_sp and sp could be used?

> On x64 and AArch64 we build the OSR frame starting from the sender's unextended_sp (modulo alignment). Is that not the case for ppc?

EDIT:

Yes it is. And I get it now: the OSR frame is built starting from the sender's unextended_sp. It doesn't matter if it is below sender's sp. It will surely be within the valid stack (above the sp of the OSR frame).
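The containment argument can be checked with a small model. This is only a sketch with hypothetical helpers (`toy_osr_frame_sp`, `toy_mark_in_valid_stack`); it ignores alignment, treats addresses as integers, and assumes the stack grows toward lower addresses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: sp of an OSR frame of `frame_bytes` bytes that is
// built starting at the sender's unextended_sp (alignment ignored).
inline std::uintptr_t toy_osr_frame_sp(std::uintptr_t sender_unextended_sp,
                                       std::uintptr_t frame_bytes) {
  return sender_unextended_sp - frame_bytes;
}

// A recorded mark lies within the valid stack if it is not below the sp of
// the OSR frame.
inline bool toy_mark_in_valid_stack(std::uintptr_t mark,
                                    std::uintptr_t osr_sp) {
  return mark >= osr_sp;
}
```

Because the OSR frame's sp is derived from the sender's unextended_sp, `toy_mark_in_valid_stack(unextended_sp, toy_osr_frame_sp(unextended_sp, n))` holds for any frame size `n`, regardless of whether the sender's sp is above or below its unextended_sp. The same is not guaranteed for the sender's sp when it sits far below the unextended_sp.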

OLD Comment:

It is. The difference from x86 is that the unextended_sp can be (much) lower than the sp, because the unextended_sp has room for the maximal size of the expression stack. This diagram shows this.

Actually aarch64 seems to be similar. I think @dean-long told me once that there an interpreter frame also has room for the maximal expression stack (see generate_fixed_frame) and that it gets trimmed by an interpreted callee. What's done just before calling generate_fixed_frame looks like trimming of the caller frame.

reinrich avatar Dec 19 '25 14:12 reinrich