Loom: Add support for direct transition to interpreter on continuation entry

Open tajila opened this issue 2 years ago • 23 comments

The current implementation creates a new interpreter instance on continuation entry (resuming or creating a new one). This is going to eat up stack space.

To address this, a new call-in helper (setupRunStaticMethod) will be added which resolves the method (and initializes the class), builds a frame, and sets up the args, but doesn't instantiate a new interpreter.

The continuation enter code will be amended to transition back to the interpreter with the new interpreter variables rather than creating a new instance.

tajila avatar Jun 30 '22 13:06 tajila

This won't work as described above. A call-in requires a new interpreter. If the proposed code is just going to fiddle with the stack, it has no business being a call-in.

If you've called out to a native, a call-in is the only appropriate solution without some massive hacks in the interpreter (and JIT if the native can be directly called).

gacholio avatar Jun 30 '22 18:06 gacholio

I think we want these to be INLs rather than JNI natives so we can handle them specially in the interpreter and maybe with fast JNI.

If we're in the interpreter, then most of the operation is swapping J9JavaStacks and updating the VM registers. I'm not sure why we'd need a new interpreter instance for this.

DanHeidinga avatar Jun 30 '22 19:06 DanHeidinga

That would avoid some issues but not others. Without the out/in frames, it will be legal to throw an exception out of the continuation or drop to frame out of it.

Fast JNI is also unlikely since it would be difficult to manage transitions: we would not be able to simply copy the stack, we'd need to massage the frames at the continuation entry point to handle the entry being compiled or not. The frames would need to be examined/modified on every mount (is the final frame in the carrier thread compiled or not...).

gacholio avatar Jun 30 '22 19:06 gacholio

Remind me again - when a vthread is mounted, it replaces the carrier thread stack entirely (as opposed to being mounted "on top" of the carrier thread stack)?

That fixes a bunch of the problems above - we can handle an exception being thrown out of the vthread because it will presumably have a call-in frame at the start.

gacholio avatar Jul 05 '22 17:07 gacholio

Also need to think on ELS pointers in the stacks as they are swapped in and out.

gacholio avatar Jul 05 '22 17:07 gacholio

> Remind me again - when a vthread is mounted, it replaces the carrier thread stack entirely (as opposed to being mounted "on top" of the carrier thread stack)?

It replaces the stack

tajila avatar Jul 05 '22 18:07 tajila

> Also need to think on ELS pointers in the stacks as they are swapped in and out.

I'm not sure we should be doing that, but it seems to work on Jack's latest patch. The ELS is allocated on the native stack, which implies that if we save it on the continuation struct, that continuation must be run on the same carrier thread on which that ELS was allocated, but we cannot guarantee this.

tajila avatar Jul 05 '22 18:07 tajila

Yes, we need to fix up the ELS on switch, and possibly preserve the ELS state in the swapped-out stack (ELS may contain saved JIT registers, etc).

I think there can only be the initial ELS active (due to pinning on call-out).

Another thing we need to consider is call-in for resolve. This is not a JNI callout, so we would not be pinning in the current implementation. Easiest short-term solution is to increment the pin count when we do this. Long term I'm not sure.
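A minimal sketch of the short-term idea, assuming a per-continuation pin counter that blocks yielding while it is nonzero. `SketchContinuation`, `pinCount`, and every function name below are hypothetical stand-ins for illustration, not the OpenJ9 definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the continuation state; not the real J9VMContinuation. */
typedef struct SketchContinuation {
    size_t pinCount; /* nonzero means the continuation must not be unmounted */
} SketchContinuation;

/* Hypothetical shape of a resolve-style call-in body. */
typedef int (*SketchResolveFn)(SketchContinuation *cont);

/* A yield is only legal while nothing has pinned the continuation. */
static bool sketchCanYield(const SketchContinuation *cont)
{
    return 0 == cont->pinCount;
}

/* Run a resolve-style call-in with the continuation pinned for its duration. */
static int sketchResolveWithPin(SketchContinuation *cont, SketchResolveFn resolve)
{
    int rc;

    cont->pinCount += 1;        /* pin before entering the nested interpreter */
    rc = resolve(cont);
    assert(cont->pinCount > 0); /* the call-in must not have unbalanced the count */
    cont->pinCount -= 1;        /* unpin once the call-in has returned */
    return rc;
}

/* Example call-in body: reports 1 if a yield would be allowed right now, else 0. */
static int sketchProbeYield(SketchContinuation *cont)
{
    return sketchCanYield(cont) ? 1 : 0;
}
```

Using a counter rather than a flag lets nested call-ins pin and unpin independently, which matches how the JNI call-out pinning is described above.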

Aside: Can we put the same pinning restrictions on the carrier thread as we do virtuals? The proposed implementation pretty much requires this.

gacholio avatar Jul 05 '22 18:07 gacholio

> Aside: Can we put the same pinning restrictions on the carrier thread as we do virtuals? The proposed implementation pretty much requires this.

As in: "a carrier thread cannot mount a vthread if it has called out, entered a synchronized block, etc."?

tajila avatar Jul 05 '22 20:07 tajila

> I think there can only be the initial ELS active (due to pinning on call-out).

The initial ELS needs some awareness of the last carrier thread ELS for things like currentOSStackFree calculations to work

tajila avatar Jul 05 '22 20:07 tajila

We'll need to copy most of the ELS in/out and maintain a copy of it for the unmounted stack walk (much like the other roots). The stack overflow stuff is not needed for stack walk, so we can just keep those pointers as they are when we do the ELS swap.

This will mean creating an ELS when we create the java stack for a vthread.

gacholio avatar Jul 05 '22 22:07 gacholio

I'm confusing myself (and possibly others) - the ELS contains pointers to the C stack where the FPR/GPRs are stored for the JIT. It's the contents of the save area that needs to be swapped in and out, not the ELS itself.

gacholio avatar Jul 06 '22 16:07 gacholio

The proposal from today's meeting is that we allocate a register save area for use by the initial ELS in the continuation run and then we will not need to swap the values in and out (and it will make walking the stacks a bit simpler).

The JIT helpers will need to be modified to spill registers via the ELS pointers to the save area rather than directly into the current native stack frame. I will investigate this.

Also, we'll need a special call-in helper which accepts the new save area pointers and initializes the ELS correctly (will also require modification of the hand-coded interpreter wrappers to not overwrite these fields).

gacholio avatar Jul 07 '22 18:07 gacholio

I will do the design, but hope someone else will do the implementation.

gacholio avatar Jul 19 '22 17:07 gacholio

Sounds good, I'll assign to Jack

tajila avatar Jul 19 '22 17:07 tajila

New proposal

The carrier J9VMThread will always represent the stack that's running on the platform thread (carrier or mounted continuation). Because the ELS pointers are into the C stack which is tied to the platform thread, the ELS pointer in the J9VMThread will not be changed when mounting/unmounting. This is required because we will re-use the latest C interpreter stack frame directly when running a continuation - we will not call in to a new interpreter for the initial continuation run.

When a continuation is created, the java stack will be created with the normal empty native method frame that every thread gets, followed by a call-in frame. Note there is no actual call-in, just the frame to keep the stack walker sane. The call-in frame will never be returned to.

To run a continuation for the first time, we will no longer call in. Instead, after building the INL frame for the native, we'll swap roots with the J9VMContinuation fields and return from the INL C code with an instruction to run the continuation wrapper method which is essentially:

   Mark continuation started
   try {
      Execute continuation
   } finally {
      Mark continuation ended
      yield
   }

The yield native can perform different actions depending on whether the continuation is ended or not.

The root values in the J9VMContinuation structure will represent whichever stack is NOT running on the platform thread. New fields will be required:

   J9JITGPRSpillArea registerSaveArea;
   J9VMEntryLocalStorage *oldELS;

When swapping roots, the contents of registerSaveArea must also be swapped with the contents of ELS->jitGlobalStorageBase. oldELS will be set to the value of vmThread->entryLocalStorage->oldEntryLocalStorage. Note that the contents of ELS structures on the C stack will never be modified by the swap - this allows native stack overflow detection to continue working without modification. oldELS is required in order to support the carrier thread having performed another call-in before processing continuations.
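The swap described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the OpenJ9 code: all `Sketch*` types are simplified stand-ins, the roots are reduced to two fields, and the spill area size is an arbitrary placeholder. Only the field names `registerSaveArea`, `oldELS`, `oldEntryLocalStorage`, `jitGlobalStorageBase`, and `entryLocalStorage` come from the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_GPR_COUNT 16 /* assumed size; the real spill area is platform specific */

typedef struct SketchGPRSpillArea {
    uintptr_t gpr[SKETCH_GPR_COUNT];
} SketchGPRSpillArea;

/* Simplified ELS: lives on the C stack and is never modified by the swap. */
typedef struct SketchELS {
    struct SketchELS *oldEntryLocalStorage;
    SketchGPRSpillArea *jitGlobalStorageBase;
} SketchELS;

/* The walkable roots of a Java stack, reduced to two fields for illustration. */
typedef struct SketchRoots {
    uintptr_t *sp;
    uint8_t *pc;
} SketchRoots;

typedef struct SketchThread {
    SketchRoots roots;
    SketchELS *entryLocalStorage; /* points into the C stack; unchanged by the swap */
} SketchThread;

typedef struct SketchContinuation {
    SketchRoots roots;                   /* roots of whichever stack is NOT running */
    SketchGPRSpillArea registerSaveArea; /* saved JIT GPRs for that stack */
    SketchELS *oldELS;
} SketchContinuation;

/* Exchange the running stack with the one parked in the continuation. */
static void sketchSwapRoots(SketchThread *thread, SketchContinuation *cont)
{
    SketchELS *els = thread->entryLocalStorage;
    SketchRoots tmpRoots;
    SketchGPRSpillArea tmpRegs;

    assert(NULL != els);
    assert(NULL != els->jitGlobalStorageBase);

    /* Swap the stack roots. */
    tmpRoots = thread->roots;
    thread->roots = cont->roots;
    cont->roots = tmpRoots;

    /* Swap the contents of the save area; the ELS pointers stay as they are. */
    tmpRegs = *els->jitGlobalStorageBase;
    *els->jitGlobalStorageBase = cont->registerSaveArea;
    cont->registerSaveArea = tmpRegs;

    /* Remember the enclosing ELS so the parked stack remains walkable. */
    cont->oldELS = els->oldEntryLocalStorage;
}
```

Because the function is symmetric, calling it a second time restores the original roots and register contents, which is why mount and yield can share it.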

To walk the stack in the J9VMContinuation, create a J9VMEntryLocalStorage on the stack, zero it, and initialize the oldEntryLocalStorage field to cont->oldELS and the jitGlobalStorageBase field to &cont->registerSaveArea. Then create a J9VMThread on the stack, zero it, and fill in all of the root values from the J9VMContinuation as we do today, additionally pointing entryLocalStorage to the on-stack ELS.
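A rough sketch of that setup, assuming simplified stand-in types. The `Sketch*` structs, the callback signature, and `sketchRecordWalk` are hypothetical and only mirror the field names used in the proposal; the real stack walker takes different arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_GPR_COUNT 16 /* assumed size; the real spill area is platform specific */

typedef struct SketchGPRSpillArea {
    uintptr_t gpr[SKETCH_GPR_COUNT];
} SketchGPRSpillArea;

typedef struct SketchELS {
    struct SketchELS *oldEntryLocalStorage;
    SketchGPRSpillArea *jitGlobalStorageBase;
} SketchELS;

typedef struct SketchRoots {
    uintptr_t *sp;
    uint8_t *pc;
} SketchRoots;

typedef struct SketchThread {
    SketchRoots roots;
    SketchELS *entryLocalStorage;
} SketchThread;

typedef struct SketchContinuation {
    SketchRoots roots;
    SketchGPRSpillArea registerSaveArea;
    SketchELS *oldELS;
} SketchContinuation;

/* Hypothetical walker callback; stands in for the real stack walk entry point. */
typedef int (*SketchWalkFn)(SketchThread *walkThread, void *userData);

/* Build a temporary thread and ELS on the C stack, then walk the parked stack. */
static int sketchWalkContinuation(SketchContinuation *cont, SketchWalkFn walk, void *userData)
{
    SketchELS els;
    SketchThread walkThread;

    assert(NULL != cont);
    assert(NULL != walk);

    memset(&els, 0, sizeof(els));
    els.oldEntryLocalStorage = cont->oldELS;
    els.jitGlobalStorageBase = &cont->registerSaveArea;

    memset(&walkThread, 0, sizeof(walkThread));
    walkThread.roots = cont->roots;
    walkThread.entryLocalStorage = &els;

    /* The temporaries are only valid for the duration of this call. */
    return walk(&walkThread, userData);
}

/* Example walker: records what a stack walk would see through the temporary thread. */
typedef struct SketchSeen {
    uintptr_t *sp;
    SketchELS *oldELS;
    uintptr_t firstGPR;
} SketchSeen;

static int sketchRecordWalk(SketchThread *walkThread, void *userData)
{
    SketchSeen *seen = (SketchSeen *)userData;

    seen->sp = walkThread->roots.sp;
    seen->oldELS = walkThread->entryLocalStorage->oldEntryLocalStorage;
    seen->firstGPR = walkThread->entryLocalStorage->jitGlobalStorageBase->gpr[0];
    return 0;
}
```

Keeping the temporary thread and ELS inside one function means callers cannot accidentally hold pointers to them after the walk returns.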

Remounting and yielding will perform the same swap as running the initial continuation (with yield perhaps doing something for the continuation ended case). When resuming execution after the swap, we must be aware of which INL native frame is on stack and return (collapsing the INL frame) with the appropriate value and number of arguments.

gacholio avatar Jul 22 '22 18:07 gacholio

Note that I have omitted swapping the FPR spill area because we are always at an INL when we swap, so there can be no live JIT FPRs (no FPRs are preserved in the JIT private linkage) and it is potentially very large due to new vector register support.

gacholio avatar Jul 22 '22 18:07 gacholio

ELS also has some z/OS-specific stuff for the CEE handler. I believe this will also just work because we're leaving the on-stack ELS structures alone when we swap, but we should be aware of potential issues here.

gacholio avatar Jul 22 '22 18:07 gacholio

oldELS should really only be set when the J9VMContinuation represents the carrier thread. While it's harmless for the continuation ELS to link back, it's unnecessarily confusing. It should be set to NULL when the struct represents an unmounted continuation.

gacholio avatar Jul 22 '22 20:07 gacholio

As there will be multiple callers needing to walk a continuation stack, I suggest we create a wrapper for walkStackFrames that encapsulates the stack-allocated thread and ELS set up.

gacholio avatar Jul 25 '22 16:07 gacholio

Correction to above - when the continuation stack is created, it should contain only the empty native method frame created by initializeExecutionModel. The call-in frame should be created on first mount before executing the initial method. This is required since the stack is not technically walkable with a call-in frame on top of stack.

Edit: Turns out this isn't actually true, but I would still like it done this way for consistency with normal threads.

gacholio avatar Jul 25 '22 18:07 gacholio

@gacholio It appears that this solution (enter/yield) is interpreter only, as in, unimplementable in the JIT. Is this true? I just want to get an understanding of the limitations.

tajila avatar Jul 28 '22 13:07 tajila

The JIT will transition to the INLs to perform the mount/yield. There's far too much going on in the natives for the JIT to inline it.

gacholio avatar Jul 28 '22 15:07 gacholio