problem-solving icon indicating copy to clipboard operation
problem-solving copied to clipboard

LEAVE phaser with return

Open patrickbkr opened this issue 1 year ago • 46 comments

What should happen when a LEAVE phaser contains a return statement?

Given these four snippets, what should each of those print?

sub s() {
    LEAVE return 5;
    return 7;
}
say s()
sub s(&code) {
    LEAVE return 5;
    code();
}
sub s2() {
    s({
        return 7;
    });
    return 1;
}
say s2()
sub s() {
    LEAVE return 5;
    7;
}
s()
sub s(&code) {
    LEAVE return 1;
    code();
}
sub s2() {
    s({
        return 7;
    });
}
s2()

patrickbkr avatar Jan 16 '24 13:01 patrickbkr

With https://github.com/MoarVM/MoarVM/pull/1785 and https://github.com/MoarVM/MoarVM/pull/1782 merged current rakudo prints:

  1. 5
  2. 1
  3. 5
  4. 1

I especially wonder about 2. and 4. It's comprehensible what's happening and why, but feels like a WAT. Maybe it's simply a DIHWIDT, similar to what next does here:

sub s() { next; }
for ^10 { s(); say $_; }

patrickbkr avatar Jan 16 '24 13:01 patrickbkr

In my opinion a return in a LEAVE phaser shouldn't be different from a return in any other phaser, e.g. ENTER or FIRST. With the new Raku grammar, FIRST appears to be doing that apparently already:

sub a { FIRST return 42 }; dd a;  # 42

So in my book, the return values would be: 5, 1, 5, 1

lizmat avatar Jan 16 '24 13:01 lizmat

Or we should disallow return from phasers, period. And make it report an appropriate execution error (or possibly a compile time error in RakuAST).

lizmat avatar Jan 16 '24 14:01 lizmat

There is a problem with cases 2 and 4. Both has to be 7. And here is why.

Return from a block applies to the closure where the block belongs to. This is why .map({ return 7 }) works as expected. From this point of view it doesn't matter what happens inside of sub s and what it returns – but return within the block must win.

Or we should disallow return from phasers, period.

Better not.

vrurg avatar Jan 16 '24 15:01 vrurg

Ok, thinking about this more: I change my answers to 5, 5, 5, 1. And here's why:

In case 2, since &code is a Block, the return is applied to s and not to s2. However, that return implies that it is leaving the scope of s, and thus its LEAVE phaser will fire causing it to return 5.

I don't see an issue with 4. The same logic as 2. applies here.

lizmat avatar Jan 16 '24 17:01 lizmat

@lizmat you're wrong. Letting a block return from s2 via invoking &code makes the code in s2 unpredictable and will require additional complications like checking if it's a block or not in order to prevent the block from breaking our behavior. Not to mention that it wouldn't be backward compatible:

sub foo(&c) { 
    my \rc = &c(); 
    say "foo rc=", rc.raku; # no output
    rc 
} 
sub bar() { 
    foo { return 1 }; 
    return 2  
}
say bar; # 1

I will remain in the position that a block must return from its closure.

vrurg avatar Jan 16 '24 19:01 vrurg

I think it will help to explain what Rakudo currently does technically. Looking at the 2nd example:

sub s(&code) {
    LEAVE return 5;
    code();
}
sub s2() {
    s({
        return 7;
    });
    return 1;
}
say s2()
  • s2 is entered
  • s is entered
  • &code is entered
  • return 7 is called
  • throwpayloadlexcaller is called (https://github.com/MoarVM/MoarVM/blob/main/src/core/interp.c#L4908C3-L4908C3)
  • A CONTROL handler is searched in the lexical scope of return 7. (MVM_exception_throwpayload and then search_for_handler_from)
  • The handler of s2 is found and a frame unwind (following the caller chain) is started targetting the handler in s2 (MVM_frame_unwind_to)
  • During that unwind the following two frames are unwound in order: {return 7}, s()
  • The exit handler of the s() frame is seen. The unwind process is paused (not aborted, only paused), and the LEAVE block is executed.
  • return 5 is called
  • throwpayloadlexcaller is called
  • A CONTROL handler is searched in the lexical scope of return 5.
  • The handler of s is found and a frame unwind (following the caller chain) is started targetting the handler in s
  • During that unwind only the LEAVE return 5 frame is unwound.
  • The handler in s() is reached. s() returns and hands the value 5 to s2().
  • Execution continues in s2().
  • The returned value 5 is sunk.
  • return 1 is called. Same process as above.
  • 1 is sayed

I think the behavior is consistent in that once I understood that &returns are effectively exceptions that search handlers in the outer chain and then unwind the caller chain, the resulting behavior is very comprehensible and expected. Departure from the described behavior will mean adding additional logic to the behavior of stack unwinding. So there would be more logic to know about to fully grasp what's happening.


@vrurg I believe the above behavior is interesting, because your requirement

Return from a block applies to the closure where the block belongs to.

is upheld. (I strongly support this requirement.) But the return 5 in s still manages to interrupt the unwind of return 7 with lasting effect.


Just to state a possibility of how we could adapt the above logic: We detect the situation where a return (return 5 in the above code) interrupts the unwind of some other return (return 7). Then

if both exceptions are CONTROL {
    if target handlers of both exceptions are equal { # This means the returns are in the same lexical scope
        ... do interrupt the first unwind and proceed with the new return value
    }
    else {
        ... let the first unwind continue, ignore the second `return`
    }
}

This would have the consequence that the results would be:

  1. 5
  2. 7
  3. 5
  4. 7

I'm undecided if I like this or not. It limits action at a distance: The return 5 in s doesn't change the effect of return 7 in s2. But it also adds action at a distance: The return 7 in s2 does change the behavior of the return 5 in s.

patrickbkr avatar Jan 17 '24 10:01 patrickbkr

All of this explanation that is needed to explain behaviour, is convincing me that a return should NOT be allowed in a phaser. Sadly, there is no way to reliably enforce that at compile time.

But we could at run-time by codegenning a CONTROL block in any phaser block. Conceptually:

LEAVE {
    say "bye";
    return 42;
}

would be codegenned as:

LEAVE {
    CONTROL {
        die "return within a LEAVE phaser" if $_ ~~ CX::Return
    }
    say "bye";
    return 42;
}

Of course, if the return were to be directly in the phaser's body, we could make that a compile time error as well.

Since phasers cannot be inlined anyway, I don't think this will affect performance of phasers much.

lizmat avatar Jan 17 '24 12:01 lizmat

To actually not break any code out there that is depending in old "return in phaser" behaviour, we could limit this CONTROL codegenning to 6.e.

lizmat avatar Jan 17 '24 12:01 lizmat

I'm undecided if I like this or not. It limits action at a distance: The return 5 in s doesn't change the effect of return 7 in s2. But it also adds action at a distance: The return 7 in s2 does change the behavior of the return 5 in s.

What would it change in return 5 behavior? The returned value is not going to be used by definition. So, we can sink it immediately and for the code in s it will go unnoticed.

The noticeable change is that immediate termination of return procedures will prevent a potential CONTROL block in s from being invoked. Perhaps, this can be easy to fix by adding a special if branch which would try the path of seeking for the CONTROL block in the lexical scope without initiating a new unwind.

So, I currently see no cardinal changes that might break s semantics or user expectations about it.

All of this explanation that is needed to explain behaviour, is convincing me that a return should NOT be allowed in a phaser.

While not being fully against this prohibition, I'd not appreciate it. Besides, consider that CATCH and CONTROL are phasers too. Returning a value from CATCH is often an important way to handle an error.

Sadly, there is no way to reliably enforce that at compile time.

Again, hope this won't be needed. But technically, an explicit return can be detected as soon as:

  1. the current lexical context belongs to a phaser (the compiler knows it)
  2. the current lexical scope doesn't belong to a routine (pointy block included)

Both being True means this return is going to cause a trouble.

But we could at run-time by codegenning a CONTROL block in any phaser block.

Even if we need it, I wouldn't be happy about extra implicit code injection. A VM can be more efficient in implementing this. And more universal.

But either way, we haven't considered all possibilities of optimizing the case. The fact that any return within frames, affected by unwind, is not going to be used makes a lot of things easier, as I see it. All we need is to make sure all related CONTROL blocks are fired up.

vrurg avatar Jan 17 '24 15:01 vrurg

Adding case 5.

sub s(&code) {
    return 0;
    LEAVE {
        code()
    }
}

sub s2() {
    s({
        return 1;
    });
    return 2;
}

say s2();

This is in some sense the inverse of case 2. (I don't see how we can distinguish the two cases, i.e. to make case 2. give 7 and case 5. give 1.)

Current Rakudo (with the two PRs applied) prints 1.

patrickbkr avatar Jan 19 '24 21:01 patrickbkr

Adding case 5.

It makes no difference with 2. return 1 wins, so it makes 1 eventually.

More interesting the case of s itself. Say, &code is not a block but a routine returning a value. Let it be the 1 we already return. Then we have two options:

  1. no explicit return makes the LEAVE block mute and only the side effects of it are what's important
  2. since LEAVE is the last statement and is executed after the return 0 its return value is the return value of s

But the second option loses for two reasons. First, the return 0 statement is explicit. Second, it would make it harder to learn the rules governing returns from routines.

vrurg avatar Jan 20 '24 03:01 vrurg

@vrurg I try to explain some more why I think case 2. vs case 5. is difficult.

case 2.

sub s(&code) {            # target of second return
    LEAVE return 5;       # second return -> collision
    code();
}
sub s2() {                # target of first return
    s({ return 7 });      # first return
    return 1;
}
s2()

case 5.

sub s(&code) {          # target of first return
    return 0;           # first return
    LEAVE { code() }
}

sub s2() {              # target of second return
    s({ return 1 });    # second return -> collision
    return 2;
}
s2()

Let's consider the point during execution marked with "collision".

In case 2 that's return 5. At that point some other return is already in progress (return 7). That other return targets a different routine (s2()).

In case 5 that's return 1. At that point some other return is already in progress (return 0). That other return targets a different routine (s()).

When working with the information given in the previous two paragraphs, the outcome is either 5 and 1 or 7 and 0. I don't see what additional information could be used to help.

patrickbkr avatar Jan 20 '24 08:01 patrickbkr

Hope, we're not discussing different things here. Also hope that you're not looking at the problem from the perspective of implementation.

Separate the concerns. Separate the sources of s and s2. Best if you separate them into modules from different developers. Now, from the point of view of s2, should it care about what happens inside s? It clearly should not. Because in either case 2 or case 5, the block submitted to s returns from s2. This is what I would see as a developer of s2.

Think of it from the day-to-day development. The author of s must remember to document that it's sub is using a LEAVE phaser which is using return. I matters not if one or the other is not used, but when together – we need to document it.

The author of s2 must remember, that s has a phaser which is using a return. No other routine from this module doesn't, but this one does. Ah, and that the return in LEAVE only fires if this, and this, and, perhaps, that condition are all met. And then there is a couple of other modules, and a routine in one of them has similar behavior. But which one??

That's why I say 2 and 5 are the same. Because all what matters here is s2 as the consumer.

vrurg avatar Jan 20 '24 15:01 vrurg

Also hope that you're not looking at the problem from the perspective of implementation.

Maybe "inspired by"... Can't prevent that.

Separate the concerns. Separate the sources of s and s2. Best if you separate them into modules from different developers. Now, from the point of view of s2, should it care about what happens inside s? It clearly should not. Because in either case 2 or case 5, the block submitted to s returns from s2. This is what I would see as a developer of s2.

I see what angle you are coming from and I agree.

I just have difficulty mapping what that means technically. After some thinking: Always let the return win, that wants to unwind further down the stack. Could that be it?

patrickbkr avatar Jan 20 '24 21:01 patrickbkr

I just have difficulty mapping what that means technically. After some thinking: Always let the return win, that wants to unwind further down the stack. Could that be it?

I would tentatively agree. Not sure if we miss no nuance here.

vrurg avatar Jan 21 '24 01:01 vrurg

I have added an implementation of the "deeper unwind wins" approach in https://github.com/MoarVM/MoarVM/pull/1786. Spec tests pass including those in https://github.com/Raku/roast/pull/851

patrickbkr avatar Jan 21 '24 18:01 patrickbkr

I just have difficulty mapping what that means technically. After some thinking: Always let the return win, that wants to unwind further down the stack. Could that be it? Not sure if we miss no nuance here.

We do. Always picking the return that unwinds deeper basically disables &return everywhere in respective phasers. That messes up all but the simplest code. It's a miracle that PR managed to get the spec tests to pass.

patrickbkr avatar Jan 25 '24 21:01 patrickbkr

Another idea:

  • Before calling any exit handler during unwind, we mark the target frame as this frame is returning and save the return value in the target frame (in frame->extra->exit_handler_result). We could also do that right away on every &return, but that would make all &returns pay the performance penalty.
  • On every implicit return (that's all returns, not just &return), check the caller frame for the returning marker. If found, then unwind the current frame, restore the previously saved value and do another unwind. That's the equivalent of two implicit returns in a row (+ restoring the right return value).

I think the semantics are correct and the approach to have frames in a "returning" state feels like it matches the core of the issue.

I see issues though:

  • We add one if to every (implicit and explicit) return. This is not good for performance. Probably negligible, but I test this yet.
  • From an opcode perspective, is this behavior too much magic for throwpayloadlexcaller and the return_*?
  • We make return fundamentally more complex. Returning is involved in inlining and specialization. This makes all of that more complex. I think this point is the most problematic one.

patrickbkr avatar Jan 25 '24 22:01 patrickbkr

Yet another idea: We tune the target frames return_address to point at the return handler. To retrieve the correct value, we'll change lastexpayload to check for the frame->returning flag and if set, retrieve the value saved in frame->extra->exit_handler_result instead of tc->last_payload.

patrickbkr avatar Jan 26 '24 22:01 patrickbkr

I have implemented the last idea in https://github.com/MoarVM/MoarVM/pull/1788, it turned out rather nice. I have also added another test to https://github.com/Raku/roast/pull/851. It passes all those tests. I haven't done a spec test (need to add a JIT implementation to https://github.com/MoarVM/MoarVM/pull/1788 first), but this looks very promising.

patrickbkr avatar Jan 27 '24 00:01 patrickbkr

JIT is in. So we now have a working implementation. How do we proceed? I.e. how do we reach a conclusion?

I see two possible options:

  1. The new semantics (summarized as: the ~~first return~~ last return in a given routine always wins) are accepted.
  2. We keep the current semantics (return exceptions are not prioritized, last called wins).

I tend to not like:

  1. Prohibit &return in phasers. (Even though I was the one to originally propose it.)

patrickbkr avatar Jan 27 '24 17:01 patrickbkr

Ping @vrurg, @lizmat.

patrickbkr avatar Feb 25 '24 13:02 patrickbkr

To summarize: there's an implementation now that implements: "first return wins", and it's spectest clean?

lizmat avatar Feb 25 '24 13:02 lizmat

To summarize: there's an implementation now that implements: "first return wins", and it's spectest clean?

~~Yes.~~ <-- That was wrong. It's last return in a routine that wins.

patrickbkr avatar Feb 25 '24 14:02 patrickbkr

The PR is still marked as draft. Maybe it shouldn't? :-)

lizmat avatar Feb 25 '24 14:02 lizmat

I can change that. But: We should decide if those semantics are what we actually want. From the above discussion I don't have the impression that we have reached a consensus yet.

So, can we first decide that we want those semantics, then go forward with the implementation. Right?

On February 25, 2024 3:13:45 PM GMT+01:00, Elizabeth Mattijsen @.***> wrote:

The PR is still marked as draft. Maybe it shouldn't? :-)

-- Reply to this email directly or view it on GitHub: https://github.com/Raku/problem-solving/issues/417#issuecomment-1962954284 You are receiving this because you authored the thread.

Message ID: @.***>

patrickbkr avatar Feb 25 '24 14:02 patrickbkr

+1 for first return wins.

We already have a similar behaviour regarding exit:

$ raku -e 'END exit 42; exit 1' 
$ echo $?
1

lizmat avatar Feb 25 '24 15:02 lizmat

I know I'm late to the party, but anyway.

S06 says "The C<return> function notionally throws a control exception that is caught by the current lexically enclosing C<Routine> to force a return through the control logic code of any intermediate block constructs. (That is, it must unwind the stack of dynamic scopes to the proper lexical scope belonging to this routine.)"

It says this about ENTER/LEAVE/KEEP/UNDO/etc. phasers: "These phasers supply code that is to be conditionally executed before or after the subroutine's C<do> block (only if used at the outermost level within the subroutine; technically, these are added to the block traits on the C<do> block, not the subroutine object). These phasers are generally used only for their side effects, since most return values will be ignored. (Phasers that run before normal execution may be used for their values, however.)"

The first quoted paragraph makes me inclined to suggest "lastest wins" semantics for return. After all return throws a control exception and with exceptions in general it's that any exception occurring after the original exception (e.g. in the exception handler) will become the dominant one that get's passed up to handlers. This matches what I would expect.

But then S06 is quite explicit about phasers, especially LEAVE being about their side effects. So now I'm a bit torn.

niner avatar Feb 25 '24 19:02 niner

being about their side effects

Could also be interpreted as "don't allow return in a phaser" ?

lizmat avatar Feb 25 '24 19:02 lizmat