LEAVE phaser with return
What should happen when a LEAVE phaser contains a return statement?
Given these four snippets, what should each of those print?
sub s() {
LEAVE return 5;
return 7;
}
say s()
sub s(&code) {
LEAVE return 5;
code();
}
sub s2() {
s({
return 7;
});
return 1;
}
say s2()
sub s() {
LEAVE return 5;
7;
}
s()
sub s(&code) {
LEAVE return 1;
code();
}
sub s2() {
s({
return 7;
});
}
s2()
With https://github.com/MoarVM/MoarVM/pull/1785 and https://github.com/MoarVM/MoarVM/pull/1782 merged current rakudo prints:
- 5
- 1
- 5
- 1
I especially wonder about 2. and 4. It's comprehensible what's happening and why, but feels like a WAT. Maybe it's simply a DIHWIDT, similar to what next does here:
sub s() { next; }
for ^10 { s(); say $_; }
In my opinion a return in a LEAVE phaser shouldn't be different from a return in any other phaser, e.g. ENTER or FIRST. With the new Raku grammar, FIRST appears to be doing that apparently already:
sub a { FIRST return 42 }; dd a; # 42
So in my book, the return values would be: 5, 1, 5, 1
Or we should disallow return from phasers, period. And make it report an appropriate execution error (or possibly a compile time error in RakuAST).
There is a problem with cases 2 and 4. Both has to be 7. And here is why.
Return from a block applies to the closure where the block belongs to. This is why .map({ return 7 }) works as expected. From this point of view it doesn't matter what happens inside of sub s and what it returns – but return within the block must win.
Or we should disallow
returnfrom phasers, period.
Better not.
Ok, thinking about this more: I change my answers to 5, 5, 5, 1. And here's why:
In case 2, since &code is a Block, the return is applied to s and not to s2. However, that return implies that it is leaving the scope of s, and thus its LEAVE phaser will fire causing it to return 5.
I don't see an issue with 4. The same logic as 2. applies here.
@lizmat you're wrong. Letting a block return from s2 via invoking &code makes the code in s2 unpredictable and will require additional complications like checking if it's a block or not in order to prevent the block from breaking our behavior. Not to mention that it wouldn't be backward compatible:
sub foo(&c) {
my \rc = &c();
say "foo rc=", rc.raku; # no output
rc
}
sub bar() {
foo { return 1 };
return 2
}
say bar; # 1
I will remain in the position that a block must return from its closure.
I think it will help to explain what Rakudo currently does technically. Looking at the 2nd example:
sub s(&code) {
LEAVE return 5;
code();
}
sub s2() {
s({
return 7;
});
return 1;
}
say s2()
s2is enteredsis entered&codeis enteredreturn 7is calledthrowpayloadlexcalleris called (https://github.com/MoarVM/MoarVM/blob/main/src/core/interp.c#L4908C3-L4908C3)- A CONTROL handler is searched in the lexical scope of
return 7. (MVM_exception_throwpayload and then search_for_handler_from) - The handler of
s2is found and a frame unwind (following the caller chain) is started targetting the handler ins2(MVM_frame_unwind_to) - During that unwind the following two frames are unwound in order:
{return 7},s() - The exit handler of the
s()frame is seen. The unwind process is paused (not aborted, only paused), and theLEAVEblock is executed. return 5is calledthrowpayloadlexcalleris called- A CONTROL handler is searched in the lexical scope of
return 5. - The handler of
sis found and a frame unwind (following the caller chain) is started targetting the handler ins - During that unwind only the
LEAVE return 5frame is unwound. - The handler in
s()is reached.s()returns and hands the value5tos2(). - Execution continues in
s2(). - The returned value
5is sunk. return 1is called. Same process as above.1issayed
I think the behavior is consistent in that once I understood that &returns are effectively exceptions that search handlers in the outer chain and then unwind the caller chain, the resulting behavior is very comprehensible and expected. Departure from the described behavior will mean adding additional logic to the behavior of stack unwinding. So there would be more logic to know about to fully grasp what's happening.
@vrurg I believe the above behavior is interesting, because your requirement
Return from a block applies to the closure where the block belongs to.
is upheld. (I strongly support this requirement.) But the return 5 in s still manages to interrupt the unwind of return 7 with lasting effect.
Just to state a possibility of how we could adapt the above logic:
We detect the situation where a return (return 5 in the above code) interrupts the unwind of some other return (return 7). Then
if both exceptions are CONTROL {
if target handlers of both exceptions are equal { # This means the returns are in the same lexical scope
... do interrupt the first unwind and proceed with the new return value
}
else {
... let the first unwind continue, ignore the second `return`
}
}
This would have the consequence that the results would be:
- 5
- 7
- 5
- 7
I'm undecided if I like this or not. It limits action at a distance: The return 5 in s doesn't change the effect of return 7 in s2. But it also adds action at a distance: The return 7 in s2 does change the behavior of the return 5 in s.
All of this explanation that is needed to explain behaviour, is convincing me that a return should NOT be allowed in a phaser. Sadly, there is no way to reliably enforce that at compile time.
But we could at run-time by codegenning a CONTROL block in any phaser block. Conceptually:
LEAVE {
say "bye";
return 42;
}
would be codegenned as:
LEAVE {
CONTROL {
die "return within a LEAVE phaser" if $_ ~~ CX::Return
}
say "bye";
return 42;
}
Of course, if the return were to be directly in the phaser's body, we could make that a compile time error as well.
Since phasers cannot be inlined anyway, I don't think this will affect performance of phasers much.
To actually not break any code out there that is depending in old "return in phaser" behaviour, we could limit this CONTROL codegenning to 6.e.
I'm undecided if I like this or not. It limits action at a distance: The
return 5insdoesn't change the effect ofreturn 7ins2. But it also adds action at a distance: Thereturn 7ins2does change the behavior of thereturn 5ins.
What would it change in return 5 behavior? The returned value is not going to be used by definition. So, we can sink it immediately and for the code in s it will go unnoticed.
The noticeable change is that immediate termination of return procedures will prevent a potential CONTROL block in s from being invoked. Perhaps, this can be easy to fix by adding a special if branch which would try the path of seeking for the CONTROL block in the lexical scope without initiating a new unwind.
So, I currently see no cardinal changes that might break s semantics or user expectations about it.
All of this explanation that is needed to explain behaviour, is convincing me that a
returnshould NOT be allowed in a phaser.
While not being fully against this prohibition, I'd not appreciate it. Besides, consider that CATCH and CONTROL are phasers too. Returning a value from CATCH is often an important way to handle an error.
Sadly, there is no way to reliably enforce that at compile time.
Again, hope this won't be needed. But technically, an explicit return can be detected as soon as:
- the current lexical context belongs to a phaser (the compiler knows it)
- the current lexical scope doesn't belong to a routine (pointy block included)
Both being True means this return is going to cause a trouble.
But we could at run-time by codegenning a
CONTROLblock in any phaser block.
Even if we need it, I wouldn't be happy about extra implicit code injection. A VM can be more efficient in implementing this. And more universal.
But either way, we haven't considered all possibilities of optimizing the case. The fact that any return within frames, affected by unwind, is not going to be used makes a lot of things easier, as I see it. All we need is to make sure all related CONTROL blocks are fired up.
Adding case 5.
sub s(&code) {
return 0;
LEAVE {
code()
}
}
sub s2() {
s({
return 1;
});
return 2;
}
say s2();
This is in some sense the inverse of case 2. (I don't see how we can distinguish the two cases, i.e. to make case 2. give 7 and case 5. give 1.)
Current Rakudo (with the two PRs applied) prints 1.
Adding case 5.
It makes no difference with 2. return 1 wins, so it makes 1 eventually.
More interesting the case of s itself. Say, &code is not a block but a routine returning a value. Let it be the 1 we already return. Then we have two options:
- no explicit
returnmakes theLEAVEblock mute and only the side effects of it are what's important - since
LEAVEis the last statement and is executed after thereturn 0its return value is the return value ofs
But the second option loses for two reasons. First, the return 0 statement is explicit. Second, it would make it harder to learn the rules governing returns from routines.
@vrurg I try to explain some more why I think case 2. vs case 5. is difficult.
case 2.
sub s(&code) { # target of second return
LEAVE return 5; # second return -> collision
code();
}
sub s2() { # target of first return
s({ return 7 }); # first return
return 1;
}
s2()
case 5.
sub s(&code) { # target of first return
return 0; # first return
LEAVE { code() }
}
sub s2() { # target of second return
s({ return 1 }); # second return -> collision
return 2;
}
s2()
Let's consider the point during execution marked with "collision".
In case 2 that's return 5. At that point some other return is already in progress (return 7). That other return targets a different routine (s2()).
In case 5 that's return 1. At that point some other return is already in progress (return 0). That other return targets a different routine (s()).
When working with the information given in the previous two paragraphs, the outcome is either 5 and 1 or 7 and 0. I don't see what additional information could be used to help.
Hope, we're not discussing different things here. Also hope that you're not looking at the problem from the perspective of implementation.
Separate the concerns. Separate the sources of s and s2. Best if you separate them into modules from different developers. Now, from the point of view of s2, should it care about what happens inside s? It clearly should not. Because in either case 2 or case 5, the block submitted to s returns from s2. This is what I would see as a developer of s2.
Think of it from the day-to-day development. The author of s must remember to document that it's sub is using a LEAVE phaser which is using return. I matters not if one or the other is not used, but when together – we need to document it.
The author of s2 must remember, that s has a phaser which is using a return. No other routine from this module doesn't, but this one does. Ah, and that the return in LEAVE only fires if this, and this, and, perhaps, that condition are all met. And then there is a couple of other modules, and a routine in one of them has similar behavior. But which one??
That's why I say 2 and 5 are the same. Because all what matters here is s2 as the consumer.
Also hope that you're not looking at the problem from the perspective of implementation.
Maybe "inspired by"... Can't prevent that.
Separate the concerns. Separate the sources of
sands2. Best if you separate them into modules from different developers. Now, from the point of view ofs2, should it care about what happens insides? It clearly should not. Because in either case 2 or case 5, the block submitted tosreturns froms2. This is what I would see as a developer ofs2.
I see what angle you are coming from and I agree.
I just have difficulty mapping what that means technically. After some thinking: Always let the return win, that wants to unwind further down the stack. Could that be it?
I just have difficulty mapping what that means technically. After some thinking: Always let the return win, that wants to unwind further down the stack. Could that be it?
I would tentatively agree. Not sure if we miss no nuance here.
I have added an implementation of the "deeper unwind wins" approach in https://github.com/MoarVM/MoarVM/pull/1786. Spec tests pass including those in https://github.com/Raku/roast/pull/851
I just have difficulty mapping what that means technically. After some thinking: Always let the return win, that wants to unwind further down the stack. Could that be it? Not sure if we miss no nuance here.
We do. Always picking the return that unwinds deeper basically disables &return everywhere in respective phasers. That messes up all but the simplest code. It's a miracle that PR managed to get the spec tests to pass.
Another idea:
- Before calling any exit handler during unwind, we mark the target frame as
this frame is returningand save the return value in the target frame (inframe->extra->exit_handler_result). We could also do that right away on every&return, but that would make all&returns pay the performance penalty. - On every implicit return (that's all returns, not just
&return), check the caller frame for thereturningmarker. If found, then unwind the current frame, restore the previously saved value and do another unwind. That's the equivalent of two implicit returns in a row (+ restoring the right return value).
I think the semantics are correct and the approach to have frames in a "returning" state feels like it matches the core of the issue.
I see issues though:
- We add one if to every (implicit and explicit) return. This is not good for performance. Probably negligible, but I test this yet.
- From an opcode perspective, is this behavior too much magic for
throwpayloadlexcallerand thereturn_*? - We make return fundamentally more complex. Returning is involved in inlining and specialization. This makes all of that more complex. I think this point is the most problematic one.
Yet another idea:
We tune the target frames return_address to point at the return handler. To retrieve the correct value, we'll change lastexpayload to check for the frame->returning flag and if set, retrieve the value saved in frame->extra->exit_handler_result instead of tc->last_payload.
I have implemented the last idea in https://github.com/MoarVM/MoarVM/pull/1788, it turned out rather nice. I have also added another test to https://github.com/Raku/roast/pull/851. It passes all those tests. I haven't done a spec test (need to add a JIT implementation to https://github.com/MoarVM/MoarVM/pull/1788 first), but this looks very promising.
JIT is in. So we now have a working implementation. How do we proceed? I.e. how do we reach a conclusion?
I see two possible options:
- The new semantics (summarized as: the ~~first return~~ last return in a given routine always wins) are accepted.
- We keep the current semantics (return exceptions are not prioritized, last called wins).
I tend to not like:
- Prohibit
&returnin phasers. (Even though I was the one to originally propose it.)
Ping @vrurg, @lizmat.
To summarize: there's an implementation now that implements: "first return wins", and it's spectest clean?
To summarize: there's an implementation now that implements: "first
returnwins", and it's spectest clean?
~~Yes.~~ <-- That was wrong. It's last return in a routine that wins.
The PR is still marked as draft. Maybe it shouldn't? :-)
I can change that. But: We should decide if those semantics are what we actually want. From the above discussion I don't have the impression that we have reached a consensus yet.
So, can we first decide that we want those semantics, then go forward with the implementation. Right?
On February 25, 2024 3:13:45 PM GMT+01:00, Elizabeth Mattijsen @.***> wrote:
The PR is still marked as draft. Maybe it shouldn't? :-)
-- Reply to this email directly or view it on GitHub: https://github.com/Raku/problem-solving/issues/417#issuecomment-1962954284 You are receiving this because you authored the thread.
Message ID: @.***>
+1 for first return wins.
We already have a similar behaviour regarding exit:
$ raku -e 'END exit 42; exit 1'
$ echo $?
1
I know I'm late to the party, but anyway.
S06 says "The C<return> function notionally throws a control exception that is caught by the current lexically enclosing C<Routine> to force a return through the control logic code of any intermediate block constructs. (That is, it must unwind the stack of dynamic scopes to the proper lexical scope belonging to this routine.)"
It says this about ENTER/LEAVE/KEEP/UNDO/etc. phasers: "These phasers supply code that is to be conditionally executed before or after the subroutine's C<do> block (only if used at the outermost level within the subroutine; technically, these are added to the block traits on the C<do> block, not the subroutine object). These phasers are generally used only for their side effects, since most return values will be ignored. (Phasers that run before normal execution may be used for their values, however.)"
The first quoted paragraph makes me inclined to suggest "lastest wins" semantics for return. After all return throws a control exception and with exceptions in general it's that any exception occurring after the original exception (e.g. in the exception handler) will become the dominant one that get's passed up to handlers. This matches what I would expect.
But then S06 is quite explicit about phasers, especially LEAVE being about their side effects. So now I'm a bit torn.
being about their side effects
Could also be interpreted as "don't allow return in a phaser" ?