perl5
perl5 copied to clipboard
Evaluation order of `EXPR->(...)` is confusing
While Perl doesn't strictly guarantee evaluation order, it usually does pretty well at not surprising you, doing the right thing in most circumstances. In particular, when fetching elements out of array or hash references, or invoking methods:
eval: (do { say "First"; \my @a })->[say "Second"]
First
Second
eval: (do { say "First"; \my %h })->{say "Second"}
First
Second
eval: (do { say "First"; undef })->meth(say "Second")
First
Second
ERR: Can't call method "meth" on an undefined value at (eval 15) line 1.
(Ignore the fact that it died in this last example; the fact remains it evaluated the LHS and printed "First" before it evaluated the RHS and printed "Second")
This neatness does not hold when invoking code references:
eval: (do { say "First"; sub {} })->(say "Second")
Second
First
The casual observer would expect that it would print "First" then "Second" in that order, as it does for the other examples; but yet it does not.
This may be regarded as "just a quirk", but it does have an annoying consequence when we consider https://github.com/Perl/RFCs/pull/23#issuecomment-1194661578. A satisfactory solution in the optional-chaining case might involve changing the way that regular (i.e. non-optional) code ref invocation also works, to fix this evaluation order problem.
I suspect fixing this would be relatively simple, involving a flag on OP_ENTERSUB to tell it to look at the other end of the stack for the CV to invoke, and shuffling the way the optree is generated in these cases.
However, having done that it remains to be seen whether there's any breakage in core or CPAN, from code that somehow relies on this odd order of evaluation. There probably isn't much of interest, but I wouldn't be surprised if some weird cornercase somewhere somehow makes use of it.
It may be important to realize that the same behaviour happens on regular subroutine calls too, not just refs:
▶ perl -Mv5.36 -E ' my $x; $x->(die q(oops!)) '
oops! at -e line 1.
▶ perl -Mv5.36 -E ' some_function(die q(oops!)) '
oops! at -e line 1.
Both of them should have, in my opinion, failed with "Can't use an undefined value as a subroutine reference" and "Undefined subroutine &main::some_funcion called", respectively, without having called arguments at all. Which is exactly what would have happened if no side-effect argument had been passed.
Also worth noticing: Python, JavaScript and PHP all check if the function can be called before evaluating arguments, while Ruby behaves like Perl for regular functions, calling arguments first and only then seeing if the function can be called, and like the rest for methods (which I think is weird, but it let them add safe navigation operators to their method calls).
I have looked around perlref and perlsub but could not find anywhere defining this behavior. perlsyn says that "when (undef is) used as a reference that isn't being assigned to, it is treated as an error" but even that is only half-true, since it doesn't happen when a variable container is undef - and it makes no mention as to whether arguments are resolved before or after this check.
Still, since the code would die of "cannot call this" anyway even if the arguments resolved properly, any code relying on it would have to already have the call wrapped in an eval/try block, so even if we find some code relying on this (and we sure will), it's going to be self contained and likely easy to refactor.
I find it hard to believe anyone would expect calling/evaluating function arguments before checking whether the function can be called at all, quite the opposite in fact. So I strongly believe this undefined behaviour should be treated as a bug, fixed, and documented.
I find it hard to believe anyone would expect calling/evaluating function arguments before checking whether the function can be called at all, quite the opposite in fact. So I strongly believe this undefined behaviour should be treated as a bug, fixed, and documented.
Just a nitpick: something being "undefined behaviour" means a construct is invalid and has no meaningful semantics. That is not the case here, the order of operations between resolving the function evaluating its arguments is merely unspecified.
On Mon, 25 Jul 2022 at 23:55, Paul Evans @.***> wrote:
While Perl doesn't strictly guarantee evaluation order, it usually does pretty well at not surprising you, doing the right thing in most circumstances. In particular, when fetching elements out of array or hash references, or invoking methods:
eval: (do { say "First"; \my @a })->[say "Second"] First Second
eval: (do { say "First"; \my %h })->{say "Second"} First Second
eval: (do { say "First"; undef })->meth(say "Second") First Second ERR: Can't call method "meth" on an undefined value at (eval 15) line 1.
(Ignore the fact that it died in this last example; the fact remains it evaluated the LHS and printed "First" before it evaluated the RHS and printed "Second")
This neatness does not hold when invoking code references:
eval: (do { say "First"; sub {} })->(say "Second") Second First
The casual observer would expect that it would print "First" then "Second" in that order, as it does for the other examples; but yet it does not.
This may be regarded as "just a quirk", but it does have an annoying consequence when we consider Perl/RFCs#23 (comment) https://github.com/Perl/RFCs/pull/23#issuecomment-1194661578. A satisfactory solution in the optional-chaining case might involve changing the way that regular (i.e. non-optional) code ref invocation also works, to fix this evaluation order problem.
Just for the record, IMO the inconsistency between the different types of invocation should be seen as a bug. We may not specify which order, but I think that should not be taken as a good reason for them to be inconsistent with each other.
yves
-- perl -Mre=debug -e "/just|another|perl|hacker/"
It may be important to realize that the same behaviour happens on regular subroutine calls too, not just refs:
▶ perl -Mv5.36 -E ' my $x; $x->(die q(oops!)) ' oops! at -e line 1. ▶ perl -Mv5.36 -E ' some_function(die q(oops!)) ' oops! at -e line 1.Both of them should have, in my opinion, failed with "Can't use an undefined value as a subroutine reference" and "Undefined subroutine &main::some_funcion called", respectively, without having called arguments at all. Which is exactly what would have happened if no side-effect argument had been passed.
To be pedantic, a side effect of evaluating the argument could be setting the reference, respectively defining the function. However, I can't think of a circumstance where that makes sense, and the fix is easy, just evaluate the argument manually before calling the function.
I find it hard to believe anyone would expect calling/evaluating function arguments before checking whether the function can be called at all, quite the opposite in fact. So I strongly believe this undefined behaviour should be treated as a bug, fixed, and documented.
Agree.
M4
I suspect fixing this would be relatively simple, involving a flag on
OP_ENTERSUBto tell it to look at the other end of the stack for the CV to invoke, and shuffling the way the optree is generated in these cases.
This would be relatively simple if there were any spare flags. But there aren't. All 8 bits (plus the cheeky 9th OPf_SPECIAL flag) are already defined in use for something or other. Though not all of them are easy to track down why they're being used.
Bits used by pp_entersub itself are definitely out:
- Private bit 0:
OPpENTERSUB_INARGS; (badly named; should be_ARGS_HAS_LVAL) remarks that the arguments to the function call include lvalues. - Private bit 1:
HINT_STRICT_REFS; set bystrict refsbeing in scope - Private bit 6:
OPpENTERSUB_DB; set if debugger active - Private bit 7:
OPpLVAL_INTRO; somehow related to lvalues as well but I can't quite work out what
Bits used by XSUBs:
- Private bit 2:
OPpENTERSUB_HASTARG; remarks thatPL_op->op_targhas been allocated, sodXSTARGcan use it.
Bits used elsewhere in core:
- Private bit 3:
OPpENTERSUB_AMPER; set by the tokenizer to remark that the function was called using&foonotation; suppresses some kinds of early-bound static lookups to CVs. - Private bits 4 and 5:
OPpDEREF; set byPerl_doref()to indicate that the function call is in a context where it needs to autovivify an AV/HV/SV ref; later read bypp_leavesuboff the context stack where they'd been put bycx_pushsub[1]
Bits unused by perl itself but by surrounding infrastructure:
- OPf_SPECIAL: Used by
B::Deparseto notate adoin front of the call
I've even considered if any of the other flags are relevant, but none of them are. None of WANT, KIDS, MOD, STACKED, REF nor PARENS can be repurposed for us.
I have even considered if we want to add a new OP type, either for the purpose of adjusting the stack shape here, or moving one of the other flags into so we can free up a bit for our purposes. That won't be easy because there's so much other code around that tests if o->op_type == OP_ENTERSUB, which would all need changing to look at either of two optypes now.
In summary: There's no easy way out of this. About the only thing that comes to mind is trying to find some other way to signal to B::Deparse about the leading do notation, so we can get OPf_SPECIAL back to use for this case.
Thoughts continue ...
[1]: This one was really subtle and hard to find, appearing by rude action-at-a-distance inspection of PL_op directly inside Perl_cx_pushsub and not being passed in as a parameter. I might fix that sometime.
This would be relatively simple if there were any spare flags. But there aren't. All 8 bits (plus the cheeky 9th
OPf_SPECIALflag) are already defined in use for something or other. Though not all of them are easy to track down why they're being used.
question: even if we can somehow pull it off for this particular issue, aren't we just postponing having to deal with this (again) later on? Are the any long(er)-term refactor/solutions we can explore so our future selves don't bump into this wall again?
To be pedantic, a side effect of evaluating the argument could be setting the reference, respectively defining the function.
▶ perl -E 'sub foo { *{q(main::bar)} = sub { say q(it works) } } bar( foo() )'
it works
Talk about building the plane while flying It 😅
Bits unused by
perlitself but by surrounding infrastructure:
- OPf_SPECIAL: Used by
B::Deparseto notate adoin front of the callIn summary: There's no easy way out of this. About the only thing that comes to mind is trying to find some other way to signal to
B::Deparseabout the leadingdonotation, so we can getOPf_SPECIALback to use for this case.
do SUBROUTINE(LIST) was removed in 5.20.0 (comit 8c74b41425572faeb638f1269025b59d0785794f) why does B::Deparse still care?
do SUBROUTINE(LIST)was removed in 5.20.0 (comit 8c74b41) why doesB::Deparsestill care?
Ahah! Maybe it doesn't, perhaps it's an old leftover from that. I'll try just removing that code and see what breaks. Maybe we can reüse the OPf_SPECIAL bit after all..
Another possible approach may be that we can reclaim that HINT_STRICT_REFS bit. It's only used in a relatively slow UNLIKELY() path within pp_entersub anyway and so far on testing with a little assert, it seems always identical to the value you'd look up from the more conventional CopHINTS(PL_curcop) & HINT_STRICT_REFS anyway. Maybe we don't need to bake that value into the op_private during the checker, when it can be looked up out of PL_curcop as normal at runtime. That would also reclaim a bit.
why does B::Deparse still care?
Because I failed to remove the relevant block in the above commit. See #20001.
OK. So... Thanks to @ilmari tidying up the use of OPf_SPECIAL on OP_ENTERSUB, I now have a branch that attempts to fix this. It does so by defining that the flag now tells pp_entersub to expect to find the CV as the first value on the stack, rather than than the final one. A few bits of code around opcheckers need fixing up but overall it seems to work OK.
The one downside to it is that because of the different shape of the stack at runtime, pp_entersub has to move all the real arguments down on the stack (an O(n) operation) to cover over that reshaping. It makes it a little slow to run.
Alternative thoughts might be to forget using that flag, and have EXPR->(...) use a padtmp instead; effectively becoming do { my $tmp = EXPR; $tmp->(...) } in cases where EXPR isn't already just a PADSV. This would allocate an extra padtmp, but only in cases where LHS is not a simple padsv expression already (which honestly in most cases it probably is). It would also have no change on other code elsewhere, so may be overall simpler. Hmm.
This would allocate an extra padtmp, but only in cases where LHS is not a simple padsv expression already (which honestly in most cases it probably is).
In case it is of use, a simple grep for /\w->\(/ and /\W->\(/ in my main maths codebase (https://github.com/hvds/seq) finds 74 of 84 occurrences are of a simple variable (which will typically always be lexicals in my code); the others are picking the subref out of a structure, like:
$self->{sublist} = [ $self->{list}[0][1]->() ];
next if $tester->[$_]->($cur);
$def{$which} = $def{$which}->($args);
my $start = $machine->{start}->($m);
my($u, $v) = $s[$y]->($i, $j);
while (!($swap = $perm_stack[$index]->())) {