rakudo icon indicating copy to clipboard operation
rakudo copied to clipboard

Caller-$/-setting is too invasive

Open zoffixznet opened this issue 8 years ago • 18 comments

This started with RT#126721 that points out .subst's replacement block has wrong $/ if caller's $/ is read-only. The bug was originally fixed by having .subst throw in such circumstances, just like .match and .trans do (now reverted).

This fix caused breakage in about 20 modules and when I went to fix it I ended up staring at diffs like this, with a dozen changes, all just to rename a variable, all just because a .subst call was used.

This is what the users would regularly have to do in their grammar actions, since $/ is the common name for the parameter. For that reason, I propose we go into the opposite direction and instead don't set caller's $/ in method forms. Off the top of my head, that'd affect .parse, .subparse, .match, .trans (currently throw on read-only $/) and .subst and .subst-mutate (currently silently fail). .comb with regex might be also in the list, except it already appears it does not touch $/.

The operator forms (/.../, s/...//, S/...//, and tr///) on the other hand would set $/. Unlike operators, methods are often chained, and at least to me feel like they shouldn't be messing with variables in my current scope.

So "foo" ~~ /(f)(.)(.)/ would set $/ and the user can obtain captures from it. "foo".match: /(f)(.)(.)/ would not and it'd be assumed the user would be chaining it with something and using the return value.

One obvious implementation would be to add something like :$RAKUDO-INTERNAL-SET-DOLLAR-SLASH parameter to the methods that operators would set, since they're implemented in terms of those methods. The arg could also be made public API (e.g. :$set-match), so users could set it in method forms to affect the $/ . Finally, the back-compat way to do this could be to add a reverse arg: one that makes method forms NOT touch $/ but IMO that feels like a bandaid over the issue and users would still get confusing errors when using these methods in grammar actions, except instead of telling them to rename the parameter we'd be telling them to set an arg.

This change would break 6.c tests, so it's 6.d material.

zoffixznet avatar Nov 05 '17 16:11 zoffixznet

This is what the users would regularly have to do in their grammar actions, since $/ is the common name for the parameter

I think there's a fairly good argument that using subst inside of an action method would imply re-parsing, which one should think twice about anyway. I agree silent failure is a very bad behavior, though.

On the more general issue, the thing to keep in mind is that the replacement part tends to want to refer to $/. In the case of the s/// syntax, we may well already be compiling the replacement part as taking $/ as a parameter, so it'd work out fine. The thing we'd extensively bust if we changed subst to not set $/ is anything like $foo.subst(/(\d+)/, { $0 * 2 }, :g). I suspect that's extremely common, and would probably break a lot more than adding the error on silent failure to set.

jnthn avatar Nov 05 '17 17:11 jnthn

I think a third possibility that's likely 6.c-friendly is possible: with both ops and methods, try to set caller's $/ and if we fail, then assume the user decided they don't want it set. If it's possible for $/ to be used within the machinery of the method, then the method will put the match into their own $/ that's visible within the machinery but does not affect caller's $/:

  • .subst: /(\d)/, { $0 * 2 } sets caller's $/
  • $/ := "meow"; .subst: /(\d)/, { $0 * 2 } can't set caller, so it doesn't bother to, but $/ is still available within the block

Basically, instead of bugging the user to rename their variables or silently failing, we'd be silently using our own $/ that doesn't get propagated to the caller.

zoffixznet avatar Nov 06 '17 13:11 zoffixznet

That option sounds like the most user-friendly to me.

vendethiel avatar Nov 06 '17 14:11 vendethiel

grammar {
  token TOP { <letters> { if $<letters> ~~ /^a/ { make 42; } } }
  token letters { \w+ }
}.parse("abcd").made.say; # OUTPUT: Nil

please take this case into consideration.

make/made might be the most commonly actions, and, user who write grammar quite probably tends to using regexes. this case shows that as things go complex, using regexes may break what we always do($<token-name>) in grammar-actions.

tisonkun avatar Nov 07 '17 12:11 tisonkun

@zoffixznet

$/ := "meow"; .subst: /(\d)/, { $0 * 2 } can't set caller, so it doesn't bother to, but $/ is still available within the block

Would the $0 inside the block refer to the caller's $/ (i.e. "meow")? If so, I think that's worse than throwing an exception - bugs caused by some code suddenly starting to use a different value than the user expected due to a change elsewhere, are among the most annoying to debug.

Or do you mean that the .subst method would somehow set $/ inside the callback block passed to it, before invoking that block? If that's possible (and doesn't turn out to have negative consequences), I think the best solution would be your original proposal (method forms not setting $/ in their caller's scope), with the addition that the ones that accept a callback do set $/ inside that callback's scope.

smls avatar Jan 29 '18 16:01 smls

Would the $0 inside the block refer to the caller's $/ (i.e. "meow")?

No, the $0 inside the block would contain (\d) match from subst. What you describe is the current, buggy behaviour.

zoffixznet avatar Jan 29 '18 23:01 zoffixznet

@zoffixznet

No, the $0 inside the block would contain (\d) match from subst.

How? Which is to say, what mechanism will pass $/ from .subst to the callback block?

Will .subst "cheat" to accomplish this, or could a user-defined (pure Perl 6) .subst-alternative do the same?

  • Maybe an expansion of the Capture type so it can hold a value for $/ (and $_ and and $!?) in addition to positional and named arguments, which any Block will then set in its scope when it is invoked with the Capture?

  • Or an alternative to .CALL-ME that says "Prepare a call to this block, i.e. resolve any multi-dispatch and bind the parameters and create a callframe, but don't start running its code just yet – instead, give me the callframe so I can mess with it first"?

smls avatar Jun 28 '18 21:06 smls

How?

I don't remember anymore or whether I tried implementing anything at the time.

It's possible that I thought that since we have to do all the fetching of $/ from caller in the method that we'd be able to fully control the $/ inside the block.

Is $/ lexical or dynamic?

zoffixznet avatar Jun 28 '18 21:06 zoffixznet

Is $/ lexical or dynamic?

This says dynamic:

<Zoffix_> m: say $/.VAR.dynamic
<camelia> rakudo-moar 587cd4f9e: OUTPUT: «True␤»

So I'd expect this:

<Zoffix_> m: use MONKEY; augment class Str { method s (\re, \b) { $/ := "foos"; dd b.() } }; $/ := "meows"; "foo".s: /o(o)/, -> { $/ ~ "g" };
<camelia> rakudo-moar 587cd4f9e: OUTPUT: «"meowsg"␤»

To work like this:

<Zoffix_> m: use MONKEY; augment class Str { method s (\re, \b) { my $*z := "foos"; dd b.() } }; my $*z := "meows"; "foo".s: /o(o)/, -> { $*z ~ "g" };
<camelia> rakudo-moar 587cd4f9e: OUTPUT: «"foosg"␤»

Is the optimizer converting the $/ to lexical overeagerly in this context, perhaps?

zoffixznet avatar Jun 28 '18 21:06 zoffixznet

Just another Perl Hacker, Fernando (SmokeMachine)

2018-06-28 18:48 GMT-03:00 Zoffix Znet [email protected]:

Is $/ lexical or dynamic?

This says dynamic:

<Zoffix_> m: say $/.VAR.dynamic rakudo-moar 587cd4f9e: OUTPUT: «True␤»

So, shouldnt it be called $*/? are there another dynamic vars that does not use the *?

So I'd expect this:

<Zoffix_> m: use MONKEY; augment class Str { method s (\re, \b) { $/ := "foos"; dd b.() } }; $/ := "meows"; "foo".s: /o(o)/, -> { $/ ~ "g" }; rakudo-moar 587cd4f9e: OUTPUT: «"meowsg"␤»

To work like this:

<Zoffix_> m: use MONKEY; augment class Str { method s (\re, \b) { my $*z := "foos"; dd b.() } }; my $*z := "meows"; "foo".s: /o(o)/, -> { $*z ~ "g" }; rakudo-moar 587cd4f9e: OUTPUT: «"foosg"␤»

Is the optimizer converting the $/ to lexical overeagerly in this context, perhaps?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/rakudo/rakudo/issues/1235#issuecomment-401183774, or mute the thread https://github.com/notifications/unsubscribe-auth/AAGF-i73ArcmEDVmYhORg9dTQClhAPrFks5uBU8kgaJpZM4QSbwA .

FCO avatar Jun 28 '18 22:06 FCO

FWIW, this is still an issue:

$/ := "meow"; "42".subst: /(\d)/, { $0 * 2 }'
# Cannot convert string to number: base-10 number must begin with valid digits or '.' in '⏏meow' (indicated by ⏏)

lizmat avatar Oct 01 '24 10:10 lizmat

I've found that using the callers $/ makes $/-related operators thread-unsafe: say (^1000).race.map({ S/ (.*) /$0/ }).unique.elems # prints something like 856 This can be fixed by declaring $/ in a lexical scope: say (^1000).race.map({ my $/; S/ (.*) /$0/ }).unique.elems # prints 1000

fingolfin0 avatar May 22 '25 18:05 fingolfin0

Could we automatically insert a lexical definition for $/ whenever the compiler encounters a scoped regex?

ab5tract avatar May 28 '25 15:05 ab5tract

@ab5tract: I seem to remember that that is not generally an option, because of some other magic $/ shenanigans

lizmat avatar May 28 '25 15:05 lizmat

@ab5tract: I seem to remember that that is not generally an option, because of some other magic $/ shenanigans

Maybe we could add a new adverb that sacrifices the other magic and shenanigans for the safety of a guaranteed lexical $/ ?

ab5tract avatar May 30 '25 21:05 ab5tract

Where would you apply such an adverb?

lizmat avatar May 30 '25 22:05 lizmat

Where would you apply such an adverb?

Where all other regex adverbs go?

ab5tract avatar Jun 04 '25 13:06 ab5tract

The adverb wouldn't solve the thread safety issue as that stems from users not thinking about $/ at all when running threaded regex code. If they don't think about $/ they won't know that they need to add that adverb.

I think the only way out here is to get rid of the magic $/ shenanigans in the first place. Setting some magic variable in the caller's scope is just fundamentally incompatible with multi threading. We'd need to look at what this mechanism is used for and find other solutions for the original problems that it solves.

niner avatar Jun 04 '25 13:06 niner