rakudo icon indicating copy to clipboard operation
rakudo copied to clipboard

Unicode property methods/subs are not thread-safe

Open vrurg opened this issue 2 years ago • 5 comments

The following simple line causing me some headaches in a production code:

    unless $amount ~~ s/^<curr-sym=:Sc> { $symbol-at-start = True } || <curr-sym=:Sc>$// {
        X::CurrencyString.new(:$amount).throw
    }

The code is being ran by cron and way too often it emails me back with something like:

Bad amount (currency) string: "\$75.11"

Since the string in the error is produced with .raku for my own convenience, the regex acutally fails on $75.11 or alike. I mean, the actual amount my vary and thus is irrelevant.

The code is rather heavily multi-threaded and that's why concurrency is the primary suspect here. Moreover, the last nail in the coffin of my patience was the following error:

MoarVM oops: insert conflict, titlecase_letter is 477017254, 18 != 0
from SETTING::src/core.c/Str.pm6:1686 (/home/fta/.rakubrew/versions/moar-master/install/share/perl6/runtime/CORE.c.setting.moarvm:match)

Full backtrace points back to the above regex again.

Unfortunately, I cannot produce a golf. Event plain concurrent invocation of the code which is using the regex doesn't result in any errors.

vrurg avatar Apr 15 '22 01:04 vrurg

I think inadvertent sharing of the $/ variable between threads is often an issue with regexes.

niner avatar Apr 15 '22 06:04 niner

@niner everything is lexical. Unless the sharing happens somewhere at the backstage, but unlikely.

vrurg avatar Apr 15 '22 13:04 vrurg

This might me related to https://github.com/rakudo/rakudo/issues/4236 I'm pretty sure that's caused by some thread-safety issues in regexes.

patrickbkr avatar Apr 15 '22 20:04 patrickbkr

This could have something to do with unicode properties, perhaps. While I was trying to find a workaround the following variant was tested and failed too:

if ! $amount.contains(/^ <:Sc>/) {
    ...
}
else {
    ... # throw
}

It did fail on a string starting with '$'.

At the moment I have simplified my code to just unimatch($amount, "Sc") and this variant is passing test runs so far.

vrurg avatar Apr 17 '22 00:04 vrurg

I have changed the subject because it looks like regxes are not guilty. Changing my code to the following didn't fix the problem:

    if unimatch($amount, "Sc") {
        ...
    }
    elsif unimatch($amount.substr(* - 1), "Sc") {
        ...
    }
    else {
        FTA::X::CurrencyString.new(:$amount).throw
    }

So, it is clear this time that unimatch is the culprit. I haven't done any further investigation but even quick scan of unicodey.pm6 has let me to spot the following piece of code:

    my constant $name2pref = nqp::hash(
        ...
    );

    my sub codename2proppref(
      uint $code, str $propname, $prop is rw, $pref is rw
    --> Nil) {
        $prop = nqp::unipropcode($propname);
        $pref = nqp::atpos_s($prop2pref,$prop) || nqp::ifnull(
          nqp::atkey($name2pref,$propname),
          nqp::bindkey(
            $name2pref,
            $propname,
            nqp::if(
              nqp::istrue(my $result := nqp::getuniprop_str($code,$prop)),
              'S',
              'I'
            )
          )
        )
    }

Looks like it must be lock-protected because it is invoked by uniprop which is, in turn, is invoked by unimatch.

vrurg avatar Apr 19 '22 18:04 vrurg