Bug in PerlRegExpConverter

Open broudy3 opened this issue 10 years ago • 2 comments

I have a problem with the following regular expression: \G(((int(eger)?|bool(ean)?|float|double|real|string|binary|array|object))\s*)

This regular expression gives me a different result in Phalanger. I assume the problem is in converting the pattern to .NET. The pattern is converted to: \G(?<an0ny_1>((int(?<an0ny_2>eger)?|bool(?<an0ny_3>ean)?|float|double|real|string|binary|array|object))\s*)

I think a group name is missing after \G(?<an0ny_1>(

The same problem occurs with the regular expression /((x)y)/. When I match it against 'xy', I get wrong results:

preg_match('/((x)y)/', 'xy', $matches, null);
$matches[1] == 'x' (should be 'xy')
$matches[2] == 'xy' (should be 'x')
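For reference, Perl-compatible engines number capturing groups by the position of their opening parenthesis, which is the expectation stated above. The short check below uses Python's `re` module as a stand-in (an assumption: it is neither PHP nor Phalanger, it just follows the same numbering rule for this pattern).

```python
import re

# Groups are numbered by the position of their opening parenthesis:
# group 1 is the outer ((x)y), group 2 is the inner (x).
match = re.match(r"((x)y)", "xy")
matches = [match.group(0), match.group(1), match.group(2)]
print(matches)
```

In .NET, unnamed groups are numbered before named ones, so a converted pattern in which only the outer group receives a name would be consistent with the swapped result reported here. The thread itself does not confirm that this is the cause.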

broudy3, Sep 07 '14 21:09

Not well tested yet...

diff -r cb4f50629489 Phalanger/ClassLibrary/RegExpPerl.cs
--- a/Phalanger/ClassLibrary/RegExpPerl.cs  Thu Sep 11 15:06:26 2014 +0400
+++ b/Phalanger/ClassLibrary/RegExpPerl.cs  Mon Sep 15 23:22:18 2014 +0400
@@ -2265,8 +2265,7 @@
                                             result.Append('>');
                                             continue;
                                         }
-                                        else
-                                        if (i + 2 < perlExpr.Length && perlExpr[i + 2] == ':')
+                                        if (i + 2 < perlExpr.Length && (perlExpr[i + 2] == ':' || perlExpr[i + 2] == '!' || perlExpr[i + 2] == '='))
                                         {
                                             // Pseudo-group, don't count.
                                             --group_number;
@@ -2284,6 +2283,27 @@
                            case 1:
                                 if (ch == '?')
                                     inner_state = 2;
+                                else if (ch == '(')
+                                {
+                                    ++group_number;
+                                    if (i + 1 < perlExpr.Length)
+                                    {
+                                        if (perlExpr[i + 1] != '?')
+                                        {
+                                            ++i;
+                                            result.Append("(?<");
+                                            result.Append(AnonymousGroupPrefix);
+                                            result.Append(group_number);
+                                            result.Append('>');
+                                            continue;
+                                        }
+                                        if (i + 2 < perlExpr.Length && (perlExpr[i + 2] == ':' || perlExpr[i + 2] == '!' || perlExpr[i + 2] == '='))
+                                        {
+                                            // Pseudo-group, don't count.
+                                            --group_number;
+                                        }
+                                    }
+                                }
                                 else if (ch != '(')// stay in inner_state == 1, because this can happen: ((?<blah>...))
                                     inner_state = 0;
                                 break;

proff, Sep 15 '14 19:09

Sorry, my mistake: I didn't notice that GitHub changed the first regular expression. The correct one is: \G(\((int(eger)?|bool(ean)?|float|double|real|string|binary|array|object)\)\s*)

It is converted to: \G(?<an0ny_1>\((int(?<an0ny_2>eger)?|bool(?<an0ny_3>ean)?|float|double|real|string|binary|array|object)\)\s*)

A group name is missing after \G(?<an0ny_1>\(( and before int, so it should look like this: \G(?<an0ny_1>\((?<an0ny_2>int(?<an0ny_3>eger)?|bool(?<an0ny_4>ean)?|float|double|real|string|binary|array|object)\)\s*)

Am I right? Please try to fix this case as well, thank you.
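The numbering this comment asks for can be sketched as a single pass over the pattern. This is a minimal illustration in Python, not the Phalanger converter itself. The function name `name_anonymous_groups` is hypothetical, and only the `an0ny_` prefix is taken from the converted patterns quoted in this thread. Character-class handling is deliberately simplified, and `(?...)` constructs are copied through without any syntax translation.

```python
ANONYMOUS_GROUP_PREFIX = "an0ny_"  # prefix seen in the converted patterns in this thread


def name_anonymous_groups(perl_expr: str) -> str:
    """Name every unnamed capturing group, numbered by opening parenthesis.

    Hypothetical helper for illustration only; assumes a well-formed pattern.
    """
    out = []
    group_number = 0
    in_class = False  # inside [...] a '(' is a literal, not a group
    i, n = 0, len(perl_expr)
    while i < n:
        ch = perl_expr[i]
        if ch == "\\" and i + 1 < n:
            # Escaped character such as \( or \s: copy both characters verbatim.
            out.append(perl_expr[i:i + 2])
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(":
            if perl_expr.startswith("?", i + 1):
                # (?<name>...), (?P<name>...) and (?'name'...) capture and take a
                # number; (?:...), (?=...), (?!...), (?<=...), (?<!...) do not.
                rest = perl_expr[i + 2:i + 4]
                is_named = (
                    rest.startswith("P<")
                    or rest.startswith("'")
                    or (rest.startswith("<") and rest[1:2] not in ("=", "!"))
                )
                if is_named:
                    group_number += 1
                # The construct itself is copied unchanged by the code below.
            else:
                group_number += 1
                out.append("(?<%s%d>" % (ANONYMOUS_GROUP_PREFIX, group_number))
                i += 1
                continue
        out.append(ch)
        i += 1
    return "".join(out)


print(name_anonymous_groups("((x)y)"))
```

Under these assumptions, `((x)y)` becomes `(?<an0ny_1>(?<an0ny_2>x)y)`, and the corrected pattern from this comment produces the `an0ny_1` to `an0ny_4` form requested above, because the escaped `\(` is copied through without being counted as a group.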

broudy3, Sep 15 '14 22:09