Phalanger
Bug in PerlRegExpConverter
I have a problem with the following regular expression: \G(((int(eger)?|bool(ean)?|float|double|real|string|binary|array|object))\s*)
This regular expression gives me a different result in Phalanger. I assume the problem is in converting the pattern to .NET. The pattern is converted to: \G(?<an0ny_1>((int(?<an0ny_2>eger)?|bool(?<an0ny_3>ean)?|float|double|real|string|binary|array|object))\s*)
I think a group name is missing after: \G(?<an0ny_1>(
The same problem occurs with the regular expression /((x)y)/. When I match it against 'xy', I get wrong results:
preg_match('/((x)y)/', 'xy', $matches, null);
$matches[1] == 'x' (should be 'xy')
$matches[2] == 'xy' (should be 'x')
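For reference, here is the expected numbering in an engine that numbers capturing groups by the position of their opening parenthesis, as PCRE does. Python's re module is used purely for illustration; it is not part of Phalanger. A likely explanation for the swap (an assumption, not verified against the converter): .NET numbers unnamed groups before named ones, so if only the outer group receives a name, the unnamed inner group becomes group 1.

```python
import re

# Groups are numbered by the order of their opening parentheses:
# group 1 is the outer ((x)y), group 2 is the inner (x).
m = re.match(r'((x)y)', 'xy')
groups = [m.group(0), m.group(1), m.group(2)]
print(groups)  # ['xy', 'xy', 'x']
```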
Not well tested yet...
diff -r cb4f50629489 Phalanger/ClassLibrary/RegExpPerl.cs
--- a/Phalanger/ClassLibrary/RegExpPerl.cs Thu Sep 11 15:06:26 2014 +0400
+++ b/Phalanger/ClassLibrary/RegExpPerl.cs Mon Sep 15 23:22:18 2014 +0400
@@ -2265,8 +2265,7 @@
result.Append('>');
continue;
}
- else
- if (i + 2 < perlExpr.Length && perlExpr[i + 2] == ':')
+ if (i + 2 < perlExpr.Length && (perlExpr[i + 2] == ':' || perlExpr[i + 2] == '!' || perlExpr[i + 2] == '='))
{
// Pseudo-group, don't count.
--group_number;
@@ -2284,6 +2283,27 @@
case 1:
if (ch == '?')
inner_state = 2;
+ else if (ch == '(')
+ {
+ ++group_number;
+ if (i + 1 < perlExpr.Length)
+ {
+ if (perlExpr[i + 1] != '?')
+ {
+ ++i;
+ result.Append("(?<");
+ result.Append(AnonymousGroupPrefix);
+ result.Append(group_number);
+ result.Append('>');
+ continue;
+ }
+ if (i + 2 < perlExpr.Length && (perlExpr[i + 2] == ':' || perlExpr[i + 2] == '!' || perlExpr[i + 2] == '='))
+ {
+ // Pseudo-group, don't count.
+ --group_number;
+ }
+ }
+ }
else if (ch != '(')// stay in inner_state == 1, because this can happen: ((?<blah>...))
inner_state = 0;
break;
Sorry, my mistake: I didn't notice that GitHub changed the first regular expression. The correct one is:
\G(\((int(eger)?|bool(ean)?|float|double|real|string|binary|array|object)\)\s*)
It is converted to:
\G(?<an0ny_1>\((int(?<an0ny_2>eger)?|bool(?<an0ny_3>ean)?|float|double|real|string|binary|array|object)\)\s*)
The group name is missing after \G(?<an0ny_1>\(( and before int, so it should look like this:
\G(?<an0ny_1>\((?<an0ny_2>int(?<an0ny_3>eger)?|bool(?<an0ny_4>ean)?|float|double|real|string|binary|array|object)\)\s*)
Am I right? Please try to fix this case as well, thank you.
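To make the expected conversion concrete, here is a minimal sketch of the naming rule I have in mind, written in Python rather than the converter's C#. It is not the PerlRegExpConverter code: name_groups and ANONYMOUS_GROUP_PREFIX are hypothetical names, character classes are handled only roughly, and every (?...) construct is copied unchanged and not counted, which is a simplification (named groups in the input would need counting too).

```python
ANONYMOUS_GROUP_PREFIX = "an0ny_"  # prefix seen in the converted patterns above


def name_groups(pattern: str) -> str:
    """Give every plain capturing group an explicit name, counted left to right."""
    out = []
    group_number = 0
    in_class = False  # True while inside a [...] character class
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character such as \( or \s: copy both characters verbatim.
            out.append(pattern[i:i + 2])
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "(" and pattern[i + 1:i + 2] != "?":
            # Plain capturing group: name it, even when it directly
            # follows another opening parenthesis.
            group_number += 1
            out.append(f"(?<{ANONYMOUS_GROUP_PREFIX}{group_number}>")
            i += 1
            continue
        out.append(ch)
        i += 1
    return "".join(out)


converted = name_groups(
    r"\G(\((int(eger)?|bool(ean)?|float|double|real|string|binary|array|object)\)\s*)"
)
print(converted)
print(name_groups(r"((x)y)"))  # (?<an0ny_1>(?<an0ny_2>x)y)
```

With every capturing group named in order, none is left unnamed, so the numbering should no longer depend on how .NET orders named and unnamed groups.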