royale-asjs

RegEx Issues when mapping

Open javeiga-iest opened this issue 4 years ago • 3 comments

Hi guys, I am using the RegExp class to use patterns in my project, as I used to do in Flex.

I have seen that this class is already used in beads as "Restriction" of the TextInput in Jewel.

I have discovered several pattern transformation problems when declaring a regular expression.

Some context: I have this pattern, which I use to restrict text to hours and minutes:

/^(\ *)([+-]?)(\ *)(\d+)(:|.|\ )([0-5]{1})([0-9]{1})(\ *)$/gi

I use https://regexr.com/ to check my patterns. As you can see, my pattern works fine. [image]

The problems come when we try to initialize a variable of type RegExp:

First issue

var miexc2:RegExp=/^(\ *)([+\-]?)(\ *)(\d+)(\:|\.|\ )([0-5]{1})([0-9]{1})(\ *)$/gi;

The resulting mapping is as follows:

/^(\\u0020*)([+\-]?)(\\u0020*)(\d+)(\:|\.|\\u0020)([0-5]{1})([0-9]{1})(\\u0020*)$/gi

When the space is converted to a Unicode escape, the preceding escape (\) is ignored and the backslash ends up escaped twice. The result should be:

/^(\u0020*)([+\-]?)(\u0020*)(\d+)(\:|\.|\u0020)([0-5]{1})([0-9]{1})(\u0020*)$/gi
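To make the effect of the doubled escape concrete, here is a minimal sketch in TypeScript, chosen on the assumption that it behaves like the JavaScript Royale emits. The patterns are simplified versions of the ones above, and the variable names are illustrative only, not taken from the compiler output.

```typescript
// Simplified, hypothetical versions of the two patterns from the report.
// Expected output: \u0020 is a Unicode escape for a single space.
const expected: RegExp = /^(\u0020*)(\d+)$/;
// Reported output: \\ is a literal backslash, so this looks for the text "\u0020".
const emitted: RegExp = /^(\\u0020*)(\d+)$/;

const input: string = "  12";

const expectedMatches: boolean = expected.test(input); // leading spaces are accepted
const emittedMatches: boolean = emitted.test(input);   // leading spaces are rejected

// The emitted pattern only accepts a literal backslash, then "u002",
// then zero or more "0" characters, then the digits.
const emittedMatchesLiteral: boolean = emitted.test("\\u002012");
```

In other words, the doubly escaped form no longer describes whitespace at all, which would explain why the restriction stops accepting spaces.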

Second issue

Using the same regular expression, if instead of declaring it directly I do it through the constructor, passing it the pattern as a text string, the result is different.

var miexc3:RegExp=new RegExp("^(\ *)([+\-]?)(\ *)(\d+)(\:|\.|\ )([0-5]{1})([0-9]{1})(\ *)$","gi");

Result:

/^( *)([+-]?)( *)(d+)(:|.| )([0-5]{1})([0-9]{1})( *)$/gi/

What happened here? This time the space character is respected; however, the digit class (\d) is not, because its backslash has been removed.

What was expected: /^( *)([+-]?)( *)(\d+)(:|.| )([0-5]{1})([0-9]{1})( *)$/gi/

javeiga-iest Apr 14 '21 10:04

Round-tripping escaped strings correctly through Java is hard.

@aharui @joshtynjala @greg-dove Can one of you take a look and comment?

Harbs Apr 14 '21 10:04

I do it through the constructor, passing it the pattern as a text string, the result is different.

var miexc3:RegExp=new RegExp("^(\ *)([+-]?)(\ *)(\d+)(:|.|\ )([0-5]{1})([0-9]{1})(\ *)$","gi");

The \ character in a string escapes the next character (for instance \n is new line and \t is tab). To include a backslash in a string that will actually be a backslash, you need to escape it with an extra backslash, like this: \\
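A small sketch of that rule, written in TypeScript on the assumption that its string-literal escaping matches ActionScript's for these cases (the variable names are illustrative):

```typescript
// A single backslash before "d" is dropped when the string literal is parsed.
const single: string = "\d+";
const singleLength: number = single.length; // 2: "d" and "+"

// A doubled backslash survives as one real backslash.
const doubled: string = "\\d+";
const doubledLength: number = doubled.length; // 3: backslash, "d", "+"

// Only the doubled form reaches RegExp as the digit class \d.
const fromSingle: RegExp = new RegExp("^" + single + "$");   // same as /^d+$/
const fromDoubled: RegExp = new RegExp("^" + doubled + "$"); // same as /^\d+$/

const singleMatchesDigits: boolean = fromSingle.test("123");   // false
const doubledMatchesDigits: boolean = fromDoubled.test("123"); // true
const singleMatchesLetters: boolean = fromSingle.test("ddd");  // true
```

The backslash is lost while the string literal is parsed, before RegExp ever sees the pattern, which is consistent with the (d+) group in the reported result.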

With that in mind, I think that this is actually what you want when using the constructor:

var miexc3:RegExp=new RegExp("^(\\ *)([+\\-]?)(\\ *)(\\d+)(\\:|\\.|\\ )([0-5]{1})([0-9]{1})(\\ *)$","gi");
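As a rough check, the following TypeScript sketch compares the literal form with the doubled-backslash constructor form. It assumes JavaScript semantics, which is what Royale targets; the accepts helper and the sample inputs are hypothetical and only there for illustration.

```typescript
// Literal form, as in the first snippet of the report.
const fromLiteral: RegExp = /^(\ *)([+\-]?)(\ *)(\d+)(\:|\.|\ )([0-5]{1})([0-9]{1})(\ *)$/gi;

// Constructor form with every backslash doubled inside the string.
const fromString: RegExp = new RegExp(
  "^(\\ *)([+\\-]?)(\\ *)(\\d+)(\\:|\\.|\\ )([0-5]{1})([0-9]{1})(\\ *)$",
  "gi"
);

// Hypothetical helper: the g flag makes test() resume from lastIndex,
// so reset it before each independent check.
function accepts(pattern: RegExp, text: string): boolean {
  pattern.lastIndex = 0;
  return pattern.test(text);
}

// Both forms should carry the same pattern text and behave the same way.
const sameSource: boolean = fromLiteral.source === fromString.source;
const bothAcceptTime: boolean =
  accepts(fromLiteral, " +12:30 ") && accepts(fromString, " +12:30 ");
const bothRejectBadMinutes: boolean =
  !accepts(fromLiteral, "12:75") && !accepts(fromString, "12:75");
```

If the two forms ever disagree after compilation, that would point at the compiler's handling of the literal rather than at the string escaping.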

joshtynjala Apr 14 '21 16:04

Coincidentally, when I originally posted my comment, GitHub treated my double backslash as an escape sequence and displayed only one of them. I was able to work around that by formatting the double backslash as code.

As Harbs said, round-tripping escaped strings correctly is hard! Even GitHub got it wrong.

joshtynjala Apr 14 '21 16:04