RegexAnalyzer Support `u` flag for JavaScript

The code:

console.warn(JSON.stringify(Regex.Analyzer(/\u{20000}/u).tree(), null, 2))

throws an error because the \u{XXXXX} syntax is not supported when the u flag is used.

Nov 19 '22 08:11 danny0838

The update to 1.2.0 (JS only) partly addresses this issue, but I am not sure whether anything else is needed. Take a look; I will leave this open.

Nov 19 '22 13:11 foo123

/\u{2}/u seems to throw an error.

Nov 19 '22 14:11 danny0838

Something like /\p{Punctuation}/u needs to be implemented.

Nov 19 '22 14:11 danny0838

The value of Char for /\u{20000}/u is not correct. It should be the UTF-16 surrogate pair \uD840\uDC00, which can be obtained from String.fromCodePoint(0x20000).

Browsers that support the unicode flag seem to support String.fromCodePoint as well. A polyfill may be required if this library is intended to work on a JavaScript engine that doesn't support it.
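
A minimal sketch of such a fallback, assuming a helper named codePointToString (a hypothetical name, not part of RegexAnalyzer). It uses String.fromCodePoint when the engine provides it and otherwise computes the surrogate pair by hand:

```javascript
// Hypothetical helper (not part of RegexAnalyzer): convert a code point to a
// JavaScript string, falling back to manual surrogate-pair arithmetic when
// String.fromCodePoint is unavailable.
function codePointToString(cp) {
  if (typeof String.fromCodePoint === "function") {
    return String.fromCodePoint(cp);
  }
  if (cp <= 0xFFFF) {
    // Code points in the Basic Multilingual Plane fit in one UTF-16 code unit.
    return String.fromCharCode(cp);
  }
  // Astral code points are split into a high and a low surrogate.
  var offset = cp - 0x10000;
  var high = 0xD800 + (offset >> 10);
  var low = 0xDC00 + (offset & 0x3FF);
  return String.fromCharCode(high, low);
}

var s = codePointToString(0x20000);
console.log(s.length); // 2
console.log(s.charCodeAt(0).toString(16), s.charCodeAt(1).toString(16)); // d840 dc00
```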

Nov 19 '22 15:11 danny0838

Regex.Analyzer(/\u{20000}/u).compile() should be /\u{20000}/u rather than /\u20000/u.
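
The difference matters because /\u20000/u is read as \u2000 followed by a literal 0. The sketch below shows that with native RegExp, along with one possible way to serialize a code point; escapeCodePoint is a hypothetical helper, not the library's actual implementation:

```javascript
// Hypothetical helper (not part of RegexAnalyzer): serialize a code point as
// a regex escape. Astral code points need \u{...}, which requires the u flag;
// without it, an explicit surrogate pair is emitted instead.
function escapeCodePoint(cp, unicodeFlag) {
  var hex = cp.toString(16).toUpperCase();
  if (cp <= 0xFFFF) {
    return "\\u" + ("0000" + hex).slice(-4);
  }
  if (unicodeFlag) {
    return "\\u{" + hex + "}";
  }
  var offset = cp - 0x10000;
  var high = 0xD800 + (offset >> 10);
  var low = 0xDC00 + (offset & 0x3FF);
  return "\\u" + high.toString(16).toUpperCase() +
         "\\u" + low.toString(16).toUpperCase();
}

var escaped = escapeCodePoint(0x20000, true);
console.log(escaped); // \u{20000}

// The braced form matches U+20000 ...
var braced = new RegExp("^" + escaped + "$", "u");
console.log(braced.test("\uD840\uDC00")); // true

// ... while \u20000 means U+2000 followed by the digit 0.
var unbraced = new RegExp("^\\u20000$", "u");
console.log(unbraced.test("\uD840\uDC00")); // false
console.log(unbraced.test("\u2000" + "0")); // true
```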

Nov 19 '22 15:11 danny0838

When the unicode flag is not set, anything like /\u{2}/ should be treated as a literal u and a quantifier {2}.

See doc for more syntax details.
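
For illustration, this is how the native RegExp engine treats the two forms; the variable names are only for this example:

```javascript
// With the u flag, \u{61} is the code point U+0061 ("a").
var withFlag = /^\u{61}$/u.test("a");
console.log(withFlag); // true

// Without the u flag, \u is an identity escape for "u" and {2} is a quantifier,
// so the pattern matches exactly two "u" characters.
var literalU = /^\u{2}$/.test("uu");
console.log(literalU); // true

// It therefore does not match the control character U+0002.
var controlChar = /^\u{2}$/.test("\u0002");
console.log(controlChar); // false
```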

Nov 19 '22 17:11 danny0838

new upload of v.1.2.0

/\u{61}/u
{
  "type": 1,
  "val": [
    {
      "type": 32,
      "val": "u{61}",
      "flags": {
        "Char": "a",
        "Code": "61",
        "UnicodePoint": true
      },
      "typeName": "UnicodeChar"
    }
  ],
  "flags": {},
  "typeName": "Sequence"
}
/\u{61}/
{
  "type": 1,
  "val": [
    {
      "type": 16,
      "val": {
        "type": 1024,
        "val": "u",
        "flags": {},
        "typeName": "String"
      },
      "flags": {
        "val": "{61}",
        "MatchMinimum": "61",
        "MatchMaximum": "61",
        "min": 61,
        "max": 61,
        "StartRepeats": 1,
        "isGreedy": 1
      },
      "typeName": "Quantifier"
    }
  ],
  "flags": {},
  "typeName": "Sequence"
}

When the unicode flag is not set, anything like /\u{2}/ should be treated as a literal u and a quantifier {2}.

Fixed

Regex.Analyzer(/\u{20000}/u).compile() should be /\u{20000}/u rather than /\u20000/u.

Fixed

The value of Char for /\u{20000}/u is not correct. It should be the UTF-16 surrogate pair \uD840\uDC00, which can be obtained from String.fromCodePoint(0x20000).

Fixed

Something like /\p{Punctuation}/u needs to be implemented.

Only on a major update, not anytime soon

Nov 19 '22 18:11 foo123

/\u{2}/u does not seem to be treated correctly as a unicode char.

Nov 19 '22 18:11 danny0838

The unicode flag also makes the syntax stricter: an incomplete escape sequence such as /\x/u, /\x3/u, /\u/u, or /\u30/u throws a SyntaxError.

Also, a character class like /[\W-3]/u is invalid. (See doc for more syntax details.)

Not sure if you are going to implement it.
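
The stricter behavior can be checked against the native engine with a small probe. The compiles helper below is a hypothetical name used only for this example:

```javascript
// Hypothetical helper: report whether a pattern source compiles with the
// given flags, treating SyntaxError as "does not compile".
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Each of these is accepted without the u flag and rejected with it.
var sources = ["\\x", "\\x3", "\\u", "\\u30", "[\\W-3]"];
var results = sources.map(function (src) {
  return { source: src, plain: compiles(src, ""), unicode: compiles(src, "u") };
});

results.forEach(function (r) {
  console.log(r.source, "plain:", r.plain, "unicode:", r.unicode);
});
```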

Nov 19 '22 18:11 danny0838

/\u{2}/u does not seem to be treated correctly as a unicode char.

Fixed

/\u{2}/u
"\\u{2}"
{
  "type": 1,
  "val": [
    {
      "type": 32,
      "val": "u{2}",
      "flags": {
        "Char": "\u0002",
        "Code": "2",
        "UnicodePoint": true
      },
      "typeName": "UnicodeChar"
    }
  ],
  "flags": {},
  "typeName": "Sequence"
}
/\u{61}/u
"\\u{61}"
{
  "type": 1,
  "val": [
    {
      "type": 32,
      "val": "u{61}",
      "flags": {
        "Char": "a",
        "Code": "61",
        "UnicodePoint": true
      },
      "typeName": "UnicodeChar"
    }
  ],
  "flags": {},
  "typeName": "Sequence"
}
/\u{61}/
"u{61}"
{
  "type": 1,
  "val": [
    {
      "type": 16,
      "val": {
        "type": 1024,
        "val": "u",
        "flags": {},
        "typeName": "String"
      },
      "flags": {
        "val": "{61}",
        "MatchMinimum": "61",
        "MatchMaximum": "61",
        "min": 61,
        "max": 61,
        "StartRepeats": 1,
        "isGreedy": 1
      },
      "typeName": "Quantifier"
    }
  ],
  "flags": {},
  "typeName": "Sequence"
}

Nov 19 '22 19:11 foo123

Something like /\p{Punctuation}/u needs to be implemented.

Only on a major update, not anytime soon

Maybe we can implement quick support that simply creates a corresponding node with the provided value (that is, without checking whether it is really valid)? The syntax can be found in the doc. That way, developers could use the library to analyze a regex with such syntax without getting an error.
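
A rough sketch of what such lenient parsing could look like. The function name, the "UnicodeProperty" type name, and the flag names are all assumptions for illustration and do not reflect the library's internals:

```javascript
// Hypothetical sketch (not RegexAnalyzer's actual code): recognize \p{...} or
// \P{...} at position `pos` of a regex source string and return a node
// without validating the property name or value.
function parseUnicodeProperty(source, pos, unicodeFlag) {
  // Without the u flag, \p is just an identity escape, so do nothing here.
  if (!unicodeFlag) return null;
  var m = /^\\([pP])\{([^}]+)\}/.exec(source.slice(pos));
  if (!m) return null;
  var eq = m[2].indexOf("=");
  return {
    typeName: "UnicodeProperty",            // assumed node name
    val: m[0].slice(1),                     // e.g. "p{Punctuation}"
    flags: {
      Negated: m[1] === "P",
      Property: eq < 0 ? m[2] : m[2].slice(0, eq),
      Value: eq < 0 ? null : m[2].slice(eq + 1)
    },
    length: m[0].length                     // characters consumed
  };
}

var node = parseUnicodeProperty("\\p{Punctuation}", 0, true);
console.log(node.val);            // p{Punctuation}
console.log(node.flags.Property); // Punctuation
```

Because nothing is validated, a misspelled property name would still produce a node; only the native RegExp constructor would reject it.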

Nov 19 '22 19:11 danny0838
