RegexAnalyzer
Support `u` flag for JavaScript
The code:
console.warn(JSON.stringify(Regex.Analyzer(/\u{20000}/u).tree(), null, 2))
throws an error because the \u{XXXXX} syntax is not supported when the u flag is used.
The update to 1.2.0 (JS only) takes care of part of this issue, but I am not sure whether anything else is needed. Please take a look. I am leaving this open.
/\u{2}/u seems to throw an error.
Something like /\p{Punctuation}/u needs to be implemented.
The value of char for /\u{20000}/u is not correct. It should be the UTF-16 surrogate pair \uD840\uDC00, which can be obtained from String.fromCodePoint(0x20000).
Browsers that support the unicode flag seem to support String.fromCodePoint. A polyfill may be required if this library is intended to work on a JavaScript engine that doesn't support it.
Regex.Analyzer(/\u{20000}/u).compile() should be /\u{20000}/u rather than /\u20000/u.
When the unicode flag is not set, anything like /\u{2}/ should be treated as a literal u and a quantifier {2}.
See doc for more syntax details.
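As a rough illustration of the surrogate-pair point above, here is a minimal sketch of how a code point above U+FFFF could be split by hand when String.fromCodePoint is missing. The helper names toSurrogatePair and fromCodePoint are hypothetical and are not part of RegexAnalyzer.

```javascript
// Hypothetical helper: build the UTF-16 surrogate pair for a code point > 0xFFFF.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);   // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits
  return String.fromCharCode(high, low);
}

// Hypothetical fallback: prefer the native String.fromCodePoint when it exists.
function fromCodePoint(codePoint) {
  if (typeof String.fromCodePoint === "function") {
    return String.fromCodePoint(codePoint);
  }
  return codePoint > 0xFFFF
    ? toSurrogatePair(codePoint)
    : String.fromCharCode(codePoint);
}

const ch = fromCodePoint(0x20000);
console.log(ch.length);                   // 2
console.log(ch === "\uD840\uDC00");       // true
console.log(/^\u{20000}$/u.test(ch));     // true
```

A complete polyfill would also validate its input (for example, rejecting values above 0x10FFFF); that is left out here to keep the sketch short.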
New upload of v1.2.0:
/\u{61}/u
{
  "type": 1,
  "val": [
    {
      "type": 32,
      "val": "u{61}",
      "flags": {
        "Char": "a",
        "Code": "61",
        "UnicodePoint": true
      },
      "typeName": "UnicodeChar"
    }
  ],
  "flags": {},
  "typeName": "Sequence"
}
/\u{61}/
{
  "type": 1,
  "val": [
    {
      "type": 16,
      "val": {
        "type": 1024,
        "val": "u",
        "flags": {},
        "typeName": "String"
      },
      "flags": {
        "val": "{61}",
        "MatchMinimum": "61",
        "MatchMaximum": "61",
        "min": 61,
        "max": 61,
        "StartRepeats": 1,
        "isGreedy": 1
      },
      "typeName": "Quantifier"
    }
  ],
  "flags": {},
  "typeName": "Sequence"
}
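For comparison, the native engine can be used to check that these two readings of \u{61} match what JavaScript itself does. This is only a quick sanity check with the built-in RegExp; it does not call RegexAnalyzer.

```javascript
// With the u flag, \u{61} is the code point U+0061 ("a").
const withFlag = /^\u{61}$/u;
// Without the u flag, \u is a literal "u" and {61} is a quantifier.
const withoutFlag = /^\u{61}$/;

console.log(withFlag.test("a"));                // true
console.log(withFlag.test("u".repeat(61)));     // false
console.log(withoutFlag.test("a"));             // false
console.log(withoutFlag.test("u".repeat(61)));  // true
```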
When the unicode flag is not set, anything like /\u{2}/ should be treated as a literal u and a quantifier {2}.
Fixed
Regex.Analyzer(/\u{20000}/u).compile() should be /\u{20000}/u rather than /\u20000/u.
Fixed
The value of char for /\u{20000}/u is not correct. It should be the UTF-16 surrogate pair \uD840\uDC00, which can be obtained from String.fromCodePoint(0x20000).
Fixed
Something like /\p{Punctuation}/u needs to be implemented.
Only on a major update, not anytime soon
/\u{2}/u does not seem to be treated correctly as a unicode char.
The unicode flag also changes behavior so that an incomplete escape sequence like /\x/u, /\x3/u, /\u/u, or /\u30/u throws a syntax error.
A character class that uses a class escape as a range endpoint, like /[\W-3]/u, also becomes invalid. (See doc for more syntax details.)
Not sure if you are going to implement it.
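To make the stricter behavior concrete, here is a small check against the native RegExp constructor. The compiles helper is a hypothetical name used only for this illustration; it reports whether a pattern source is accepted with a given set of flags.

```javascript
// Hypothetical helper: true if the engine accepts the pattern, false on SyntaxError.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e; // anything else is unexpected
  }
}

// Pattern sources from the comment above (backslashes escaped for string literals).
const samples = ["\\x", "\\x3", "\\u", "\\u30", "[\\W-3]"];

for (const source of samples) {
  console.log(
    `/${source}/`,
    "without u:", compiles(source, ""),
    "with u:", compiles(source, "u")
  );
}
```

In an engine that implements the legacy web-compatibility grammar (browsers and Node.js), each sample should be accepted without the flag and rejected with it.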
/\u{2}/u does not seem to be treated correctly as a unicode char.
Fixed
/\u{2}/u
"\\u{2}"
{
  "type": 1,
  "val": [
    {
      "type": 32,
      "val": "u{2}",
      "flags": {
        "Char": "\u0002",
        "Code": "2",
        "UnicodePoint": true
      },
      "typeName": "UnicodeChar"
    }
  ],
  "flags": {},
  "typeName": "Sequence"
}
/\u{61}/u
"\\u{61}"
{
  "type": 1,
  "val": [
    {
      "type": 32,
      "val": "u{61}",
      "flags": {
        "Char": "a",
        "Code": "61",
        "UnicodePoint": true
      },
      "typeName": "UnicodeChar"
    }
  ],
  "flags": {},
  "typeName": "Sequence"
}
/\u{61}/
"u{61}"
{
  "type": 1,
  "val": [
    {
      "type": 16,
      "val": {
        "type": 1024,
        "val": "u",
        "flags": {},
        "typeName": "String"
      },
      "flags": {
        "val": "{61}",
        "MatchMinimum": "61",
        "MatchMaximum": "61",
        "min": 61,
        "max": 61,
        "StartRepeats": 1,
        "isGreedy": 1
      },
      "typeName": "Quantifier"
    }
  ],
  "flags": {},
  "typeName": "Sequence"
}
Something like /\p{Punctuation}/u needs to be implemented.
Only on a major update, not anytime soon
Maybe we can implement quick support that simply creates a corresponding node with the provided value (that is, without checking whether it is actually valid)? The syntax can be found in the doc. That way developers can use the library to analyze a regex with such syntax without getting an error.
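A minimal sketch of what such unvalidated support might look like, assuming a parser that can hand over the pattern source and the index of the backslash. Everything here is hypothetical: readUnicodeProperty, the "UnicodeProperty" node name, and the flag names are made up for illustration and are not part of RegexAnalyzer.

```javascript
// Hypothetical: read \p{...} or \P{...} at `index` and return a node-like object,
// without checking that the property name or value actually exists.
function readUnicodeProperty(pattern, index) {
  const match = /^\\([pP])\{([^{}]+)\}/.exec(pattern.slice(index));
  if (!match) return null;

  const [text, letter, body] = match;
  const eq = body.indexOf("=");

  return {
    typeName: "UnicodeProperty",        // hypothetical node name
    val: text.slice(1),                 // e.g. "p{Punctuation}", like "u{61}" above
    flags: {
      Negated: letter === "P",          // \P{...} is the negated form
      Name: eq === -1 ? body : body.slice(0, eq),
      Value: eq === -1 ? null : body.slice(eq + 1)
    },
    length: text.length                 // characters consumed from the pattern
  };
}

console.log(readUnicodeProperty("\\p{Punctuation}", 0));
console.log(readUnicodeProperty("a\\P{Script=Greek}", 1));
```

Since nothing is validated, a pattern such as /\p{NotAProperty}/u would still produce a node here even though the engine itself would reject it.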