tree-sitter-regex
fix grammar when character class is only a range
Prior to this change, a character class such as `[0-1]` was parsed as:
```
(character_class)
  ([)
  (character_class)
  (character_class)
  (character_class)
  (])
```
So no `class_range` node was produced; the range was dropped entirely. This happened for two reasons:

- The grammar's reduction for `class_range` dispatched directly to `_class_atom`, which did more work than strictly necessary to match a single class character.
- The conflict resolution array preferred `character_class` over `class_range`. I'm unsure whether this made a major difference.
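As a rough illustration of why the resolution order can matter, here is a toy hand-written scanner, not how tree-sitter works (tree-sitter generates an LR parser and resolves conflicts when the grammar is compiled). The function `parse_body` and its `prefer_range` flag are hypothetical. When it always resolves toward single characters, `0-1` comes out as three sibling nodes and no range, which matches the shape of the broken parse.

```python
def parse_body(body: str, prefer_range: bool) -> list[tuple[str, ...]]:
    """Toy scanner for the inside of a character class (no brackets).

    Hypothetical sketch: prefer_range=True tries "a-b" as a range before
    falling back to a single character; prefer_range=False always takes
    one character at a time.
    """
    nodes: list[tuple[str, ...]] = []
    i = 0
    while i < len(body):
        # Could a range "a-b" start at position i?
        fits_range = i + 2 < len(body) and body[i + 1] == "-"
        if prefer_range and fits_range:
            nodes.append(("class_range", body[i], "-", body[i + 2]))
            i += 3
        else:
            nodes.append(("class_character", body[i]))
            i += 1
    return nodes


# Resolving toward single characters loses the range entirely.
print(parse_body("0-1", prefer_range=False))
# [('class_character', '0'), ('class_character', '-'), ('class_character', '1')]
print(parse_body("0-1", prefer_range=True))
# [('class_range', '0', '-', '1')]
```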
After fixing the issues above, I now get:

```
(character_class)
  ([)
  (class_range)
    (class_character)
    (-)
    (class_character)
  (])
```
The grammar still accepts expressions such as `[-0-1-]`, with an optional `-` at either end.
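A minimal sketch of the fixed shape, again as a hypothetical hand-written Python function (`parse_class`) rather than the actual tree-sitter grammar: the optional `-` is checked once at each end of the class, outside the main loop, and everything in between is matched as either a `class_range` or a single `class_character`. The node labels are illustrative tuples; in particular, representing the edge dashes as bare `-` nodes is an assumption.

```python
def parse_class(src: str) -> list[tuple[str, ...]]:
    """Parse a bracketed class such as "[0-1]" into a flat list of nodes.

    Hypothetical sketch: optional '-' at each end (checked once, outside
    the loop), then a sequence of ranges or single characters.
    """
    if len(src) < 2 or src[0] != "[" or src[-1] != "]":
        raise ValueError(f"not a character class: {src!r}")
    body = src[1:-1]
    nodes: list[tuple[str, ...]] = [("[",)]

    start, end = 0, len(body)
    # Hoisted check: an optional literal '-' right after '['.
    leading = body.startswith("-")
    if leading:
        start = 1
    # Hoisted check: an optional literal '-' right before ']',
    # unless that '-' was already consumed as the leading one.
    trailing = end > start and body.endswith("-")
    if trailing:
        end -= 1

    if leading:
        nodes.append(("-",))
    i = start
    while i < end:
        # A range needs three characters, "a-b", all before `end`.
        if i + 2 < end and body[i + 1] == "-":
            nodes.append(("class_range", body[i], "-", body[i + 2]))
            i += 3
        else:
            nodes.append(("class_character", body[i]))
            i += 1
    if trailing:
        nodes.append(("-",))

    nodes.append(("]",))
    return nodes
```

With this ordering, `parse_class("[0-1]")` yields a single `class_range` between the brackets, and `parse_class("[-0-1-]")` yields a `-`, the range, and another `-`.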
Note: this approach generates slightly larger parse tables, because the optional `-` check is hoisted into the `character_class` production, but that seems a reasonable tradeoff for correct behavior.