tree-sitter-regex

fix grammar when character class is only a range

Open cdacamar opened this issue 5 months ago • 0 comments

Prior to this change, a character class such as [0-1] was parsed as:

(character_class)
  ([)
  (character_class)
  (character_class)
  (character_class)
  (])

So the range was dropped entirely. This happened for two reasons:

  1. The grammar had a reduction for class_range that dispatched directly to _class_atom, which did more work than was strictly necessary to match a single class character.
  2. The conflict resolution array listed character_class ahead of class_range, so character_class was preferred. I'm unsure whether this made a major difference.

With both issues fixed, [0-1] now parses as:

(character_class)
  ([)
  (class_range)
    (class_character)
    (-)
    (class_character)
  (])

Expressions such as [-0-1-] (with an optional - at each end) are still accepted.
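To make the intended shape concrete, here is a minimal Python sketch of a hand-written parser that produces the same tree for these inputs. It is not the tree-sitter grammar itself: the function names, the tuple-based node type, and the choice to emit an anonymous (-) node for a leading or trailing dash are all assumptions made for illustration, and escapes and negation (^) are ignored.

```python
def node(name, *children):
    """Build a tree node as (name, [children]). Hypothetical helper."""
    return (name, list(children))


def parse_character_class(src):
    """Parse a bracketed class such as "[0-1]" into a nested node tree.

    Illustrative only: escapes and '^' negation are not handled.
    """
    if len(src) < 2 or not (src.startswith("[") and src.endswith("]")):
        raise ValueError("expected a bracketed character class")
    body = src[1:-1]
    i, end = 0, len(body)
    children = [node("[")]

    # The optional '-' at each end is handled here, at the character_class
    # level, mirroring the hoisting described in the note below.
    if i < end and body[i] == "-":
        children.append(node("-"))  # assumption: emitted as an anonymous node
        i += 1
    trailing_dash = end > i and body[end - 1] == "-"
    if trailing_dash:
        end -= 1

    while i < end:
        # Prefer a range (char '-' char) over three separate characters.
        if i + 2 < end and body[i + 1] == "-":
            children.append(
                node("class_range",
                     node("class_character"),
                     node("-"),
                     node("class_character")))
            i += 3
        else:
            children.append(node("class_character"))
            i += 1

    if trailing_dash:
        children.append(node("-"))
    children.append(node("]"))
    return node("character_class", *children)


def render(tree, depth=0):
    """Return the tree as indented lines, one "(name)" per node."""
    name, children = tree
    lines = ["  " * depth + "(" + name + ")"]
    for child in children:
        lines.extend(render(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(render(parse_character_class("[0-1]"))))
```

Run as a script, this prints the seven-line tree shown above for [0-1]. For [-0-1-] the sketch yields a (-) node on each side of the single class_range.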

Note: this approach generates slightly larger parse tables because the optional - check is hoisted into the character_class production, but that seems like a reasonable tradeoff for correct behavior.

cdacamar · Sep 14 '24 01:09