tree-sitter-latex

Improve generic command

Open jdujava opened this issue 1 year ago • 7 comments

Fixes #146 Fixes #108

It seems to work pretty well, see included tests.

However, this goes against #51 and thus breaks #48. That can be annoying, even though

\(]0,+\infty[\)

can be fixed manually by

\(\left]0,+\infty\right[\)

\(]0,+\infty{[}\)

\(]0,+{\infty}[\)

Do you have any idea for a workaround?

One idea I had, but couldn't figure out how to implement, was to require that there be no space between a command and its bracket argument, so that

\(]0,+\infty [\)

would be OK.
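
To make the whitespace idea concrete, here is a minimal toy sketch in Python, not the tree-sitter-latex grammar itself. The names `COMMAND`, `bracket_is_argument` and `classify` are hypothetical. The sketch only treats a `[` as the start of a command argument when it immediately follows the command name; in a tree-sitter grammar the same restriction would presumably be expressed with something like `token.immediate`.

```python
import re

# Hypothetical toy tokenizer: a command is a backslash followed by letters.
COMMAND = re.compile(r"\\[A-Za-z]+")


def bracket_is_argument(source: str, command_end: int) -> bool:
    """Return True only if a '[' starts exactly where the command name ends."""
    return command_end < len(source) and source[command_end] == "["


def classify(source: str):
    """Return (command, takes_bracket_argument) for every command in source."""
    return [
        (match.group(), bracket_is_argument(source, match.end()))
        for match in COMMAND.finditer(source)
    ]


print(classify(r"\(]0,+\infty[\)"))   # [('\\infty', True)]
print(classify(r"\(]0,+\infty [\)"))  # [('\\infty', False)]
```

With this rule the interval written as `\infty [` is left alone, while `\infty[` is still read as a command with a bracket argument, so the result depends on whitespace.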

jdujava avatar Jul 21 '24 16:07 jdujava

As LaTeX is whitespace insensitive, the grammar shouldn't be, either.

clason avatar Jul 21 '24 16:07 clason

That would be ideal, but if there isn't any nice workaround, aren't some tradeoffs necessary? Current parsing of command arguments is rather lacking in my opinion.

jdujava avatar Jul 21 '24 16:07 jdujava

Yes, and I think this would be the better tradeoff, unfortunate as it is: unpredictable syntax highlighting (depending on whether there is whitespace or not, and remember that you're not always working with code you've written yourself) would be worse.

clason avatar Jul 21 '24 17:07 clason

Hmm, understandable. Is there some way to mitigate the problems caused by single brackets in math mode? For example, create a grammar that always closes math mode (even if there is such a single bracket, which would suggest a command argument), or something like that?

jdujava avatar Jul 21 '24 17:07 jdujava

Probably would need a lot more scanner complexity; I don't see how else to deal with this. (And LaTeX is provably impossible to parse and arguably the worst possible language for LR parsers... So any highlighting is purely best effort and should prioritize the "happy path".)

But @pfoerster is smarter than me and maybe has better ideas ;)

clason avatar Jul 21 '24 17:07 clason

> Do you have any idea for a workaround?

One idea would be to specifically allow for unbalanced brackets in generic_command. This does not require changes to the lexer. Something like this should work (passes tree-sitter test):

[image: screenshot of the suggested grammar change, not preserved]

pfoerster avatar Jul 24 '24 15:07 pfoerster

I tried it, but in my limited testing it failed to properly parse something like \(x \in [a,b)\).

With a slight modification of your suggestion (see the new commit), in a file containing just \(x \in [a,b)\), the closing bracket is magically appended in the tree-sitter tree (I haven't encountered anything similar before; can you please explain?):

(source_file ; [0, 0] - [1, 0]
  (inline_formula ; [0, 0] - [0, 15]
    "\\(" ; [0, 0] - [0, 2]
    (text ; [0, 2] - [0, 13]
      word: (word) ; [0, 2] - [0, 3]
      word: (generic_command ; [0, 4] - [0, 13]
        command: (command_name) ; [0, 4] - [0, 7]
        arg: (brack_group_generic_arg ; [0, 8] - [0, 13]
          "[" ; [0, 8] - [0, 9]
          (text ; [0, 9] - [0, 10]
            word: (word)) ; [0, 9] - [0, 10]
          "," ; [0, 10] - [0, 11]
          (text ; [0, 11] - [0, 12]
            word: (word)) ; [0, 11] - [0, 12]
          ")" ; [0, 12] - [0, 13]
          "]"))) ; [0, 13] - [0, 13]  <<<<<-------------------- WHAT?
    "\\)")) ; [0, 13] - [0, 15]

However, for more complex files it often produces errors: the bracket is interpreted as the start of a (brack_group_generic_arg), but then the parser can't find the closing bracket and fails at the end of the (inline_formula).
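
The behaviour one would want here can be sketched outside of tree-sitter as a small Python function. This is only an illustration under stated assumptions: `split_bracket_argument` is a hypothetical helper, and it relies on looking ahead to the end of the math body, which a hand-written scanner can do but which is not claimed to be expressible in the grammar. The helper reads a `[...]` group only if a matching `]` exists before the formula ends; otherwise it gives up and lets the caller treat the `[` as ordinary text.

```python
def split_bracket_argument(math_body: str, open_index: int):
    """Try to read a '[...]' group starting at open_index inside math_body.

    Returns (argument, next_index) when a matching ']' exists before the end
    of the math body. Otherwise returns (None, open_index + 1) so that the
    caller can treat the '[' as an ordinary character.
    """
    if math_body[open_index] != "[":
        raise ValueError("open_index must point at '['")
    depth = 0
    for i in range(open_index, len(math_body)):
        ch = math_body[i]
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
            if depth == 0:
                # Balanced group found: return its contents without brackets.
                return math_body[open_index + 1:i], i + 1
    # No matching ']' before the formula ends: fall back to plain text.
    return None, open_index + 1


half_open = r"x \in [a,b)"
print(split_bracket_argument(half_open, half_open.index("[")))

root = r"\sqrt[3]{x}"
print(split_bracket_argument(root, root.index("[")))
```

For the half-open interval the helper returns no argument, while for `\sqrt[3]{x}` it returns the argument `3`, which is the distinction the failing cases above depend on.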

jdujava avatar Jul 25 '24 14:07 jdujava