RSyntaxTextArea
Add support for highlighting based on the actual AST
Greetings.
I use and love this library very much, and I think it's great.
However, it seems to fall short for complex languages like SQL or R, where the meaning of a token can't be derived from the keyword alone.
Example:
SELECT Overlaps( overlaps ) AS overlaps
FROM overlaps.overlaps overlaps
WHERE overlaps = 'overlaps'
AND (CURRENT_TIME, INTERVAL '1' HOUR) OVERLAPS (CURRENT_TIME, INTERVAL -'1' HOUR)
;
In this example, OVERLAPS appears as an expression, a function, a label, and a column name, so classifying every occurrence as a keyword produces wrong highlighting.
In my opinion, working from the AST would help because it gives you precise information about which token type comes next.
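To make the difference concrete, here is a minimal sketch, not a real SQL parser. The class `OverlapsClassifier`, the `Kind` enum, and the tiny keyword set are all hypothetical names made up for illustration, and `byContext` only peeks at neighbouring tokens as a crude stand-in for what an AST would report.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy comparison of keyword-only vs. context-aware classification (illustrative only). */
final class OverlapsClassifier {

    enum Kind { KEYWORD, FUNCTION, IDENTIFIER, OTHER }

    // Hypothetical keyword list; a real SQL lexer would carry a much longer one.
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "select", "as", "from", "where", "and", "overlaps"));

    private static boolean isWord(String token) {
        return token.matches("[A-Za-z_][A-Za-z0-9_]*");
    }

    /** Keyword-only lookup: every spelling of "overlaps" becomes a KEYWORD. */
    static Kind byKeyword(String token) {
        if (KEYWORDS.contains(token.toLowerCase(Locale.ROOT))) {
            return Kind.KEYWORD;
        }
        return isWord(token) ? Kind.IDENTIFIER : Kind.OTHER;
    }

    /**
     * Context-aware stand-in for parser output. It inspects only the previous and
     * next token, which is far weaker than an AST but enough to show the idea.
     */
    static Kind byContext(List<String> tokens, int i) {
        String tok = tokens.get(i);
        if (!isWord(tok)) {
            return Kind.OTHER;
        }
        if (!tok.equalsIgnoreCase("overlaps")) {
            return byKeyword(tok);
        }
        String prev = i > 0 ? tokens.get(i - 1) : "";
        String next = i + 1 < tokens.size() ? tokens.get(i + 1) : "";
        if (prev.equals(")") && next.equals("(")) {
            return Kind.KEYWORD;      // (a, b) OVERLAPS (c, d)
        }
        if (next.equals("(")) {
            return Kind.FUNCTION;     // Overlaps( ... )
        }
        return Kind.IDENTIFIER;       // column, alias, schema or table name
    }
}
```

With the keyword lookup, every "overlaps" is painted as a keyword; with the context-aware variant, only the predicate between the two parenthesized rows is.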
Would you be willing to accept another TokenMaker implementation, which reads Tokens from an AST (generated by ANTLR or JavaCC)? I would like to start with an implementation for SQL because this bothers me most.
There's this PR that I've been sitting on for far too long:
https://github.com/bobbylight/RSyntaxTextArea/pull/353
I'll freely admit - I haven't looked too deeply at it, and there are some things about it I don't like, but one thing that I really like about it is that the Antlr support is an add-on, not part of the core RSTA artifact, to keep the dependency set of the main library small (antlr seems to have a huge amount of transitive dependencies!).
Maybe it should be a completely separate sister library, like AutoComplete? Not sure about that.
I'm wondering if this work should be thrown into a branch and cleaned up there: rebased on top of current master, with the Antlr generation performed on builds, or at least through Gradle (something I also need to do for the JFlex TokenMakers, but those are so hacked together right now...), etc.
Anyway, feel free to take a look at that PR and see what you think. If I find time I'll take a look as well.
Thanks for the feedback @bobbylight. It's great to know that some work on ANTLR has already been done! I will have a look at it.
Two notes:
- There is a library that builds a parser on the fly for any grammar, so there is no need to pre-build it. Please have a look at: https://github.com/julianthome/inmemantlr
- ANTLR dependencies are heavy because of ICU. It is also slow(er) compared to JavaCC-built parsers, although ANTLR grammars are much cleaner and more powerful. So maybe two TokenMakers would be needed: one for ANTLR and one for JavaCC. JavaCC comes without any dependencies.
Just to throw in another voice – I am currently re-working the Java-based editor for a custom language we are using in our project https://github.com/cindyjs/cindyjs, and as we already have an ANTLR-generated Lexer it would be wonderful to re-use this easily… I tried to use the code from https://github.com/tisoft/rsyntaxtextarea-antlr4-extension by @tisoft, but it is not as straightforward as I was hoping for.
Greetings again.
After thinking this through and experimenting a bit, I can think of two ways to make it actually work:
- A language-specific parser implements the TokenMaker interface, and this implementation is loaded into RSTA. Example: JSQLParser would implement a TokenMaker based on the visitor pattern that creates tokens as it traverses the AST. SQL is somewhat special because it can mix keywords and identifiers and has no strict rules for blocks.
  Challenge: TokenMaker should be a separate library that can be included in any parser on demand, without pulling in the full RSTA as a dependency.
- The Language Server Protocol defines semantic tokens/highlighting (which is just a more generic approach to TokenMaker). We could write an LSPTokenMaker based on LSP4J, which gets the tokens from an LSP server.
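For the LSP route, the part an LSPTokenMaker would have to handle is the wire format: semantic tokens arrive as a flat integer array with five entries per token (delta line, delta start character, length, token type index, modifier bit set), each position relative to the previous token. Below is a minimal decoding sketch. `SemanticTokenDecoder` and `Span` are hypothetical names, and the method takes a plain `int[]` to stay dependency-free rather than using LSP4J types.

```java
import java.util.ArrayList;
import java.util.List;

/** Decodes the relative integer encoding of LSP semantic tokens into absolute positions. */
final class SemanticTokenDecoder {

    /** One decoded token; this holder class is illustrative, not an LSP4J or RSTA type. */
    static final class Span {
        final int line;
        final int startChar;
        final int length;
        final int typeIndex;     // index into the server's token type legend
        final int modifierBits;  // bit set over the server's token modifier legend

        Span(int line, int startChar, int length, int typeIndex, int modifierBits) {
            this.line = line;
            this.startChar = startChar;
            this.length = length;
            this.typeIndex = typeIndex;
            this.modifierBits = modifierBits;
        }
    }

    static List<Span> decode(int[] data) {
        if (data.length % 5 != 0) {
            throw new IllegalArgumentException("data length must be a multiple of 5");
        }
        List<Span> spans = new ArrayList<>();
        int line = 0;
        int startChar = 0;
        for (int i = 0; i < data.length; i += 5) {
            int deltaLine = data[i];
            int deltaStart = data[i + 1];
            line += deltaLine;
            // The start is relative to the previous token only when both share a line;
            // on a new line it counts from the beginning of that line.
            startChar = (deltaLine == 0) ? startChar + deltaStart : deltaStart;
            spans.add(new Span(line, startChar, data[i + 2], data[i + 3], data[i + 4]));
        }
        return spans;
    }
}
```

Mapping `typeIndex` to an RSTA token type would still need a lookup against the legend the server announces, which is left out here.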
Since I am most interested in SQL and heavily involved in JSQLParser, I would like to invest in option 1) first. What would be a good way to manage the dependencies, please? Can we split off TokenMaker into an API that can be implemented outside of RSTA? There is no way to pull all of RSTA into JSQLParser.
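To illustrate what I mean by such a split, here is a rough sketch under my own assumptions. None of these types exist in RSTA or JSQLParser: `HighlightSink` stands for the tiny shared API, the toy `Node` classes stand in for a parser's AST, and `CollectingSink` stands in for an adapter on the RSTA side.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shared API module: no RSTA or parser classes are referenced. */
interface HighlightSink {
    /** Reports one span; start is inclusive, end is exclusive, both are character offsets. */
    void token(int start, int end, String kind);
}

// --- parser side: toy AST standing in for a real parser's node classes ---

interface Node {
    void accept(NodeVisitor visitor);
}

interface NodeVisitor {
    void visit(FunctionCall call);
    void visit(ColumnRef column);
}

final class ColumnRef implements Node {
    final int start;
    final int end;

    ColumnRef(int start, int end) {
        this.start = start;
        this.end = end;
    }

    public void accept(NodeVisitor visitor) {
        visitor.visit(this);
    }
}

final class FunctionCall implements Node {
    final int nameStart;
    final int nameEnd;
    final Node argument;

    FunctionCall(int nameStart, int nameEnd, Node argument) {
        this.nameStart = nameStart;
        this.nameEnd = nameEnd;
        this.argument = argument;
    }

    public void accept(NodeVisitor visitor) {
        visitor.visit(this);
    }
}

/** Visitor that reports a highlight span for each node while it walks the tree. */
final class HighlightingVisitor implements NodeVisitor {
    private final HighlightSink sink;

    HighlightingVisitor(HighlightSink sink) {
        this.sink = sink;
    }

    public void visit(FunctionCall call) {
        sink.token(call.nameStart, call.nameEnd, "function");
        call.argument.accept(this);   // descend into the argument
    }

    public void visit(ColumnRef column) {
        sink.token(column.start, column.end, "column");
    }
}

// --- editor side: stand-in for an adapter that would create editor tokens ---

final class CollectingSink implements HighlightSink {
    final List<String> spans = new ArrayList<>();

    public void token(int start, int end, String kind) {
        spans.add(kind + "@" + start + "-" + end);
    }
}
```

For the text `Overlaps(overlaps)`, a tree built as `new FunctionCall(0, 8, new ColumnRef(9, 17))` would report the first word as a function and the second as a column. The parser only depends on the sink interface, so RSTA stays out of its dependency set.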
Please advise.