clojure-ts-mode
Highlight (some) regular expressions using another grammar
I saw the following bit in the emacs-devel archives:
some files may consist of several parts requiring different tree-sitter grammars. For example, a JavaScript file may have its documentation written with jsdoc: JavaScript and jsdoc have a tree-sitter grammar each.
Is there a way to use a tree-sitter grammar in parts of the file and another one in other parts? There could be a main grammar and secondary grammars would be activated on some kinds of nodes of the main one.
Yes, it should be possible, AFAIU. See the node "Multiple Languages" in the ELisp manual, I believe it explains how to do what you want.
As an idea for "somewhere down the line", perhaps it would be interesting to consider the following...
Since tree-sitter-clojure can recognize regex literals, maybe one could apply an appropriate regular expression grammar to highlight the portions within the double quotes.
I don't know how close this grammar is to Clojure's flavor of regex, but maybe it, some appropriate modification of it, or something that inherits from it could be used for the task.
For reference, the part of the manual referred to in the quote above can be seen in .texi form here. I didn't manage to find an HTML version. If you've got a recent enough Emacs from the emacs-29 branch, the info may be viewable from within Emacs. Worked for me anyway...
Ah sorry. Maybe I should have posted this in the Discussions area?
No, an issue is fine. I don't even get notifications from discussions lol.
This is a good idea. Clojure uses Java-flavored regular expressions. I'm not sure how much they differ from what that grammar supports. If the dialects differ enough, it might be worth forking it and calling the fork tree-sitter-java-regex.
I don't have the various flavors loaded into my head lately [1].
If I had to guess without looking too closely, I think this is likely to be some JavaScript flavor (or subset of one).
I also don't know / recall whether the various Clojure dialects all support the same regex syntax.
Perhaps this might come in handy eventually.
[1] Mostly working with PEGs in another language ;)
Came across this content among Lapce's files:
```
((regex_lit) @injection.content
 (#set! injection.language "regex"))
```
@sogaiu check this out 855cddd124eb4ed9197281fe7f56697902b35cb1
Seems useful for other languages as well. Maybe it even belongs in Emacs core.
Thanks for the heads up!
Hope to take a look soon.
Ok, I gave it a try.
I see what you mean about `#"` and `"` being captured as well:
On a side note, maybe it's worth requesting that tree-sitter-regex be added to tree-sitter-module?
@rrudakov Perhaps we can apply your learnings from the markdown-inline work here?
I think the biggest issue here is finding a suitable grammar. The grammar mentioned in the discussion supports PCRE2, POSIX and JavaScript regexps; I'm not sure any of those is fully compatible with Java regexps. One difference I can think of is the use of double backslashes in Java.
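It may help to note that the doubling comes from Java string literals rather than from the regex syntax itself: Clojure's `#"\d+"` reader literal compiles to a `java.util.regex.Pattern` whose pattern text is `\d+`, while the equivalent Java source has to write `"\\d+"`. So the text inside a Clojure regex literal, which is what the embedded grammar would parse, should already be plain regex syntax. A small illustrative sketch (the class and method names are made up for the example):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackslashDemo {
    // In Java source the regex \d+ must be written as "\\d+", because the
    // string-literal parser consumes one level of backslashes first.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits in the input, or null if there is none.
    static String firstNumber(String input) {
        Matcher m = DIGITS.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The pattern text the regex engine sees contains a single backslash.
        System.out.println(DIGITS.pattern());            // \d+
        System.out.println(DIGITS.pattern().length());   // 3
        System.out.println(firstNumber("abc 123 def"));  // 123
    }
}
```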
If we find a grammar, adding a new parser and syntax highlighting is pretty straightforward.
I think PCRE2 will work well for our case since, if I recall correctly, Java's regular expressions were derived from Perl 5's. We'll have to check this, though.
It works pretty well. We need to decide what we want to highlight and which faces to use for the different elements (I'm neither a designer nor a regex expert :) ). The possibilities for syntax highlighting are endless (see the syntax tree in the right-hand buffer).
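If that recollection holds, Perl-style constructs are the natural test cases for a PCRE2 grammar. The sketch below (illustrative names only) compiles a pattern that `java.util.regex` accepts and that uses an inline case-insensitive modifier, a lookbehind, a named group and a possessive quantifier; parsing the same pattern text with the candidate grammar would be one quick way to spot gaps.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerlStyleDemo {
    // (?i)          inline case-insensitive modifier
    // (?<=\$)       lookbehind: the digits must be preceded by a dollar sign
    // (?<amount>..) named capturing group
    // \d++          possessive quantifier
    static final Pattern PRICE =
        Pattern.compile("(?i)(?<=\\$)(?<amount>\\d++)\\s*usd");

    // Returns the named "amount" group of the first match, or null.
    static String amount(String input) {
        Matcher m = PRICE.matcher(input);
        return m.find() ? m.group("amount") : null;
    }

    public static void main(String[] args) {
        System.out.println(amount("Total: $42 USD")); // 42
        System.out.println(amount("Total: 42 USD"));  // null (no $ before the digits)
    }
}
```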
With a dark color scheme:
There is also an issue in Emacs: when local parsers are used, the offset setting has no effect, so the hash sign and quotes are also included in the range (this also applies to our markdown-inline parser).
It's been reported to the Emacs bug tracker: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=77848
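Conceptually, the offset only shrinks the captured node's range: a `regex_lit` node covering `#"\d+"` carries a two-character prefix (`#"`) and a one-character suffix (`"`) that should not reach the regex parser. A minimal sketch of that arithmetic, assuming a half-open [start, end) range; `innerRange` is a hypothetical helper, not an Emacs or tree-sitter API:

```java
public class RegexRangeDemo {
    // Hypothetical helper: given the [start, end) range of a node covering
    // text like #"\d+", return the range of the pattern alone by skipping
    // the opening #" (2 chars) and dropping the closing " (1 char).
    static int[] innerRange(int start, int end) {
        int innerStart = start + 2;
        int innerEnd = end - 1;
        if (innerEnd < innerStart) {
            throw new IllegalArgumentException("range too short for #\"...\"");
        }
        return new int[] {innerStart, innerEnd};
    }

    public static void main(String[] args) {
        String source = "(re-find #\"\\d+\" s)";
        // Naive search for demonstration only; ignores escaped quotes.
        int start = source.indexOf("#\"");
        int end = source.indexOf('"', start + 2) + 1; // one past the closing quote
        int[] inner = innerRange(start, end);
        System.out.println(source.substring(start, end));         // #"\d+"
        System.out.println(source.substring(inner[0], inner[1])); // \d+
    }
}
```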
This looks good to me. I was going to suggest focusing on match groups, character classes, escapes, anchors and modifiers, and I guess that's what you did.
The bug is fixed on Emacs master. On Emacs 30 the offset feature doesn't exist, so the ranges for embedded parsers (markdown-inline and regex) will include the quotes and, for regex literals, the hash character.