SubEthaEdit
New Mode format
While it has served us well in the past, creating modes isn't as easy and straightforward as it could be. We should consider updating the mode bundle format to improve on some of its clumsier aspects:
- Remove redundancy. The mode name and version, for example, appear in multiple files, which is annoying to update; these could be merged into fewer files where appropriate.
- Switch away from XML. A lot of mode creation is based on regexes, and those need to be escaped quite annoyingly in XML, making the round trip of creating, updating, and maintaining a mode less convenient than it could be. Switching to TOML is one avenue I would like to explore, as it has great support for strings that don't need to be escaped.
- Switch to flat bundles. This might be beneficial because it removes the deep hierarchy one has to navigate when editing.
Along those lines we should also think about making the mode states more semantic to improve reuse. Importing modes is a great feature for getting full language capabilities in languages that embed other languages, but there is also great potential in shared semantic entities, e.g. for string and number representations, that can be mixed and matched to represent them optimally and correctly for the current mode.
If I understand correctly, this is about language support? How much overlap is there with #10? If a new format is wanted, I would recommend starting by looking at the format that TextMate uses. As far as I understand, Sublime uses the same or a very similar format, and lately VS Code seems to support it as well.
A link that at least seems to partially describe the format: https://ilkinulas.github.io/programming/2016/02/05/sublime-text-syntax-highlighting.html
It's probably worth taking a look around to see how far apart everyone's implementations are these days. Note that Sublime itself has moved on to http://www.sublimetext.com/docs/3/syntax.html
As for how much overlap there is with #10: I don't think there is much, as even with Language Server Protocol support we will want to keep the local mode around, at least as an option.
Here is the old TextMate definition. VSCode seems to compile something from YAML (maybe what Sublime is using?). None of these implementers seem to link to a spec of their formats anywhere. VSCode has this script, which you could argue is some sort of specification. :/ Not pretty. But there are at least a lot of .tmlanguage files for various languages around.
One principal problem I remember from the last time I dug into these files is that for languages like Python, code like this is not (or not efficiently) parseable:
def thisFunction():
    return """
has a string that is not indented
"""
Folding fails for this, because it does not detect that the function does not end on the second line of the string. Well at least that was a problem once.
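To illustrate the failure mode, here is a small Python sketch. It assumes folding is derived purely from indentation, which is a guess for illustration and not a description of how SubEthaEdit or any TextMate-style engine actually folds. Both function names are hypothetical. The second function uses Python's own `tokenize` module to show that the fold comes out right once multi-line string tokens are known.

```python
import io
import tokenize

SOURCE = '''def thisFunction():
    return """
has a string that is not indented
"""

x = 1
'''

def naive_fold_end(lines, start):
    """Indentation-only folding (assumed rule): the block ends before the first
    non-blank line that is indented no deeper than the line that opened it."""
    base = len(lines[start]) - len(lines[start].lstrip())
    end = start
    for i in range(start + 1, len(lines)):
        if not lines[i].strip():
            continue  # blank lines never close a block
        if len(lines[i]) - len(lines[i].lstrip()) <= base:
            break
        end = i
    return end

def string_aware_fold_end(source, start):
    """Same rule, but lines that continue a multi-line string token are
    counted as part of the block instead of being measured for indentation."""
    lines = source.splitlines()
    inside = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.STRING and tok.end[0] > tok.start[0]:
            # tokenize rows are 1-based, so this range yields the 0-based
            # indices of every continuation line of the string.
            inside.update(range(tok.start[0], tok.end[0]))
    base = len(lines[start]) - len(lines[start].lstrip())
    end = start
    for i in range(start + 1, len(lines)):
        if i in inside:
            end = i
            continue
        if not lines[i].strip():
            continue
        if len(lines[i]) - len(lines[i].lstrip()) <= base:
            break
        end = i
    return end

# 0-based index of the last line belonging to the fold that starts at line 0.
print(naive_fold_end(SOURCE.splitlines(), 0))   # 1: stops at the unindented string content
print(string_aware_fold_end(SOURCE, 0))         # 3: includes the closing """
```

The catch is that the second variant needs a real tokenizer for the language, which is exactly what a purely regex-driven grammar does not cheaply provide.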
LSP currently does not support this: Proposal of the semantic highlighting protocol extension #367.
I want to tackle this sooner rather than later. The current mode file format is too annoying to go back and forth with because of the XML escaping, and a lot of the built-in modes are getting long in the tooth or need more modern replacements and additions. Since no community has emerged to address this, it might also be prudent to think about basic ingestion of other existing highlighting formats (e.g. from TextMate, Sublime, et al.) to at least get the ball rolling and expanding again. Having a flexible highlighter and mode format is a great benefit for supporting a breadth of files, which SubEthaEdit currently leaves on the table.
I'd also like to extract and isolate the highlighting code more during that process, so it could be adopted by other interested parties.
So my first order of business would be to explore a TOML representation of the current highlighter's abilities and go from there.
Also, this seems to be some kind of specification of the TextMate format: https://macromates.com/manual/en/language_grammars
I would really like to see the ability to use TextMate or VSCode language bundles as that would open up a lot of language definitions that are out there already.
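As a rough idea of what "basic ingestion" could look like, here is a minimal Python sketch that reads a grammar in the XML property-list form used by .tmLanguage files and flattens its top-level patterns. The sample grammar (`source.example`) and the `flatten_rules` helper are invented for illustration. Nested patterns, captures, includes, and repositories are ignored, so this is nowhere near a full importer.

```python
import plistlib

# A tiny hand-written grammar in XML plist form. The language and its
# rules are made up; real .tmLanguage files are far larger.
TM_GRAMMAR = rb'''<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>scopeName</key><string>source.example</string>
  <key>fileTypes</key><array><string>ex</string></array>
  <key>patterns</key>
  <array>
    <dict>
      <key>name</key><string>constant.numeric.example</string>
      <key>match</key><string>\b\d+\b</string>
    </dict>
    <dict>
      <key>name</key><string>string.quoted.double.example</string>
      <key>begin</key><string>"</string>
      <key>end</key><string>"</string>
    </dict>
  </array>
</dict>
</plist>
'''

def flatten_rules(grammar):
    """Hypothetical helper: turn top-level patterns into
    (scope, kind, regexes) tuples, skipping anything more complex."""
    rules = []
    for pattern in grammar.get("patterns", []):
        scope = pattern.get("name", "")
        if "match" in pattern:
            rules.append((scope, "match", (pattern["match"],)))
        elif "begin" in pattern and "end" in pattern:
            rules.append((scope, "range", (pattern["begin"], pattern["end"])))
    return rules

grammar = plistlib.loads(TM_GRAMMAR)
rules = flatten_rules(grammar)
print(grammar["scopeName"], grammar["fileTypes"])
for rule in rules:
    print(rule)
```

Single-regex `match` rules and `begin`/`end` ranges are the two basic shapes a grammar uses, so a first importer could start with those and report everything it had to skip.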