vscode-tlaplus
vscode-tlaplus copied to clipboard
Integrate TLA+/PlusCal tree-sitter grammar?
The TLA+ tree-sitter grammar now supports PlusCal: https://github.com/tlaplus-community/tree-sitter-tlaplus/
You can see how to consume the grammar from TypeScript here: https://github.com/tlaplus-community/tlaplus-tool-dev-examples/tree/main/tree-sitter/typescript
There are a few ways the grammar could help:
- Much more robust syntax highlighting compared to regex-based methods (however would have to use semantic highlighting API probably)
- Code navigation (find all references, go to definition, etc.)
- Code folding
Basically the grammar gives you access to an incrementally-updated error-tolerant queryable syntax tree. Let me know whether you think this would be a useful addition (or how else it could be used) and I will be happy to help with integration.
@edaena, I believe you're familiar with the relevant parts of the extension. One question to answer is how this integrates with SANY. For example, tree-sitter rejecting a spec shouldn't prevent users from launching TLC if SANY is happy, and vice versa. From experience with TLAPS, we know that two parsers disagree in subtle ways (which is why there is an (incomplete) effort to replace the TLAPS parser with SANY). Related, I'd suggest thinking about integrating with the PlusCal translator, such as users modifying the PlusCal without retranslating before model-checking...
Note that the proposed code navigation already exists.
Related:
- https://github.com/tlaplus/vscode-tlaplus/issues/239
- https://github.com/tlaplus/vscode-tlaplus/issues/238
- https://github.com/tlaplus/vscode-tlaplus/issues/245
- https://github.com/tlaplus/vscode-tlaplus/issues/230
- https://github.com/tlaplus/vscode-tlaplus/issues/166
- https://github.com/tlaplus/vscode-tlaplus/issues/163
- https://github.com/tlaplus/vscode-tlaplus/issues/153
- https://github.com/tlaplus/vscode-tlaplus/issues/104
- https://github.com/tlaplus/vscode-tlaplus/wiki/Caveats#poor-pluscal-c-syntax-support
- Syntax highlighting broken with nested modules such as https://github.com/tlaplus/tlaplus/issues/725#issue-1211492653
Of note, tree-sitter doesn't really reject specs; it will provide as good of a parse as it can get, and end up looking something like (sample taken from different language):
(source_file [0, 0] - [3, 0]
(class_def [0, 0] - [2, 1]
(class [0, 0] - [0, 8])
(class_method_name [1, 3] - [1, 6])
(function_block [1, 6] - [1, 19]
(parameter_list [1, 7] - [1, 18]
(argument [1, 9] - [1, 11]
name: (identifier [1, 9] - [1, 11]))
(ERROR [1, 11] - [1, 17]
(integer [1, 13] - [1, 17]))))))
There is a flag you can check to see whether there are any error nodes in the parse tree, but unless you go hunting for them you won't know they're there.
You can see the real-time parse tree (with errors or no) generated by the TLA+ tree-sitter grammar here: https://tlaplus-community.github.io/tree-sitter-tlaplus/
In the future, code navigation could be augmented by stack graphs which are built on tree-sitter grammars. I'm assuming how it works now is analogous to a string search in the file. Moving to tree-sitter grammar queries would make that a bit more robust, but you wouldn't get cross-file referencing.
To clarify: Are you're saying that the list of use-cases is exhaustive s.t. tree-sitter should only replace regex-based syntax highlighting and improve code navigation/completion? For everything else, however, SANY remains authoritative? This then sounds reasonable, especially, if syntax highlighting and code navigation continue to work in the presence of parse failures. By the way, does the tree-sitter parser implement SANY's lookup sequence for extended, instantiated, nested, and re-defined modules (regardless of where they come from)? If not, a user might end up looking at a different module than what the tools will work with (compare https://github.com/tlaplus/vscode-tlaplus/issues/181 and https://github.com/tlaplus/vscode-tlaplus/issues/218).
In the light of https://github.com/tlaplus/vscode-tlaplus/issues/239, what are the (runtime) dependencies of tree-sitter?
Yes, SANY remains authoritative. Tree-sitter doesn't even have the capability to print error messages or anything, it's purely designed to support language tooling. I don't think my list of use-cases is exhaustive really, there are other possibilities like code folding or code formatting/linting that other people have implemented.
Tree-sitter does not implement any semantic analysis or restrictions whatsoever, it is purely a syntax-level tool against which you can write queries. The lookup logic (outside of a single file) could eventually be implemented by encoding it as stack graph rules, although stack graphs still don't have good developer documentation and are only currently used for code navigation on a handful of languages on github itself.
Tree-sitter grammars are all C and C++ files. The generated grammar is a 32 MB C file (src/parser.c in the repo) and the handwritten context-sensitive parsing logic for junction lists and proofs is in src/scanner.cc. When compiled this is about 3 MB in size, and it links to the tree-sitter library which is a few hundred KB in size. This can all easily be compiled with emscripten into WASM (this is what's running in the web demo I linked up above, see https://github.com/tlaplus-community/tlaplus-community.github.io/tree/main/tree-sitter-tlaplus/js). The WASM is also available as a pre-built file in the NPM module.
Oh hey neat, that must be new! People have been asking for native vs code support for tree-sitter for a while now. Guess this is the compromise.