tree-sitter-nim
Use nim compiler to parse strange syntax
Can we use the Nim compiler as a library (at runtime rather than at compile time) to parse edge-case statements? In the example below, there is an if expression. In theory, we could create a shared library that exposes C functions such as
bool isIfExpr(char* src, size_t src_size);
To run the following example, you need
nimble install compiler
example
import compiler / [ast, idents, parser, options]
import strutils
import strformat

var
  identCache = newIdentCache()
  configRef = newConfigRef()
  code = """
var tmp = if 1 > 2 : "1 is bigger than 2" else : "1 is not bigger than 2"
"""

# Render the AST as an indented tree and announce any if expression found.
proc echoTree(tree : PNode, indent_level : int = 0) : string =
  if tree != nil:
    if tree.kind == nkIfExpr:
      echo fmt"""
====================================================
WHAT? WE DETECTED AN IF_EXPRESSION
====================================================
"""
    result = result & " ".repeat(4 * indent_level) & tree.kind.`$` & fmt" : ({tree.info.line}:{tree.info.col}) "
    case tree.kind
    of nkCharLit .. nkUInt64Lit:
      result = result & fmt" : {tree.intVal}"&"\n"
    of nkFloatLit .. nkFloat128Lit:
      result = result & fmt" : {tree.floatVal}"&"\n"
    of nkStrLit .. nkTripleStrLit:
      result = result & fmt" : {tree.strVal}"&"\n"
    of nkSym:
      result = result & "\n"
    of nkIdent:
      result = result & fmt": {tree.ident.s}"&"\n"
    else:
      result = result & "\n"
      # Only non-leaf node kinds carry a `sons` field, so recurse here.
      for son in tree.sons:
        result = result & son.echoTree(indent_level + 1)

echo code.parseString(identCache, configRef).echoTree
Output
====================================================
WHAT? WE DETECTED AN IF_EXPRESSION
====================================================
nkStmtList : (1:0)
    nkVarSection : (1:0)
        nkIdentDefs : (1:4)
            nkIdent : (1:4) : tmp
            nkEmpty : (1:8)
            nkIfExpr : (1:10)
                nkElifExpr : (1:13)
                    nkInfix : (1:15)
                        nkIdent : (1:15) : >
                        nkIntLit : (1:13) : 1
                        nkIntLit : (1:17) : 2
                    nkStmtList : (1:21)
                        nkStrLit : (1:21) : 1 is bigger than 2
                nkElseExpr : (1:42)
                    nkStmtList : (1:49)
                        nkStrLit : (1:49) : 1 is not bigger than 2
Interesting idea.
I don't know how to make that work though. Have you read the tree sitter docs on creating parsers?
The src/parser.c is completely generated from the grammar.js file (using the tree-sitter CLI). I don't know of any interface to insert things at runtime into parser.c, but there is src/scanner.cc, which offers more fine-grained control over parsing than the DSL in grammar.js.
Theoretically you could import the Nim compiler library as C or C++ code in src/scanner.cc. However, the scanner (and probably the parser) works character by character, and I don't know how that plays with the Nim compiler library.
To give an example, currently the triplestr_lit is done in scanner.cc, or at least the content and the ending quotes.
https://github.com/aMOPel/tree-sitter-nim/blob/main/grammar.js#L1183
It works like this:
In grammar.js, we match a triplestr_lit if we find the """ followed by _multi_string_content rules and a _multi_string_end rule. Those are done in src/scanner.cc here:
https://github.com/aMOPel/tree-sitter-nim/blob/main/src/scanner.cc#L147
The API works character by character. You can use lexer->lookahead to look at the next char, advance(lexer) to match the next char and go 1 char forward, and skip(lexer) to go 1 char forward without matching the next char (there is also mark_end).
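To make the character-by-character style concrete, here is a rough sketch of scanning the content of a triple-quoted string. It is not the code from src/scanner.cc: MockLexer is a hypothetical stand-in for tree-sitter's TSLexer (the real struct exposes lookahead as a field and advance/mark_end as function pointers), scan_triple_string_content is a made-up name, and the rule is simplified to "everything up to the next three quotes".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for TSLexer, reduced to the operations named above.
struct MockLexer {
    std::string input;
    std::size_t pos = 0;        // current read position
    std::size_t token_end = 0;  // where the token ends, set by mark_end()

    bool eof() const { return pos >= input.size(); }
    // Next character, or 0 at end of input.
    int32_t lookahead() const {
        return eof() ? 0 : static_cast<unsigned char>(input[pos]);
    }
    void advance() { if (!eof()) ++pos; }   // go 1 char forward
    void mark_end() { token_end = pos; }    // token ends at the current position
};

// Consume string content until a closing """ is seen.
// On success, token_end sits just before the closing quotes.
bool scan_triple_string_content(MockLexer &lx) {
    while (!lx.eof()) {
        if (lx.lookahead() == '"') {
            lx.mark_end();  // tentatively end the content before this quote
            int quotes = 0;
            while (quotes < 3 && lx.lookahead() == '"') {
                lx.advance();
                ++quotes;
            }
            if (quotes == 3) return true;
            // Fewer than three quotes: they belong to the content, keep going.
        } else {
            lx.advance();
        }
    }
    return false;  // ran out of input without finding the closing quotes
}
```

The point of mark_end here is that the scanner can keep advancing to peek at the quotes while the token itself stops before them.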
That is pretty much the whole API. So I don't really know how to make this work with the nim compiler lib, but frankly I never used it, so maybe you have an idea.
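One conceivable bridge, purely as a sketch: let the scanner collect characters into a buffer using lookahead only, then hand the whole buffer to a function with the signature proposed above. Everything here is an assumption. MockLexer is a hypothetical stand-in for TSLexer, check_rest_of_line is a made-up helper, and the checker is passed in as a function pointer because no real isIfExpr exists yet. It also assumes plain bytes rather than full Unicode code points.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for TSLexer, reduced to what this sketch needs.
struct MockLexer {
    std::string input;
    std::size_t pos = 0;        // current read position
    std::size_t token_end = 0;  // where the token ends, set by mark_end()

    bool eof() const { return pos >= input.size(); }
    int32_t lookahead() const {
        return eof() ? 0 : static_cast<unsigned char>(input[pos]);
    }
    void advance() { if (!eof()) ++pos; }
    void mark_end() { token_end = pos; }
};

// Same shape as the proposed `bool isIfExpr(char* src, size_t src_size);`.
using WholeBufferCheck = bool (*)(char *src, std::size_t src_size);

// Buffer the rest of the current line, then ask a whole-buffer checker about it.
bool check_rest_of_line(MockLexer &lx, WholeBufferCheck check) {
    lx.mark_end();  // fix the token end first; what follows is lookahead only
    std::string buffer;
    while (!lx.eof() && lx.lookahead() != '\n') {
        buffer.push_back(static_cast<char>(lx.lookahead()));
        lx.advance();
    }
    return check(&buffer[0], buffer.size());
}
```

In a real scanner the function pointer would presumably be replaced by the exported library function, and the buffering would cost one extra pass over the text on every call.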
I would be curious about the size of the parser if you were to import the Nim compiler.