htsparse
                                
Compiled and wrapped tree-sitter grammars
#+title: readme
#+property: header-args:nim+ :flags -d:plainStdout --hints:off
#+property: header-args:nim
This package provides auto-generated wrappers for multiple tree-sitter grammars: cpp, java, lua, js, latex, clojure etc. (for the full list see [[https://github.com/haxscramper/htsparse/tree/master/src/htsparse][htsparse module list]]). Each wrapper has two versions: ~core_only~ (does not depend on most of the hmisc features and only imports [[https://haxscramper.github.io/hmisc/hmisc/wrappers/treesitter_core.html][treesitter_core]] from hmisc) and a more full-featured one.
The "core_only" wrapper provides a simple interface to tree-sitter: the node is wrapped in a ~distinct~ type and helper procedures for ~.kind~ are provided. To read and analyze the nodes you can use the procedures defined in ~treesitter_core~, such as ~[]~ (get the idx-th named subnode), ~{}~ (get the idx-th subnode of any kind) and the ~items~ iterator.
The "full" wrapper provides an additional convenience layer on top of bare tree-sitter by rewriting the ~distinct~ tree into a much more usable structure that supports ~[]~ for named fields (~node["type"]~), stores a pointer to the base string (and so has a proper ~.strVal()~ call implemented, without the need to pass the original string everywhere), and has a more sophisticated ~.treeRepr()~ implementation that is very useful for understanding the parsed AST.
Both the "full" and "core only" versions include several helper types, consts and procs:
- node kinds :: enum with all possible node kinds. Tree-sitter has two
  different node types - named and unnamed, with named nodes further
  subdivided into regular and [[https://tree-sitter.github.io/tree-sitter/creating-parsers#hiding-rules][hidden]]. All of these types are listed in a
  single enum. Enum values that end with ~Tok~ correspond to token nodes,
  ones that have ~Hid~ in the name are "hidden" (a regular parser won't show
  these nodes in the resulting AST).
  - Full list of hidden nodes can be accessed via the ~HiddenKinds~ constant
  - List of token kinds is stored in ~TokenKinds~
  - Most of the operations hide token nodes by default (~.len()~, ~[]~,
    ~items()~ etc). In order to access those, the ~unnamed = true~ argument can
    be used. ~node[, true]~ has a shorthand version - the ~{}~ operator. In most cases this is not necessary, except perhaps for operators that use literal tokens - to properly iterate the subnodes of ~1 + 2~ you probably need to use ~for sub in items(node, true)~
- allowed subnodes :: Short list of the allowed node kinds for each subnode
  is stored in the ~AllowedSubnodes~ const
- original grammar :: Most of the original grammar production rules are
  copied into the ~Grammar~ variable. These rules correspond to the original productions in the ~grammar.js~ and might be used to generate new source code from the AST.
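
The difference between named and token subnodes described above can be sketched roughly as follows. This is an illustrative sketch, not a reference: it uses only the calls named in this readme (~parseCppString~, ~[]~, ~.len()~, ~items~, ~.kind~), the input string is made up for the example, and exact signatures may differ between versions of the package.

#+begin_src nim
import htsparse/cpp/cpp

# Hypothetical input, chosen because `=`, `+` and `;` are token nodes.
let str = "int x = 1 + 2;"

# `unnamed = true` asks the parser helper to keep token nodes in the
# rewritten tree instead of discarding them.
let tree = parseCppString(str, unnamed = true)

# `[]` returns the idx-th named subnode.
let decl = tree[0]

# By default operations hide token nodes, `unnamed = true` includes them.
echo "named subnodes: ", decl.len()
echo "all subnodes:   ", decl.len(unnamed = true)

# Iterate over every subnode, tokens included.
for sub in items(decl, true):
  echo sub.kind
#+end_src

Printing ~.kind~ for each subnode is a quick way to see which enum values carry the ~Tok~ suffix for a given grammar.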
In addition to the shared features, the "full" version also includes helpers to:
- parse text :: ~parse<Lang>String~ (for example ~parseCppString~) - helper proc to convert a ~string~ into a rewritten node. Note the ~unnamed: bool = false~ argument - by default only named nodes are rewritten, and the resulting tokens are discarded. To get a full tree rewrite with tokens intact set this argument to ~true~, or use ~.getTs()~ on the rewritten node to access the original tree-sitter node.

Modules in the package:
- agda
- bash
- c
- clojure
- common.nim
- cpp
- csharp
- css
- dart
- elisp
- embeddedTemplate
- eno
- fennel
- go
- graphql
- html
- java
- js
- julia
- kotlin
- latex
- lua
- make
- nix
- php
- python
- regex
- ruby
- rust
- scala
- systemrdl
- systemVerilog
- toml
- vhdl
- zig
** Installation and setup
#+begin_src sh
nimble install htsparse
#+end_src
** Links
- [[https://nimble.directory/pkg/htsparse][nimble package]]
- [[https://github.com/haxscramper/htsparse][github]]
- [[https://haxscramper.github.io/htsparse/src/htsparse.html][API documentation]]
** Usage
#+begin_src nim :exports both
import htsparse/cpp/cpp

let str = """
int main () {
  std::cout << "Hello world";
}
"""

echo parseCppString(str).treeRepr()
#+end_src
#+RESULTS:
#+begin_example
TranslationUnit 0:0-3:0
  [0] FunctionDefinition 0:0-2:1
    [0] PrimitiveType <type(0)> 0:0..3 int
    [1] FunctionDeclarator <declarator(1)> 0:4..11
      [0] Identifier <declarator(0)> 0:4..8 main
      [1] ParameterList <parameters(1)> 0:9..11 ()
    [2] CompoundStatement <body(2)> 0:12-2:1
      [0] ExpressionStatement 1:2..29
        [0] BinaryExpression 1:2..28
          [0] QualifiedIdentifier <left(0)> 1:2..11
            [0] NamespaceIdentifier <scope(0)> 1:2..5 std
            [1] Identifier <name(1)> 1:7..11 cout
          [1] StringLiteral <right(1)> 1:15..28 "Hello world"
#+end_example
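
Named-field access on the "full" wrapper can be tried on the same input. The sketch below assumes the field names shown in angle brackets in the tree dump (~type~, ~declarator~) are the keys accepted by ~[]~, and that ~.strVal()~ returns the source text of a node. Both assumptions follow the description earlier in this readme, but the output is not shown here because it was not verified against a specific version.

#+begin_src nim
import htsparse/cpp/cpp

let str = """
int main () {
  std::cout << "Hello world";
}
"""

let tree = parseCppString(str)

# First named subnode of the translation unit: the function definition.
let funcDef = tree[0]
echo funcDef.kind

# Named-field access; no need to pass the original string to `strVal`,
# since the rewritten node keeps a pointer to it.
echo funcDef["type"].strVal()
echo funcDef["declarator"].kind
#+end_src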
** Tree-sitter library
You need to have the tree-sitter runtime library installed. On Arch Linux this can be done by installing the [[https://www.archlinux.org/packages/community/x86_64/tree-sitter/][tree-sitter]] package, otherwise you can install it manually:
#+begin_src sh
wget https://github.com/tree-sitter/tree-sitter/archive/0.20.0.tar.gz
tar -xvf 0.20.0.tar.gz && cd tree-sitter-0.20.0
sudo make install PREFIX=/usr
#+end_src