dev: tree-sitter formal syntax definition?

Open xJonathanLEI opened this issue 2 years ago • 15 comments

The old cairo-lang repo comes with an EBNF definition of the Cairo syntax, which makes it easy to build and maintain the treesitter grammar, along with other benefits.

Maybe I missed it, but it looks like the new repo doesn't provide such a definition. It would be nice if one existed. I'm trying to build the Cairo 1.0 treesitter grammar so that syntax highlighting, folding, etc. work in editors that support it (Helix, nvim, etc.).

xJonathanLEI avatar Nov 25 '22 15:11 xJonathanLEI

I was hoping most editors could use the syntax highlighting that comes from the language server. Perhaps I was too optimistic here, but I really wanted to only have one parser (note that we have a hand written parser).

Anyway, we have the cairo_spec.rs file that determines the structure of the AST. Also, we plan to have an ebnf auto generated from the reference docs, but it won't always be up to date I guess.
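To illustrate the idea of generating an EBNF from a node specification, here is a minimal sketch. The `SPEC` dictionary, its three rule kinds (`struct`, `enum`, `list`), and the function name `to_ebnf` are all hypothetical and simplified for illustration; they are not the actual structure of cairo_spec.rs.

```python
# Hypothetical, simplified node spec. This is NOT the real cairo_spec.rs format.
SPEC = {
    "item_list": ("list", "item"),
    "item": ("enum", ["function", "struct_def"]),
    "function": ("struct", ['"fn"', "identifier", "param_list", "block"]),
}


def to_ebnf(spec):
    """Render each spec entry as one ISO-style EBNF rule."""
    lines = []
    for name, (kind, body) in spec.items():
        if kind == "struct":
            # A struct node is a fixed sequence of children.
            rhs = " , ".join(body)
        elif kind == "enum":
            # An enum node is a choice between alternatives.
            rhs = " | ".join(body)
        elif kind == "list":
            # A list node is zero or more repetitions of one element.
            rhs = "{ " + body + " }"
        else:
            raise ValueError(f"unknown rule kind: {kind}")
        lines.append(f"{name} = {rhs} ;")
    return "\n".join(lines)


print(to_ebnf(SPEC))
```

A generator along these lines would only stay up to date if it ran against the same spec the parser is built from, which is the concern raised above.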

spapinistarkware avatar Nov 25 '22 18:11 spapinistarkware

I would argue they're for different use cases. Treesitter is a widely adopted format (iirc GitHub highlights sources using treesitter), and is very useful when the lsp isn't available (web, users don't want to install, or simply unsupported arch).

Cairo already has a community maintained ts grammar here. It's just for pre-cairo-1, and I want to help update it.

It's good to know the spec is coming!

xJonathanLEI avatar Nov 26 '22 01:11 xJonathanLEI

Yeah, @spapinistarkware I think we will need the following separate grammar implementations:

  1. Compiler one we have
  2. Tree sitter as @xJonathanLEI said
  3. TextMate bundle which is present in VSCode format in lsp plugin. This format is also used by SublimeText and many other editors afaik
  4. IntelliJ Grammar-Kit syntax def for possible future IntelliJ plugins

mkaput avatar Nov 26 '22 11:11 mkaput

Hey,

I can confirm that a tree-sitter grammar would be awesome (and I would prioritize it over the others).

  1. It's going to be great for toolbuilders building static analysis stuff (I'll use it for sure to build some toolz).
  2. GitHub (advanced) code navigation and semantic highlighting are based on tree-sitter.
  3. also you might even consider adapting the lsp to use tree-sitter for its many benefits, like its blazing fast incremental parsing and https://crates.io/crates/tree-sitter-stack-graphs 😱 .
  4. tree-sitter grammars can generally be version agnostic (accept multiple versions of the same language), making it easier to build downstream tools. My team builds https://github.com/ConsenSys/solc-typed-ast which provides the same comfort on top of the (always changing) solidity AST and tool builders seem to love it 🚀 .

It would be incredibly awesome if there was an officially maintained tree-sitter grammar.

I'm happy to contribute where possible! 🙌

JoranHonig avatar Nov 28 '22 15:11 JoranHonig

For static analysis tools, we really hope builders will use our compiler as a library for these. It's designed with API in mind, so it is simple to integrate with.

As for lsp, our parser is pretty fast actually, and like I said before, it is important for us to have a hand-written parser to allow for smart recovery and to supply as much semantic info as possible.

spapinistarkware avatar Nov 28 '22 16:11 spapinistarkware

> For static analysis tools, we really hope builders will use our compiler as a library for these. It's designed with API in mind, so it is simple to integrate with.

Do you plan to keep the parser of the compiler backwards compatible with all versions > 1.0?

A custom rust parser is def useful (and I'll likely use it myself), but it doesn't fit all static analysis use cases (ofc I can appreciate the desire to not maintain tons of grammars for the same language just bc there is some feature one of them doesn't support 😄 ). One of the things I really like about tree-sitter is the range of tools/frameworks that already integrate with it.

An example is semgrep, which leverages tree-sitter to make it incredibly easy to write powerful static analysis rules / detectors. I know a couple of auditors that use this a lot.

(I'll stop shilling tree-sitter now 😅 )

> As for lsp, our parser is pretty fast actually, and like I said before, it is important for us to have a hand-written parser to allow for smart recovery and to supply as much semantic info as possible.

The lsp is all up to you 😄 , I just wanted to mention that tree-sitter has some nice goodies that you might like 🙌 .

JoranHonig avatar Nov 28 '22 18:11 JoranHonig

Hi, I'm the author of helix and the maintainer of tree-sitter-cairo 👋🏻

> I was hoping most editors could use the syntax highlighting that comes from the language server.

I'm not a huge fan of the LSP highlighting spec because it results in a bunch of heavy back-and-forth traffic per keypress. Helix uses the grammar to also calculate indentation, text objects and a whole bunch of other smart features.

As @xJonathanLEI mentioned, the biggest benefit of an official tree-sitter grammar would be better language support on GitHub itself: currently the Atom grammar is used but this could be swapped out. In the future additional queries could be added that would support code navigation/go to definition on GitHub. As an example, Elixir built an official grammar for this purpose.

I've been hoping to upstream my grammar so that I don't have to maintain it as cairo goes through breaking changes. It's going to be a headache to do so without a formal EBNF grammar, and since I'm a third-party dev there's usually a gap after a Cairo release where I need to catch up with the grammar.

archseer avatar Dec 01 '22 07:12 archseer

My idea is to first have a working grammar for Cairo 1.0, in this repo, with good testing to make sure it is in line with our parser. Then we could make sure it never breaks (in CI).

Maybe if you or some other community member would be willing to make this happen, then we could make sure to maintain it with our parser.

But this needs to be discussed with the team, and I think we will want to wait with this until the first release at least (when we have less pressure).
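The CI idea above (keeping a grammar in line with the hand-written parser) could be sketched roughly as follows. Everything here is an assumption for illustration: `find_disagreements` is a hypothetical helper, and the two `*_accepts` callables are stand-ins where a real setup would call the compiler's parser and the tree-sitter grammar.

```python
def find_disagreements(samples, reference_accepts, grammar_accepts):
    """Return (name, reference_result, grammar_result) for every sample
    on which the two parsers disagree about accept/reject."""
    disagreements = []
    for name, source in samples:
        ref = reference_accepts(source)
        gram = grammar_accepts(source)
        if ref != gram:
            disagreements.append((name, ref, gram))
    return disagreements


def braces_balanced(source):
    """Stand-in 'reference parser': accepts sources with balanced braces."""
    depth = 0
    for ch in source:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def toy_grammar_accepts(source):
    """Stand-in 'grammar': same check, but (deliberately) rejects generics."""
    return braces_balanced(source) and "<" not in source


# Hypothetical corpus; a real check would read the repository's example files.
SAMPLES = [
    ("ok.cairo", "fn main() {}"),
    ("generic.cairo", "fn foo() -> Vec<A> {}"),
    ("broken.cairo", "fn main() {"),
]

for name, ref, gram in find_disagreements(SAMPLES, braces_balanced, toy_grammar_accepts):
    print(f"{name}: reference={ref} grammar={gram}")
```

A CI job built on this would fail whenever the list of disagreements is non-empty, so a parser change that the grammar does not follow gets flagged at review time.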

@orizi ?

spapinistarkware avatar Dec 01 '22 07:12 spapinistarkware

Sure, sounds good! I do recommend keeping a separate repository since most build systems (and github's linguist) end up pulling the grammar via a git repository, so having it split off from the main codebase makes it a smaller clone. I'm willing to transfer the repository under starkware-libs if that's OK with the team.

archseer avatar Dec 02 '22 11:12 archseer

> I'm willing to transfer the repository under starkware-libs if that's OK with the team.

It might make sense to keep this parser separate (maybe call it cairo-legacy), since v1.0 seems to diverge to such an extent that you might almost argue it's a new language.

I'll try to build an initial parser for v1.0 based on what I can learn from the rust parser here: https://github.com/JoranHonig/tree-sitter-cairo


@spapinistarkware still somewhat related to this thread I was wondering if you plan to keep the parser of the compiler backwards compatible with all versions > 1.0?

JoranHonig avatar Dec 08 '22 13:12 JoranHonig

All right, it works on all the examples in the repository.

A major thing that I haven't found out yet is all the different literal types.

I'll also do another pass over the parser to add nice node types, fields, aliases, etc., trying to stick as close as possible to the spec's structure here:

https://github.com/starkware-libs/cairo/blob/main/crates/syntax_codegen/src/cairo_spec.rs

JoranHonig avatar Dec 09 '22 10:12 JoranHonig

> IntelliJ Grammar-Kit syntax def for possible future IntelliJ plugins

Yup, I'm working on a Cairo plugin for IntelliJ and it would be great to have a BNF grammar in this repo. At the moment, I'm stitching things here and there (between Rust's own BNF and the grammar from AVNU-Labs also produced by tree-sitter).

kasteph avatar Oct 31 '23 20:10 kasteph

> All right, it works on all the examples in the repository.
>
> A major thing that I haven't found out yet is all the different literal types.
>
> I'll also do another pass over the parser to add nice node types, fields, aliases, etc., trying to stick as close as possible to the spec's structure here:
>
> https://github.com/starkware-libs/cairo/blob/main/crates/syntax_codegen/src/cairo_spec.rs

Alright, since I saw that your repo got outdated by some recent changes, I got into this weird project of automatically parsing Cairo's syntax (written in Rust) with tree-sitter, in order to produce a tree-sitter grammar for Cairo.

For now there are still quite a few errors, which seem to stem from the parsing of type clauses (they seem to allow any expression inside them, which therefore allows something like A>B, thus preventing tree-sitter from correctly parsing fn foo() -> Vec<A> {...}). From the comments in cairo_spec.rs, it seems that this "type clause expression" is a known issue, however.
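The ambiguity described above can be illustrated with a toy recursive-descent parser. This is only a sketch under stated assumptions: it is neither tree-sitter nor the Cairo parser, and `tokenize`, `parse_type`, and `parse_expr` are hypothetical names. It contrasts a dedicated type rule with a general expression rule on the tokens that follow `->`.

```python
import re

# Toy tokenizer: identifiers plus a few punctuation characters.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[<>{}(),])")


def tokenize(src):
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        match = TOKEN.match(src, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def peek(tokens, pos):
    return tokens[pos] if pos < len(tokens) else None


def expect_ident(tokens, pos):
    tok = peek(tokens, pos)
    if tok is None or not (tok[0].isalpha() or tok[0] == "_"):
        raise SyntaxError(f"expected identifier, found {tok!r}")
    return tok, pos + 1


def parse_type(tokens, pos):
    """Dedicated type rule: Name or Name<Type, ...>."""
    name, pos = expect_ident(tokens, pos)
    args = []
    if peek(tokens, pos) == "<":
        pos += 1
        while True:
            arg, pos = parse_type(tokens, pos)
            args.append(arg)
            if peek(tokens, pos) != ",":
                break
            pos += 1
        if peek(tokens, pos) != ">":
            raise SyntaxError("expected '>' to close generic arguments")
        pos += 1
    return (name, args), pos


def parse_expr(tokens, pos):
    """General expression rule: operand (('<' | '>') operand)*."""
    left, pos = expect_ident(tokens, pos)
    while peek(tokens, pos) in ("<", ">"):
        op = tokens[pos]
        right, pos = expect_ident(tokens, pos + 1)
        left = (op, left, right)
    return left, pos


clause = tokenize("Vec<A> {")           # what follows "->" in the signature
print(parse_type(clause, 0))            # (('Vec', [('A', [])]), 4)
print(parse_expr(tokenize("A>B"), 0))   # (('>', 'A', 'B'), 3)
try:
    parse_expr(clause, 0)
except SyntaxError as err:
    print("expression rule fails:", err)
```

With the type rule, the closing `>` ends the generic arguments and the parser stops right before `{`. With the expression rule, `Vec < A` is read as a comparison, the `>` is taken as a second operator, and the parser then trips over `{`. A generated grammar that reuses the expression rule for type clauses has to resolve this conflict somehow, for example with a dedicated type rule or explicit precedence.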

Juul-Mc-Goa avatar Oct 31 '23 21:10 Juul-Mc-Goa

@JoranHonig @Juul-Mc-Goa does your Cairo tree sitter support Cairo2? I am looking for a cairo tree sitter to use for a Zed extension

okhaimie-dev avatar Apr 10 '24 07:04 okhaimie-dev

@okhaimie-dev my tree-sitter parser is a bit outdated, it's on my todo list to revisit it and get it up to date with the latest and greatest in cairo

JoranHonig avatar Apr 12 '24 12:04 JoranHonig

https://github.com/starkware-libs/tree-sitter-cairo

0xLucqs avatar Jun 10 '24 14:06 0xLucqs