R support

Open jimhester opened this issue 5 years ago • 7 comments

R is a widely used, growing language often used in Data Science and Statistics.

While it does not have a published formal specification, there is a draft specification that describes lexing and parsing the language.

In the most widely used implementation the parsing is done with a bison parser defined in gram.y.

The lexing rules for R are somewhat complex, but the parsing is relatively straightforward, as generally everything is an expression.
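To illustrate the "everything is an expression" point: in R, `x <- if (a) 1 else 2` is itself a nested call, equivalent to `` `<-`(x, `if`(a, 1, 2)) ``. The Python sketch below is only a toy model of that idea, not how R or gram.y is implemented; the `Call` class and `evaluate` function are hypothetical names made up for this example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Call:
    """A call node: a function name plus a tuple of argument expressions."""
    fn: str
    args: tuple


def evaluate(expr: Any, env: dict) -> Any:
    """Evaluate a toy expression tree; every construct yields a value."""
    if isinstance(expr, Call):
        if expr.fn == "<-":
            # Assignment is a call whose value is the assigned value.
            name, value_expr = expr.args
            env[name] = evaluate(value_expr, env)
            return env[name]
        if expr.fn == "if":
            # `if` is a call too; only the chosen branch is evaluated.
            cond, then_expr, else_expr = expr.args
            chosen = then_expr if evaluate(cond, env) else else_expr
            return evaluate(chosen, env)
        if expr.fn == "+":
            left, right = expr.args
            return evaluate(left, env) + evaluate(right, env)
        raise ValueError(f"unknown function: {expr.fn}")
    if isinstance(expr, str):
        # Bare strings are treated as variable lookups in this toy model.
        return env[expr]
    return expr  # anything else is a literal


# Models: x <- if (a) 1 else 1 + 1
env = {"a": False}
tree = Call("<-", ("x", Call("if", ("a", 1, Call("+", (1, 1))))))
result = evaluate(tree, env)
```

Because assignment, control flow, and arithmetic all share one node shape, a grammar for a language like this needs comparatively few special cases once lexing is handled.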

It would be very beneficial to the R community to have support for R in semantic!

jimhester avatar Nov 14 '19 14:11 jimhester

A first step would be creating a tree sitter grammar for R. See more in their documentation on creating new parsers. After that, the hardest part begins - implementing the semantic part: https://github.com/github/semantic/blob/master/docs/adding-new-languages.md

XVilka avatar Nov 15 '19 14:11 XVilka

Definitely start with a tree-sitter parser.

After that, the hardest part begins - implementing the semantic part

This is about to get much, much easier as we are almost entirely automating the semantic part. My advice would be to get the tree-sitter grammar in good shape and then check in again with the team to see if we can use the new path for supporting languages in semantic.

tclem avatar Nov 18 '19 17:11 tclem

Definitely start with a tree-sitter parser.

After that, the hardest part begins - implementing the semantic part

This is about to get much, much easier as we are almost entirely automating the semantic part. My advice would be to get the tree-sitter grammar in good shape and then check in again with the team to see if we can use the new path for supporting languages in semantic.

Is the new path for supporting languages available soon?

2yz avatar Dec 03 '19 05:12 2yz

Is the new path for supporting languages available soon?

I don't have a specific timeline to give you, but you can see an example for Java and Python of what's required to generate code from the new node-types.json. Obviously part of the work here is to better surface our documentation.
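For readers unfamiliar with node-types.json: it is the file tree-sitter generates alongside a parser, listing every node type and its fields. The sketch below is only a Python illustration of the general idea of generating typed stubs from that file. It is not semantic's actual codegen (which is written in Haskell), and the node names in the excerpt are invented rather than taken from a real R grammar.

```python
import json

# Hypothetical excerpt in the node-types.json layout that tree-sitter
# generates; the node names are made up for illustration.
NODE_TYPES_JSON = """
[
  {
    "type": "binary",
    "named": true,
    "fields": {
      "lhs": {"multiple": false, "required": true,
              "types": [{"type": "identifier", "named": true}]},
      "rhs": {"multiple": false, "required": true,
              "types": [{"type": "identifier", "named": true},
                        {"type": "float", "named": true}]}
    }
  },
  {"type": "identifier", "named": true},
  {"type": "float", "named": true},
  {"type": "+", "named": false}
]
"""


def class_name(node_type: str) -> str:
    """Turn a snake_case node type into a CamelCase class name."""
    return "".join(part.capitalize() for part in node_type.split("_"))


def render_stubs(node_types: list) -> str:
    """Emit one dataclass stub per named node type."""
    lines = ["from dataclasses import dataclass", ""]
    for node in node_types:
        if not node["named"]:
            continue  # skip anonymous tokens such as "+"
        lines += ["", "@dataclass", f"class {class_name(node['type'])}:"]
        fields = node.get("fields", {})
        if not fields:
            # Leaf nodes carry only their source text in this sketch.
            lines.append("    text: str")
        for name, info in fields.items():
            union = " | ".join(class_name(t["type"]) for t in info["types"])
            if info["multiple"]:
                union = f"list[{union}]"
            elif not info["required"]:
                union = f"{union} | None"
            # Quote the annotation so classes may be referenced before definition.
            lines.append(f'    {name}: "{union}"')
    return "\n".join(lines) + "\n"


stubs = render_stubs(json.loads(NODE_TYPES_JSON))
print(stubs)
```

The point of this design is that the grammar author only maintains the tree-sitter grammar; the typed syntax definitions are derived mechanically from the generated JSON.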

tclem avatar Dec 04 '19 13:12 tclem

Link to the documentation has moved to https://github.com/github/semantic/blob/master/docs/codegen.md

XVilka avatar May 09 '20 08:05 XVilka

Just a small update: I have begun work on a tree-sitter parser for R (https://github.com/jimhester/tree-sitter-r).

I have only spent a few days on it, but it is already fairly functional, so I could start looking into semantic support in the near future.

jimhester avatar Oct 30 '20 21:10 jimhester

The tree-sitter parser is now in pretty good shape. I have moved it to https://github.com/r-lib/tree-sitter-r and opened a PR at https://github.com/tree-sitter/haskell-tree-sitter/pull/295. Once that is merged, I guess it needs to be pushed to Hackage so it can be used in semantic.

Now that https://github.com/github/semantic/pull/577 has been merged, I am a little unclear what the next steps within semantic need to be. If someone could clarify that for me, I would be happy to work on it!

jimhester avatar Dec 07 '20 16:12 jimhester