Gazelle: simplify parser using tree-sitter

Open sluongng opened this issue 3 years ago • 2 comments

🚀 feature request

Relevant Rules

rules_python's gazelle extension

Description

Currently the gazelle extension has to jump through a lot of hoops to extract import statements from Python source code in order to build up the list of dependencies:

generate.go
--> parse.go
--> parse.py

Not only do we need to maintain the parsing logic in two different languages, we also have to maintain how they communicate back and forth using a particular JSON schema.
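To make the maintenance cost concrete, here is a minimal sketch of the Go half of such a JSON round trip. The message shapes (`parseRequest`, `parseResponse`, `module`) and their field names are hypothetical and only illustrate the pattern; the actual schema shared by parse.go and parse.py may differ. The call to the Python helper is replaced by a hard-coded response line.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseRequest is a HYPOTHETICAL request shape sent to the Python helper.
type parseRequest struct {
	RelPath string   `json:"rel_path"`
	Files   []string `json:"files"`
}

// module is a HYPOTHETICAL description of one imported module.
type module struct {
	Name     string `json:"name"`
	LineNo   int    `json:"lineno"`
	Filepath string `json:"filepath"`
}

// parseResponse is a HYPOTHETICAL response shape read back from the helper.
type parseResponse struct {
	Modules []module `json:"modules"`
}

// encodeRequest serializes a request as a single JSON line.
func encodeRequest(req parseRequest) (string, error) {
	b, err := json.Marshal(req)
	if err != nil {
		return "", fmt.Errorf("encoding helper request: %w", err)
	}
	return string(b), nil
}

// decodeResponse parses one JSON line written back by the helper.
func decodeResponse(line string) ([]module, error) {
	var resp parseResponse
	if err := json.Unmarshal([]byte(line), &resp); err != nil {
		return nil, fmt.Errorf("decoding helper response: %w", err)
	}
	return resp.Modules, nil
}

func main() {
	req, err := encodeRequest(parseRequest{RelPath: "pkg", Files: []string{"a.py"}})
	if err != nil {
		panic(err)
	}
	fmt.Println("request:", req)

	// Stand-in for the line the Python side would write to stdout.
	mods, err := decodeResponse(`{"modules":[{"name":"os.path","lineno":1,"filepath":"pkg/a.py"}]}`)
	if err != nil {
		panic(err)
	}
	for _, m := range mods {
		fmt.Printf("%s imported at %s:%d\n", m.Name, m.Filepath, m.LineNo)
	}
}
```

Every field in these structs has to be kept in sync with the Python side by hand, which is the coupling this issue proposes to remove.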

Describe the solution you'd like

By leveraging https://github.com/smacker/go-tree-sitter/tree/master/python, which is a Go binding for the tree-sitter-python parser, we can write the entire parser logic in Go with tree-sitter query syntax.

I suspect that not having to shell out to Python would not only simplify the code but also improve parser performance.

A POC for the Gazelle Starlark extension using tree-sitter-python can be found here: https://github.com/bazelbuild/bazel-gazelle/issues/1189#issuecomment-1073003354

Describe alternatives you've considered

The current implementation is an alternative.

sluongng avatar Mar 31 '22 06:03 sluongng

go-tree-sitter does not appear to have bazel build configuration set up. Given that it's using C quite heavily, I wouldn't be super-confident that go_repository would do the right thing setting those up in all cases, so that would probably be a prerequisite.

I'd also suggest as a potential alternative using https://github.com/go-python/gpython, which has the significant advantage of being written entirely in Go. Unfortunately it also has the significant disadvantage of only supporting python 3.4 syntax. I have a branch with python 3.8 syntax support which our team uses for our python rules generator thing (which we use instead of gazelle for reasons I don't want to get side-tracked with here), though upstreaming it isn't going to be easy since gpython wants to actually work as an interpreter, not just a parser, and I don't have the time or motivation to make much beyond the parser actually work. But the functionality for parsing an AST works fine, at least for our purposes.

adam-azarchs avatar Apr 29 '22 22:04 adam-azarchs

> go-tree-sitter does not appear to have bazel build configuration set up. Given that it's using C quite heavily, I wouldn't be super-confident that go_repository would do the right thing setting those up in all cases, so that would probably be a prerequisite.

I don't think there is any concern about rules_go + CGO support for the current go-tree-sitter package. I tested it recently and hit only one small issue, which the maintainer has since swiftly resolved: https://github.com/smacker/go-tree-sitter/issues/62.

I have also created a small POC here https://github.com/bazelbuild/bazel-gazelle/issues/1189#issuecomment-1073003354 where I replaced the Starlark parser with the python parser in the Starlark Gazelle language extension and things went pretty well.

> I'd also suggest as a potential alternative using https://github.com/go-python/gpython

Sounds good to me. I think alternative parser implementations for the Gazelle language extension can co-exist 🤔

The obvious tradeoff is the maintenance each project needs when a newer version of Python comes out. Tree-sitter is currently backed by GitHub and VS Code, so I expect the parser will be updated swiftly when new syntax gets introduced in the future.

I am not familiar enough with the go-python project to compare its level of support vs tree-sitter 🤔

sluongng avatar May 04 '22 11:05 sluongng

This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!

github-actions[bot] avatar Dec 12 '22 22:12 github-actions[bot]

This issue was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"

github-actions[bot] avatar Jan 12 '23 22:01 github-actions[bot]