rules_python
Gazelle: simplify parser using tree-sitter
🚀 feature request
Relevant Rules
rules_python's gazelle extension
Description
Currently the Gazelle extension has to jump through a lot of hoops to extract import statements from Python source code in order to build up the list of dependencies:
generate.go
--> parse.go
--> parse.py
Not only do we need to maintain the parsing logic in 2 different languages, we also have to maintain how they communicate back and forth using a particular JSON schema.
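To make the maintenance cost concrete, here is a minimal sketch of the Go half of such a hand-off. The `parserOutput` and `module` types and their JSON field names are hypothetical, chosen for illustration only; they are not the actual schema used between parse.go and parse.py.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parserOutput is a hypothetical shape for what a Python helper might
// print for one source file; it is not the real parse.py schema.
type parserOutput struct {
	FileName string   `json:"filename"`
	Modules  []module `json:"modules"`
}

// module describes one imported module and the line it was found on.
type module struct {
	Name       string `json:"name"`
	LineNumber int    `json:"lineno"`
}

// decodeParserOutput turns the helper's JSON reply into Go values.
// Any field renamed on the Python side must be renamed here too.
func decodeParserOutput(raw []byte) (parserOutput, error) {
	var out parserOutput
	if err := json.Unmarshal(raw, &out); err != nil {
		return parserOutput{}, fmt.Errorf("decoding parser output: %w", err)
	}
	return out, nil
}

func main() {
	// Stand-in for bytes that would be read from the helper's stdout.
	raw := []byte(`{"filename": "app.py", "modules": [{"name": "os.path", "lineno": 1}, {"name": "requests", "lineno": 2}]}`)
	out, err := decodeParserOutput(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, m := range out.Modules {
		fmt.Printf("%s:%d imports %s\n", out.FileName, m.LineNumber, m.Name)
	}
}
```

Every struct tag here has to stay in sync with whatever the Python side emits, which is the coupling a single-language parser would remove.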
Describe the solution you'd like
By leveraging https://github.com/smacker/go-tree-sitter/tree/master/python, which is a Go binding for the tree-sitter-python parser, we can write the entire parser logic in Go with tree-sitter query syntax.
I suspect that not having to shell out to Python would not only simplify the code but also improve parser performance.
A POC for the Gazelle Starlark extension using tree-sitter-python can be found here: https://github.com/bazelbuild/bazel-gazelle/issues/1189#issuecomment-1073003354
Describe alternatives you've considered
The current implementation is an alternative.
go-tree-sitter does not appear to have bazel build configuration set up. Given that it's using C quite heavily, I wouldn't be super-confident that go_repository would do the right thing setting those up in all cases, so that would probably be a prerequisite.
I'd also suggest as a potential alternative using https://github.com/go-python/gpython, which has the significant advantage of being written entirely in Go. Unfortunately it also has the significant disadvantage of only supporting Python 3.4 syntax. I have a branch with Python 3.8 syntax support which our team uses for our Python rules generator thing (which we use instead of Gazelle for reasons I don't want to get side-tracked with here), though upstreaming it isn't going to be easy since gpython wants to actually work as an interpreter, not just a parser, and I don't have the time or motivation to make much beyond the parser actually work. But the functionality for parsing an AST works fine, at least for our purposes.
go-tree-sitter does not appear to have bazel build configuration set up. Given that it's using C quite heavily, I wouldn't be super-confident that go_repository would do the right thing setting those up in all cases, so that would probably be a prerequisite.
I don't think there is any concern about rules_go + CGO support for the current go-tree-sitter package. I tested it recently and there was only a small issue with it, which the maintainer has since swiftly resolved: https://github.com/smacker/go-tree-sitter/issues/62.
I have also created a small POC here https://github.com/bazelbuild/bazel-gazelle/issues/1189#issuecomment-1073003354 where I replaced the Starlark parser with the Python parser in the Starlark Gazelle language extension and things went pretty well.
I'd also suggest as a potential alternative using https://github.com/go-python/gpython
Sounds good to me. I think alternative parser implementations for Gazelle Language Extension can co-exist 🤔
The obvious tradeoff is the maintenance each project needs in order to support a newer version of Python when it comes out. TreeSitter is currently backed by GitHub and VSCode, so I expect the parser will be updated swiftly when new syntax gets introduced in the future.
I am not familiar enough with the go-python project to compare its level of support vs tree-sitter 🤔
This issue has been automatically marked as stale because it has not had any activity for 180 days. It will be closed if no further activity occurs in 30 days. Collaborators can add an assignee to keep this open indefinitely. Thanks for your contributions to rules_python!
This issue was automatically closed because it went 30 days without a reply since it was labeled "Can Close?"