Explore whether stack graphs may be useful in this tool
Not really a feature request per se, but came across this project after it was mentioned in another issue (Ref), and figured I would create an issue to share the same info here in case it's useful.
I notice that you're already using tree-sitter, so it may not be a lot of extra effort to make use of tree-sitter-graph and/or the stack-graphs project (eg. tree-sitter-stack-graphs-javascript, etc)
From watching the demo video on linkedin + briefly skimming this repo, it looks like there might be some useful crossovers, and it may allow you to add support for a whole bunch more languages without needing to re-invent the wheel to do so.
A few notes/links/references I recently collated RE: stack graphs + related libs:
Stack Graphs (an evolution of Scope Graphs) sound like they could be really interesting/useful with regards to code navigation, symbol mapping, etc. Perhaps we could use them for module identification, or variable/function identifier naming stabilisation or similar?
- https://github.blog/changelog/2024-03-14-precise-code-navigation-for-typescript-projects/
Precise code navigation is now available for all TypeScript repositories. Precise code navigation gives more accurate results by only considering the set of classes, functions, and imported definitions that are visible at a given point in your code.
Precise code navigation is powered by the stack graphs framework. You can read about how we use stack graphs for code navigation and visit the stack graphs definition for TypeScript to learn more.
- https://github.blog/2021-12-09-introducing-stack-graphs/
Introducing stack graphs
Precise code navigation is powered by stack graphs, a new open source framework we’ve created that lets you define the name binding rules for a programming language using a declarative, domain-specific language (DSL). With stack graphs, we can generate code navigation data for a repository without requiring any configuration from the repository owner, and without tapping into a build process or other CI job.
- LOTS of interesting stuff in this post..
As part of developing stack graphs, we’ve added a new graph construction language to Tree-sitter, which lets you construct arbitrary graph structures (including but not limited to stack graphs) from parsed CSTs. You use stanzas to define the gadget of graph nodes and edges that should be created for each occurrence of a Tree-sitter query, and how the newly created nodes and edges should connect to graph content that you’ve already created elsewhere.
- https://github.com/tree-sitter/tree-sitter-graph
tree-sitter-graphThe tree-sitter-graph library defines a DSL for constructing arbitrary graph structures from source code that has been parsed using tree-sitter.- https://marketplace.visualstudio.com/items?itemName=tree-sitter.tree-sitter-graph
tree-sitter-graphsupport for VS Code This language extension for VS Code provides syntax support fortree-sitter-graphfiles.Why aren’t we using the Language Server Protocol (LSP) or Language Server Index Format (LSIF)?
To dig even deeper and learn more, I encourage you to check out my Strange Loop talk and the
stack-graphscrate: our open source Rust implementation of these ideas.
- https://github.com/github/stack-graphs
Stack graphs The crates in this repository provide a Rust implementation of stack graphs, which allow you to define the name resolution rules for an arbitrary programming language in a way that is efficient, incremental, and does not need to tap into existing build or program analysis tools.
- https://docs.rs/stack-graphs/latest/stack_graphs/
- https://github.com/github/stack-graphs/tree/main/languages
This directory contains stack graphs definitions for specific languages.
- https://github.com/github/stack-graphs/tree/main/languages/tree-sitter-stack-graphs-javascript
tree-sitter-stack-graphsdefinition for JavaScript This project definestree-sitter-stack-graphsrules for JavaScript using thetree-sitter-javascriptgrammar.The command-line program for
tree-sitter-stack-graphs-javascriptlets you do stack graph based analysis and lookup from the command line.
cargo install --features cli tree-sitter-stack-graphs-javascripttree-sitter-stack-graphs-javascript index SOURCE_DIRtree-sitter-stack-graphs-javascript status SOURCE_DIRtree-sitter-stack-graphs-javascript query definition SOURCE_PATH:LINE:COLUMN- https://github.com/github/stack-graphs/tree/main/languages/tree-sitter-stack-graphs-typescript
tree-sitter-stack-graphsdefinition for TypeScript This project definestree-sitter-stack-graphsrules for TypeScript using thetree-sitter-typescriptgrammar.The command-line program for
tree-sitter-stack-graphs-typescriptlets you do stack graph based analysis and lookup from the command line.- https://dcreager.net/talks/2021-strange-loop/
- Redirects to https://dcreager.net/talks/stack-graphs/
Incremental, zero-config Code Navigation using stack graphs.
In this talk I’ll describe stack graphs, which use a graphical notation to define the name binding rules for a programming language. They work equally well for dynamic languages like Python and JavaScript, and for static languages like Go and Java. Our solution is fast — processing most commits within seconds of us receiving your push. It does not require setting up a CI job, or tapping into a project-specific build process. And it is open-source, building on the tree-sitter project’s existing ecosystem of language tools.
- https://www.youtube.com/watch?v=l2R1PTGcwrE
"Incremental, zero-config Code Nav using stack graphs" by Douglas Creager
- https://media.dcreager.net/dcreager-strange-loop-2021-slides.pdf
- https://media.dcreager.net/dcreager-2022-ucsc-lsd-slides.pdf
- https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github
GitHub has developed two code navigation approaches based on the open source tree-sitter and stack-graphs library:
- Search-based - searches all definitions and references across a repository to find entities with a given name
- Precise - resolves definitions and references based on the set of classes, functions, and imported definitions at a given point in your code
To learn more about these approaches, see "Precise and search-based navigation."
- https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github#precise-and-search-based-navigation
Precise and search-based navigation Certain languages supported by GitHub have access to precise code navigation, which uses an algorithm (based on the open source stack-graphs library) that resolves definitions and references based on the set of classes, functions, and imported definitions that are visible at any given point in your code. Other languages use search-based code navigation, which searches all definitions and references across a repository to find entities with a given name. Both strategies are effective at finding results and both make sure to avoid inappropriate results such as comments, but precise code navigation can give more accurate results, especially when a repository contains multiple methods or functions with the same name.
- https://pl.ewi.tudelft.nl/research/projects/scope-graphs/
Scope Graphs | A Theory of Name Resolution
Scope graphs provide a new approach to defining the name binding rules of programming languages. A scope graph represents the name binding facts of a program using the basic concepts of declarations and reference associated with scopes that are connected by edges. Name resolution is defined by searching for paths from references to declarations in a scope graph. Scope graph diagrams provide an illuminating visual notation for explaining the bindings in programs.
Originally posted by @0xdevalias in https://github.com/0xdevalias/chatgpt-source-watch/issues/11
Hi, thanks for the tips and the reading list, I already had read all of them but LSP and LSIF are still pending.
We are definitely looking into the option of using tree-sitter-graphs. I'm currently weighing two options:
-
Integrating 'tree-sitter-graph'. This has several pros like performance enhancement, not reinventing the wheel, and meeting industry standards. The problem is that I'm not sure about the complexity it will introduce to the code, especially handling a CLI integration with the tool.
-
Creating my own implementation of a stack-graph in Python. Keeping everything in-house will simplify the architecture, make maintenance easier, and is a fun challenge 😃 . The downside is obvious: it will be complex and time-consuming to replicate stack-graph logic, and Python doesn't perform as well as Rust.
I think I'm going to explore the integration option, try to make it run, and if it's not too difficult, that's the route I'll go.
I was checking LSP and seems to be a really power full tool. But it does a lot more of what I really need in this stage, because we are focused in give LLM a tool to navigate and understand code base repositories in a easy way. For that I think the best is to implement 'tree-sitter-graph' and later see in what could help LSP.
We are definitely looking into the option of using
tree-sitter-graphs
I think I'm going to explore the integration option, try to make it run, and if it's not too difficult, that's the route I'll go.
@berrazuriz1 I would make sure to look at both tree-sitter-graph (a more generic underlying lib), but also the more specific stack-graphs project (which is built on top of tree-sitter-graph):
- https://github.com/tree-sitter/tree-sitter-graph
- https://github.com/github/stack-graphs
eg.
- https://github.com/github/stack-graphs/tree/main/languages/tree-sitter-stack-graphs-python#writing-tsg
-
The stack graph rules are written in tree-sitter-graph. Checkout the examples, which contain self-contained TSG rules for specific language features.
-
The problem is that I'm not sure about the complexity it will introduce to the code, especially handling a CLI integration with the tool.
@berrazuriz1 Yeah, that's fair enough. There are theoretically ways to call/embed rust within python as well I believe, but that may end up being even more complex than shelling out to the CLI
e.g. Some random related links:
- https://www.reddit.com/r/learnrust/comments/uax3l9/which_resources_are_there_to_learn_how_to_call/
- https://saidvandeklundert.net/learn/2021-11-18-calling-rust-from-python-using-pyo3/
- https://github.com/PyO3/pyo3
-
PyO3 Rust bindings for Python, including tools for creating native Python extension modules. Running and interacting with Python code from a Rust binary is also supported.
- https://pyo3.rs/
- https://pyo3.rs/v0.21.1/rust-from-python
-
Using Rust from Python
-
- https://pyo3.rs/v0.21.1/python-from-rust
-
Calling Python in Rust code
-
- https://pyo3.rs/v0.21.1/rust-from-python
-
I was checking LSP and seems to be a really power full tool. But it does a lot more of what I really need in this stage
@berrazuriz1 Yeah, I think LSP is likely not relevant at this stage. I really only included the links there as part of the "why we aren't using" context.
I think running Rust on Python is a good solution. I'm going to give it a try.
Hi @0xdevalias , we just update the package and we ended up implementing the graph using LSP. This gives us much more flexibility and support. Take a look at the Readme and see for yourself!