SWE-agent Explore whether 'smart code search' (AST, stack graphs, scope graphs) would improve performance beyond the current find/grep based approach.

trafficstars

Looking at the current 'search' command that the agent has access to, it seems it just uses grep for searches within a file, and find for searches by filename:

https://github.com/princeton-nlp/SWE-agent/blob/79038832dfda12e29e1a4ab19d3ee3c8f86e2b9d/config/commands/search.sh#L89-L95

https://github.com/princeton-nlp/SWE-agent/blob/79038832dfda12e29e1a4ab19d3ee3c8f86e2b9d/config/commands/search.sh#L145-L150

Then the are the other tools that allow files to be opened/viewed/edited/etc:

https://github.com/princeton-nlp/SWE-agent/blob/main/config/commands/defaults.sh

https://github.com/princeton-nlp/SWE-agent/blob/main/config/commands/edit_linting.sh#L49-L68

It would be interesting to see whether giving some more powerful AST / 'smart' code based search functionality would improve the performance of the agent, particularly on larger/more complex codebases; or tasks that require more complex implementations.

One direction that might be interesting to explore with regards to this is 'stack graphs' / 'scope graphs'. I've included a bunch of references/resources I was exploring today below:

Stack Graphs (an evolution of Scope Graphs) sound like they could be really interesting/useful with regards to code navigation, symbol mapping, etc. Perhaps we could use them for module identification, or variable/function identifier naming stabilisation or similar?

https://github.blog/changelog/2024-03-14-precise-code-navigation-for-typescript-projects/

Precise code navigation is now available for all TypeScript repositories. Precise code navigation gives more accurate results by only considering the set of classes, functions, and imported definitions that are visible at a given point in your code.

Precise code navigation is powered by the stack graphs framework. You can read about how we use stack graphs for code navigation and visit the stack graphs definition for TypeScript to learn more.

https://github.blog/2021-12-09-introducing-stack-graphs/

Introducing stack graphs

Precise code navigation is powered by stack graphs, a new open source framework we’ve created that lets you define the name binding rules for a programming language using a declarative, domain-specific language (DSL). With stack graphs, we can generate code navigation data for a repository without requiring any configuration from the repository owner, and without tapping into a build process or other CI job.

LOTS of interesting stuff in this post..

As part of developing stack graphs, we’ve added a new graph construction language to Tree-sitter, which lets you construct arbitrary graph structures (including but not limited to stack graphs) from parsed CSTs. You use stanzas to define the gadget of graph nodes and edges that should be created for each occurrence of a Tree-sitter query, and how the newly created nodes and edges should connect to graph content that you’ve already created elsewhere.

https://github.com/tree-sitter/tree-sitter-graph

tree-sitter-graph The tree-sitter-graph library defines a DSL for constructing arbitrary graph structures from source code that has been parsed using tree-sitter.

https://marketplace.visualstudio.com/items?itemName=tree-sitter.tree-sitter-graph

tree-sitter-graph support for VS Code This language extension for VS Code provides syntax support for tree-sitter-graph files.

Why aren’t we using the Language Server Protocol (LSP) or Language Server Index Format (LSIF)?

To dig even deeper and learn more, I encourage you to check out my Strange Loop talk and the stack-graphs crate: our open source Rust implementation of these ideas.

https://github.com/github/stack-graphs

Stack graphs The crates in this repository provide a Rust implementation of stack graphs, which allow you to define the name resolution rules for an arbitrary programming language in a way that is efficient, incremental, and does not need to tap into existing build or program analysis tools.

https://docs.rs/stack-graphs/latest/stack_graphs/

https://github.com/github/stack-graphs/tree/main/languages

This directory contains stack graphs definitions for specific languages.

https://github.com/github/stack-graphs/tree/main/languages/tree-sitter-stack-graphs-javascript

tree-sitter-stack-graphs definition for JavaScript This project defines tree-sitter-stack-graphs rules for JavaScript using the tree-sitter-javascript grammar.

The command-line program for tree-sitter-stack-graphs-javascript lets you do stack graph based analysis and lookup from the command line.

cargo install --features cli tree-sitter-stack-graphs-javascript

tree-sitter-stack-graphs-javascript index SOURCE_DIR

tree-sitter-stack-graphs-javascript status SOURCE_DIR

tree-sitter-stack-graphs-javascript query definition SOURCE_PATH:LINE:COLUMN

https://github.com/github/stack-graphs/tree/main/languages/tree-sitter-stack-graphs-typescript

tree-sitter-stack-graphs definition for TypeScript This project defines tree-sitter-stack-graphs rules for TypeScript using the tree-sitter-typescript grammar.

The command-line program for tree-sitter-stack-graphs-typescript lets you do stack graph based analysis and lookup from the command line.

https://dcreager.net/talks/2021-strange-loop/

Redirects to https://dcreager.net/talks/stack-graphs/

Incremental, zero-config Code Navigation using stack graphs.

In this talk I’ll describe stack graphs, which use a graphical notation to define the name binding rules for a programming language. They work equally well for dynamic languages like Python and JavaScript, and for static languages like Go and Java. Our solution is fast — processing most commits within seconds of us receiving your push. It does not require setting up a CI job, or tapping into a project-specific build process. And it is open-source, building on the tree-sitter project’s existing ecosystem of language tools.

https://www.youtube.com/watch?v=l2R1PTGcwrE

"Incremental, zero-config Code Nav using stack graphs" by Douglas Creager

https://media.dcreager.net/dcreager-strange-loop-2021-slides.pdf

https://media.dcreager.net/dcreager-2022-ucsc-lsd-slides.pdf

https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github

GitHub has developed two code navigation approaches based on the open source tree-sitter and stack-graphs library:

Search-based - searches all definitions and references across a repository to find entities with a given name

Precise - resolves definitions and references based on the set of classes, functions, and imported definitions at a given point in your code

To learn more about these approaches, see "Precise and search-based navigation."

https://docs.github.com/en/repositories/working-with-files/using-files/navigating-code-on-github#precise-and-search-based-navigation

Precise and search-based navigation Certain languages supported by GitHub have access to precise code navigation, which uses an algorithm (based on the open source stack-graphs library) that resolves definitions and references based on the set of classes, functions, and imported definitions that are visible at any given point in your code. Other languages use search-based code navigation, which searches all definitions and references across a repository to find entities with a given name. Both strategies are effective at finding results and both make sure to avoid inappropriate results such as comments, but precise code navigation can give more accurate results, especially when a repository contains multiple methods or functions with the same name.

https://pl.ewi.tudelft.nl/research/projects/scope-graphs/

Scope Graphs | A Theory of Name Resolution

Scope graphs provide a new approach to defining the name binding rules of programming languages. A scope graph represents the name binding facts of a program using the basic concepts of declarations and reference associated with scopes that are connected by edges. Name resolution is defined by searching for paths from references to declarations in a scope graph. Scope graph diagrams provide an illuminating visual notation for explaining the bindings in programs.

Originally posted by @0xdevalias in https://github.com/0xdevalias/chatgpt-source-watch/issues/11

Apr 04 '24 03:04 0xdevalias

Hey, this could be a very interesting enhancement. At Blar we are trying to create a traversable graph for agents to use similar to what you suggested. I will be trying to integrate our solution in the following days and get results to see if there is any improvement :)

You can check our repo here we open-sourced the project and are currently looking for feedback and use cases.

Here's a small demo explaining what we are trying to achieve.

Sorry for the self-promotion :)

Apr 04 '24 14:04 josem7

At Blar we are trying to create a traversable graph for agents to use similar to what you suggested. I will be trying to integrate our solution in the following days and get results to see if there is any improvement :)

@josem7 It would definitely be interesting to hear how the integration goes, and specifically if it does improve things.

I watched your demo video + had a quick skim through your repo, and I like the direction you're exploring; definitely sounds like it could be a valuable/useful one.

Some of the libs you use may be useful in this project as well, of particular note (off the top of my head):

https://github.com/langchain-ai/langchain
- LangChain is a framework for developing applications powered by large language models (LLMs).
https://github.com/run-llama/llama_index
- LlamaIndex is a data framework for your LLM applications
- https://docs.llamaindex.ai/en/stable/
  - LlamaIndex is a data framework for LLM-based applications which benefit from context augmentation. Such LLM systems have been termed as RAG systems, standing for "Retrieval-Augemented Generation". LlamaIndex provides the essential abstractions to more easily ingest, structure, and access private or domain-specific data in order to inject these safely and reliably into LLMs for more accurate text generation.
- https://docs.llamaindex.ai/en/stable/examples/
- https://github.com/run-llama/llama_index/tree/main/llama-index-packs/llama-index-packs-code-hierarchy
  - CodeHierarchyAgentPack
  - The CodeHierarchyAgentPack is useful to split long code files into more reasonable chunks, while creating an agent on top to navigate the code. What this will do is create a "Hierarchy" of sorts, where sections of the code are made more reasonable by replacing the scope body with short comments telling the LLM to search for a referenced node if it wants to read that context body.
    
    Nodes in this hierarchy will be split based on scope, like function, class, or method scope, and will have links to their children and parents so the LLM can traverse the tree.
  - https://github.com/run-llama/llama_index/tree/main/llama-index-packs/llama-index-packs-code-hierarchy#repo-maps
    - Repo Maps The pack contains a CodeHierarchyKeywordQueryEngine that uses a CodeHierarchyNodeParser to generate a map of a repository's structure and contents. This is useful for the LLM to understand the structure of a codebase, and to be able to reference specific files or directories.
  - https://github.com/run-llama/llama_index/tree/main/llama-index-packs/llama-index-packs-code-hierarchy#future
    - I'm considering adding all the languages from aider by incorporating .scm files instead of _SignatureCaptureType, _SignatureCaptureOptions, and _DEFAULT_SIGNATURE_IDENTIFIERS
      - https://github.com/paul-gauthier/aider
        
        Aider is a command line tool that lets you pair program with GPT-3.5/GPT-4, to edit code stored in your local git repository. Aider will directly edit the code in your local source files, and git commit the changes with sensible commit messages. You can start a new project or work with an existing git repo. Aider is unique in that it lets you ask for changes to pre-existing, larger codebases.
        
        https://aider.chat/
        
        https://aider.chat/docs/repomap.html
        
        Building a better repository map with tree sitter

Apr 05 '24 01:04 0xdevalias

Due to the complexity of implementing and benchmarking this idea (to verify it actually works) this will not be a priority in the near-term, but if anyone implements and shows us numbers on the dev set we would be open to considering it.

Apr 05 '24 04:04 ofirpress

Does not look like its an important functionality, I would recommend to close it.

Apr 05 '24 04:04 smith-co

This would just add complexity for no good reason. And in the broader perspective, would not be sufficiently helpful. I think this issue should be closed.

Apr 05 '24 04:04 code2graph

Due to the complexity of implementing and benchmarking this idea (to verify it actually works) this will not be a priority in the near-term

@ofirpress Totally valid/understandable :)

Does not look like its an important functionality

@smith-co Curious what your reasoning/basis for that conclusion is..?

This would just add complexity for no good reason. And in the broader perspective, would not be sufficiently helpful.

@code2graph Curious what your reasoning/basis for that conclusion is..?

This seems to be aligned to how some other agents have chosen to go, eg.

https://aider.chat/2023/10/22/repomap.html
- Building a better repository map with tree sitter

Where they saw it as an improvement on their older method:

https://aider.chat/2023/05/25/ctags.html
- Improving GPT-4's codebase understanding with ctags

I understand it not being a current priority; but to discount the concept entirely (particularly without reasoning beyond seemingly personal opinion) seems counterintuitive to getting the best agent/outcome here.

Apr 05 '24 06:04 0xdevalias

Hey, thanks for the feedback @0xdevalias!

Yesterday I played around with integrating a graph-based search into the agent (using Blar). The agent managed to use the tool in a good way easily jumping between code to get to the root of the problem. I still need to tweak it a bit so the tool closely resembles the other tools and doesn't affect the workflow as much. After these tweaks, I'll try to run a small-scale benchmark and see if the results look promising.

I'm also curious about your point of view @smith-co and @code2graph, it's true that it adds a bit more complexity but it's not much compared to the possible benefits. In my point of view, it's giving the agent the ability to jump between codes in a more precise manner, similar to using CTRL+click on Vscode to find the function being called.

For me the benefits are 2:

Better precision when searching for surrounding code, maybe a function is called the same in different files, this way you make sure to access the correct function.
Better context/cost management, instead of skimming through files and functions to find what you need you go directly to the function you are looking for

This comes at the cost of having to create the graph, our current approach is a bit inefficient as it's currently more in a proof-of-concept phase but we'll work on improving it with time. Also currently we use Neo4j as our "search and DB engine", in the future we are looking at ways to save this data in memory for a faster implementation. Let me know what you guys think.

I'm always open to receiving feedback and discussing :)

PD: What do you guys think of instead of jumping between individual functions, we provide all surrounding functions as context?

Apr 05 '24 13:04 josem7

Yesterday I played around with integrating a graph-based search into the agent (using Blar). The agent managed to use the tool in a good way easily jumping between code to get to the root of the problem. I still need to tweak it a bit so the tool closely resembles the other tools and doesn't affect the workflow as much. After these tweaks, I'll try to run a small-scale benchmark and see if the results look promising.

@josem7 Nice! Keen to hear how the tweaked version goes on the benchmark!

In my point of view, it's giving the agent the ability to jump between codes in a more precise manner, similar to using CTRL+click on Vscode to find the function being called.

This aligns with my view as well. The way I see it, it brings the way the agent is accessing the files even closer to how a real dev would do it in a modern IDE/dev environment; like a more precise/contextually relevant version of string matching in files (and the edge case issues that can bring up)

What do you guys think of instead of jumping between individual functions, we provide all surrounding functions as context?

@josem7 The full code of the surrounding functions? Or just like a summary of the function signatures/similar? And are you defining 'surrounding functions' in terms of literal adjacency within the same code file, or in more of a 'related functions' way (eg. if the function I want is foo(), and that calls bar1()/bar2(), providing all 3 of those to the context)?

I don't have a specific answer here, but I guess my questions/potential concerns would be how much extra 'noise' that might add to the context; and also how that would align to/differ from the current 'scroll through file' approach.

Apr 06 '24 03:04 0xdevalias

Closing this for now, if anyone works on this and has numbers on SWE-bench and it ends up improving performance, we would love to integrate this into main

May 10 '24 20:05 ofirpress

This seems to be aligned to how some other agents have chosen to go, eg.

aider.chat/2023/10/22/repomap.html

Building a better repository map with tree sitter

Where they saw it as an improvement on their older method:

aider.chat/2023/05/25/ctags.html

Improving GPT-4's codebase understanding with ctags

I understand it not being a current priority; but to discount the concept entirely (particularly without reasoning beyond seemingly personal opinion) seems counterintuitive to getting the best agent/outcome here.

Further to this, aider just set a new SOTA and topped the SWE-bench lite leaderboard with 26.3%. While all of that performance gain can't be attributed to just their smart code search/repo map'; I would happily bet that it helped it achieve it:

https://aider.chat/2024/05/22/swe-bench-lite.html
- Aider scored 26.3% on the SWE Bench Lite benchmark, achieving a state-of-the-art result. The current top leaderboard entry is 20.3% from Amazon Q Developer Agent. The best result reported elsewhere seems to be 25% from OpenDevin.
https://github.com/swe-bench/experiments/pull/7
https://www.swebench.com/

It will be interesting to see if they end up exploring stack graphs directly, and if that improves their performance further again:

https://github.com/paul-gauthier/aider/issues/534

May 30 '24 02:05 0xdevalias

SWE-agent SWE-agent copied to clipboard

Explore whether 'smart code search' (AST, stack graphs, scope graphs) would improve performance beyond the current find/grep based approach.

SWE-agent
SWE-agent copied to clipboard