integrate LocAgent into OpenHands
- [ ] This change is worth documenting at https://docs.all-hands.dev/
- [ ] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below
This PR integrates LocAgent into OpenHands. It:
- builds a graph-based index of the repository
- introduces unified tools for agent-based code exploration that use this graph index, allowing LLM agents to perform multi-hop navigation and reasoning across code dependencies
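As a rough illustration of what multi-hop navigation over a graph index means, here is a minimal sketch. It is not the LocAgent implementation: the `GRAPH` contents, the edge labels, and the `explore` helper are all hypothetical and only show the general idea of a bounded breadth-first walk over typed code dependencies.

```python
from collections import deque

# Hypothetical toy index: node -> list of (edge_type, neighbor).
# Node names and edge labels are illustrative, not taken from LocAgent.
GRAPH = {
    "pkg/validators.py": [("contains", "pkg/validators.py:validate_email")],
    "pkg/validators.py:validate_email": [("invokes", "pkg/utils.py:normalize")],
    "pkg/utils.py": [("contains", "pkg/utils.py:normalize")],
    "pkg/utils.py:normalize": [],
}


def explore(graph, start, max_hops, edge_types=None):
    """Breadth-first walk; returns {node: hop_count} for nodes within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand past the hop limit
        for edge_type, neighbor in graph.get(node, []):
            # Optionally restrict the walk to certain dependency types.
            if edge_types is not None and edge_type not in edge_types:
                continue
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen
```

Calling `explore(GRAPH, "pkg/validators.py", 2)` reaches the function defined in the file and, one hop further, the helper it invokes in another file.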
To run this PR locally, use the following command:
./evaluation/benchmarks/swe_bench/scripts/run_localize.sh llm.eval_gpt4o HEAD LocAgent 1 30 1
@ryanhoangt could you take a look at this when you have time?
Hi @czlll, thanks for the contribution! I took a look at the PR and tried to run localization on SWE-bench and the code seems to work. I made a PR into your branch here to do some refactors, could you take a look to see if it looks good? Also when running run_localize.sh, I noticed the agent seemed to face difficulties using the search_code_snippets tool, and it had to switch to the 2 other tools. Do you have any ideas? Below is the error I got:
09:10:08 - openhands:DEBUG: agent_controller.py:781
ACTION
[Agent Controller 1a5a6f6f-ff5f-4bab-8003-273ec34a9d75-5eb62c31a6dfb82f] **IPythonRunCellAction**
THOUGHT: Let me try a different approach to search for these validators.
CODE:
print(search_code_snippets(**{'search_terms': ['validators.py'], 'file_path_or_pattern': '**/validators.py'}))
09:10:10 - openhands:DEBUG: stream.py:273 - Adding IPythonRunCellObservation id=7 from AGENT
09:10:10 - openhands:DEBUG: agent_controller.py:396
OBSERVATION
[Agent Controller 1a5a6f6f-ff5f-4bab-8003-273ec34a9d75-5eb62c31a6dfb82f] **IPythonRunCellObservation**
DEBUG:bm25s:Building index from IDs objects
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[1], line 1
----> 1 print(search_code_snippets(**{'search_terms': ['validators.py'], 'file_path_or_pattern': '**/validators.py'}))
File /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/lib/python3.12/site-packages/openhands_aci/indexing/locagent/tools.py:592, in search_code_snippets(search_terms, line_nums, file_path_or_pattern)
590 # search content
591 if continue_search:
--> 592 query_results = bm25_content_retrieve(
593 query_info=query_info, include_files=include_files
594 )
595 cur_query_results.extend(query_results)
597 elif i != (len(filter_terms) - 1):
File /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/lib/python3.12/site-packages/openhands_aci/indexing/locagent/tools.py:814, in bm25_content_retrieve(query_info, include_files, similarity_top_k)
809 else:
810 # repo_path = get_repo_save_dir()
811 # repo_dir = setup_repo(instance_data=instance, repo_base_dir=repo_playground,
812 # dataset=None, split=None)
813 absolute_repo_dir = os.path.abspath(REPO_PATH)
--> 814 retriever = build_code_retriever(
815 absolute_repo_dir,
816 persist_path=BM25_INDEX_DIR,
817 similarity_top_k=similarity_top_k,
818 )
820 # similarity: {score}
821 cur_query_results = []
File /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/lib/python3.12/site-packages/openhands_aci/indexing/locagent/repo/chunk_index/code_retriever.py:88, in build_code_retriever_from_repo(repo_path, similarity_top_k, min_chunk_size, chunk_size, max_chunk_size, hard_token_limit, max_chunks, persist_path, show_progress)
83 prepared_nodes = splitter.get_nodes_from_documents(
84 docs, show_progress=show_progress
85 )
87 # We can pass in the index, docstore, or list of nodes to create the retriever
---> 88 retriever = BM25Retriever.from_defaults(
89 nodes=prepared_nodes,
90 similarity_top_k=similarity_top_k,
91 stemmer=Stemmer.Stemmer('english'),
92 language='english',
93 )
94 if persist_path:
95 retriever.persist(persist_path)
File /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/lib/python3.12/site-packages/llama_index/retrievers/bm25/base.py:130, in BM25Retriever.from_defaults(cls, index, nodes, docstore, stemmer, language, similarity_top_k, verbose, skip_stemming, token_pattern, tokenizer)
128 # ensure only one of index, nodes, or docstore is passed
129 if sum(bool(val) for val in [index, nodes, docstore]) != 1:
--> 130 raise ValueError("Please pass exactly one of index, nodes, or docstore.")
132 if index is not None:
133 docstore = index.docstore
ValueError: Please pass exactly one of index, nodes, or docstore.
[Jupyter current working directory: /workspace/django__django__3.0]
[Jupyter Python interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]
Thank you so much for the work on this, it's a very interesting PR! For the record, the original repo is here: https://github.com/gersteinlab/LocAgent
Paper: LocAgent: Graph-Guided LLM Agents for Code Localization
Hi @ryanhoangt, thank you for the refactors! I’ll take a look at your changes shortly.
Also, thanks for pointing out the issue. It seems to be caused by an incorrect REPO_PATH or absolute_repo_dir when calling build_code_retriever, which results in the BM25 index not being built properly for the repository.
I’ve started looking into it and will try to reproduce and fix it in the next couple of days.
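For reference, a minimal sketch of why the guard quoted in the traceback can fire even though `nodes` is passed. The `check_exactly_one` and `try_build` helpers are hypothetical stand-ins that only copy the two guard lines shown at `base.py:129-130`. The point is that the guard tests truthiness, so an empty `nodes` list (for example, when no documents were loaded or parsed from the repository) counts as "not passed".

```python
def check_exactly_one(index=None, nodes=None, docstore=None):
    # Same condition as the guard shown in the traceback: it counts
    # truthy arguments, so an empty list is treated like None.
    if sum(bool(val) for val in [index, nodes, docstore]) != 1:
        raise ValueError("Please pass exactly one of index, nodes, or docstore.")


def try_build(nodes):
    """Hypothetical wrapper: returns 'ok' or the error message."""
    try:
        check_exactly_one(nodes=nodes)
        return "ok"
    except ValueError as exc:
        return str(exc)


print(try_build(["some_node"]))  # one truthy argument, guard passes
print(try_build([]))             # empty list, guard raises
```

So any upstream step that yields zero nodes would surface as this same `ValueError`, whatever the underlying cause turns out to be.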
This PR is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
Sorry for the long delay!
I believe the issue you mentioned has now been resolved. The root cause was a compatibility problem due to mismatched versions of the tree-sitter Python package. I've opened a PR in openhands-aci to address this.
Additionally, to avoid timeouts when the index for a large repository is generated on the fly, you can export INDEX_BASE_DIR in your script and preload the index into the sandbox. Pre-generated indexes for swe-bench-lite and loc-bench are available here: index_data
export INDEX_BASE_DIR="LOCAL_INDEX_DIR"
./evaluation/benchmarks/swe_bench/scripts/run_localize.sh llm.eval_gpt4o HEAD LocAgent 1 30 1
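A minimal sketch of how a script could pick up a preloaded index from `INDEX_BASE_DIR` and otherwise fall back to building one. The `resolve_index_dir` helper and the one-subdirectory-per-instance layout are assumptions for illustration, not the actual layout used by this PR or by openhands-aci.

```python
import os
from pathlib import Path


def resolve_index_dir(instance_id, env=None):
    """Return the preloaded index directory for an instance, or None.

    Assumes (hypothetically) that INDEX_BASE_DIR contains one
    subdirectory per instance ID.
    """
    env = os.environ if env is None else env
    base = env.get("INDEX_BASE_DIR")
    if not base:
        return None  # variable not set: caller should build the index
    candidate = Path(base) / instance_id
    return candidate if candidate.is_dir() else None
```

A caller would build the index at run time only when this returns `None`.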
Hey @czlll, thanks for looking into it. I just did a small run and it seems to work great now. I'll try to do some final cleanup and get the PR merged.
Hi guys, what a great PR 🚀! I have a question: how do we use LocAgent? Will OpenHands use it automatically when possible, or do we need to configure something else?
@czlll @ryanhoangt @xingyaoww