integrate LocAgent into OpenHands

Open czlll opened this issue 9 months ago • 7 comments

  • [ ] This change is worth documenting at https://docs.all-hands.dev/
  • [ ] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

End-user friendly description of the problem this fixes or functionality that this introduces.


Give a summary of what the PR does, explaining any non-trivial design decisions.

I'm integrating LocAgent into OpenHands! The changes:

  • build a graph-based repository index
  • introduce unified tools for agent-based code exploration that leverage this graph index, allowing LLM agents to perform complex multi-hop navigation and reasoning across code dependencies
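
To make "multi-hop navigation across code dependencies" concrete, here is a minimal sketch of the idea, not LocAgent's actual implementation. The `CodeGraph` class, the relation names (`contains`, `invokes`), and the file paths are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


class CodeGraph:
    """Toy directed graph: nodes are code entities, edges are typed dependencies.

    Hypothetical illustration only; this is not the LocAgent data structure.
    """

    def __init__(self):
        # source entity -> list of (relation, target entity)
        self.edges = defaultdict(list)

    def add_edge(self, source, relation, target):
        self.edges[source].append((relation, target))

    def neighbors_within(self, start, max_hops):
        """Breadth-first search; returns {entity: hop_count} for hops <= max_hops."""
        seen = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if seen[node] == max_hops:
                continue  # do not expand past the hop limit
            for _relation, target in self.edges[node]:
                if target not in seen:
                    seen[target] = seen[node] + 1
                    queue.append(target)
        return seen


graph = CodeGraph()
graph.add_edge("app/views.py", "contains", "app/views.py:signup")
graph.add_edge("app/views.py:signup", "invokes", "app/validators.py:validate_email")
graph.add_edge("app/validators.py:validate_email", "invokes", "app/utils.py:normalize")

# Everything reachable from the file within two dependency hops.
reachable = graph.neighbors_within("app/views.py", max_hops=2)
print(reachable)
```

With a hop limit of 2, the traversal reaches `validate_email` but stops before `normalize`, which is three hops away. An agent tool built on such an index could expose the hop limit and relation types as parameters.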

Link of any specific issues this addresses.

To run this PR locally, use the following command:

./evaluation/benchmarks/swe_bench/scripts/run_localize.sh llm.eval_gpt4o HEAD LocAgent 1 30 1

czlll avatar Mar 20 '25 00:03 czlll

@ryanhoangt could you take a look at this when you have time?

xingyaoww avatar Mar 20 '25 14:03 xingyaoww

Hi @czlll, thanks for the contribution! I took a look at the PR and tried to run localization on SWE-bench and the code seems to work. I made a PR into your branch here to do some refactors, could you take a look to see if it looks good? Also when running run_localize.sh, I noticed the agent seemed to face difficulties using the search_code_snippets tool, and it had to switch to the 2 other tools. Do you have any ideas? Below is the error I got:

09:10:08 - openhands:DEBUG: agent_controller.py:781
ACTION
[Agent Controller 1a5a6f6f-ff5f-4bab-8003-273ec34a9d75-5eb62c31a6dfb82f] **IPythonRunCellAction**
THOUGHT: Let me try a different approach to search for these validators.
CODE:
print(search_code_snippets(**{'search_terms': ['validators.py'], 'file_path_or_pattern': '**/validators.py'}))
09:10:10 - openhands:DEBUG: stream.py:273 - Adding IPythonRunCellObservation id=7 from AGENT
09:10:10 - openhands:DEBUG: agent_controller.py:396
OBSERVATION
[Agent Controller 1a5a6f6f-ff5f-4bab-8003-273ec34a9d75-5eb62c31a6dfb82f] **IPythonRunCellObservation**
DEBUG:bm25s:Building index from IDs objects
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[1], line 1
----> 1 print(search_code_snippets(**{'search_terms': ['validators.py'], 'file_path_or_pattern': '**/validators.py'}))

File /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/lib/python3.12/site-packages/openhands_aci/indexing/locagent/tools.py:592, in search_code_snippets(search_terms, line_nums, file_path_or_pattern)
    590 # search content
    591 if continue_search:
--> 592     query_results = bm25_content_retrieve(
    593         query_info=query_info, include_files=include_files
    594     )
    595     cur_query_results.extend(query_results)
    597 elif i != (len(filter_terms) - 1):

File /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/lib/python3.12/site-packages/openhands_aci/indexing/locagent/tools.py:814, in bm25_content_retrieve(query_info, include_files, similarity_top_k)
    809 else:
    810     # repo_path = get_repo_save_dir()
    811     # repo_dir = setup_repo(instance_data=instance, repo_base_dir=repo_playground,
    812     # dataset=None, split=None)
    813     absolute_repo_dir = os.path.abspath(REPO_PATH)
--> 814     retriever = build_code_retriever(
    815         absolute_repo_dir,
    816         persist_path=BM25_INDEX_DIR,
    817         similarity_top_k=similarity_top_k,
    818     )
    820 # similarity: {score}
    821 cur_query_results = []

File /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/lib/python3.12/site-packages/openhands_aci/indexing/locagent/repo/chunk_index/code_retriever.py:88, in build_code_retriever_from_repo(repo_path, similarity_top_k, min_chunk_size, chunk_size, max_chunk_size, hard_token_limit, max_chunks, persist_path, show_progress)
     83 prepared_nodes = splitter.get_nodes_from_documents(
     84     docs, show_progress=show_progress
     85 )
     87 # We can pass in the index, docstore, or list of nodes to create the retriever
---> 88 retriever = BM25Retriever.from_defaults(
     89     nodes=prepared_nodes,
     90     similarity_top_k=similarity_top_k,
     91     stemmer=Stemmer.Stemmer('english'),
     92     language='english',
     93 )
     94 if persist_path:
     95     retriever.persist(persist_path)

File /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/lib/python3.12/site-packages/llama_index/retrievers/bm25/base.py:130, in BM25Retriever.from_defaults(cls, index, nodes, docstore, stemmer, language, similarity_top_k, verbose, skip_stemming, token_pattern, tokenizer)
    128 # ensure only one of index, nodes, or docstore is passed
    129 if sum(bool(val) for val in [index, nodes, docstore]) != 1:
--> 130     raise ValueError("Please pass exactly one of index, nodes, or docstore.")
    132 if index is not None:
    133     docstore = index.docstore

ValueError: Please pass exactly one of index, nodes, or docstore.
[Jupyter current working directory: /workspace/django__django__3.0]
[Jupyter Python interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]

ryanhoangt avatar Apr 10 '25 09:04 ryanhoangt

Thank you so much for the work on this, it's a very interesting PR! For the record, the original repo is here: https://github.com/gersteinlab/LocAgent

Paper: LocAgent: Graph-Guided LLM Agents for Code Localization

enyst avatar Apr 10 '25 10:04 enyst

Hi @ryanhoangt, thank you for the refactors! I’ll take a look at your changes shortly.

Also, thanks for pointing out the issue. It seems to be caused by an incorrect REPO_PATH or absolute_repo_dir when calling build_code_retriever, which results in the BM25 index not being built properly for the repository.

I’ve started looking into it and will try to reproduce and fix it in the next couple of days.
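
One plausible way the guard in the traceback can fire even though `nodes=` is passed: if the splitter returns an empty list (for example, because no documents were found under the repository path), `bool([])` is `False`, so the count of supplied sources is 0 rather than 1. The sketch below copies only the check visible in the traceback; `exactly_one_source` is a hypothetical stand-in, and whether `prepared_nodes` was actually empty in this run is an assumption the thread does not confirm.

```python
def exactly_one_source(index=None, nodes=None, docstore=None):
    """Stand-in for the guard shown in the traceback (base.py lines 129-130)."""
    if sum(bool(val) for val in [index, nodes, docstore]) != 1:
        raise ValueError("Please pass exactly one of index, nodes, or docstore.")
    return True


# A non-empty node list counts as one supplied source, so the guard passes.
passes_with_nodes = exactly_one_source(nodes=["chunk-0"])

# An empty list is falsy, so it counts as "nothing supplied" and the guard raises.
try:
    exactly_one_source(nodes=[])
    raised_on_empty = False
except ValueError:
    raised_on_empty = True

print(passes_with_nodes, raised_on_empty)
```

If that is what happened, the error message is misleading: the caller did pass exactly one argument, but it was empty. Checking the length of the node list before building the retriever would surface the real problem earlier.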

czlll avatar Apr 10 '25 17:04 czlll

This PR is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar May 11 '25 02:05 github-actions[bot]

Sorry for the long delay!

I believe the issue you mentioned has now been resolved. The root cause was a compatibility problem due to mismatched versions of the tree-sitter Python package. I've opened a PR in openhands-aci to address this.

Additionally, to avoid potential timeouts when generating the index for large repositories in real time, you can export INDEX_BASE_DIR in your script and preload the index into the sandbox. Pre-generated indexes for swe-bench-lite and loc-bench are available here: index_data

export INDEX_BASE_DIR="LOCAL_INDEX_DIR"

./evaluation/benchmarks/swe_bench/scripts/run_localize.sh llm.eval_gpt4o HEAD LocAgent 1 30 1
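
The preload idea can be sketched as a lookup that prefers a pre-generated index and signals when one is missing, so the caller can fall back to building it. This is an illustration under assumptions: only the `INDEX_BASE_DIR` variable name comes from the comment above, while `find_prebuilt_index`, the one-directory-per-instance layout, and the instance name are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def find_prebuilt_index(instance_id, env=os.environ):
    """Return the pre-generated index directory for an instance, or None.

    Assumes (hypothetically) one sub-directory per instance under INDEX_BASE_DIR.
    """
    base = env.get("INDEX_BASE_DIR")
    if not base:
        return None  # variable unset: caller should build the index itself
    candidate = Path(base) / instance_id
    return candidate if candidate.is_dir() else None


# Demonstrate with a temporary directory standing in for LOCAL_INDEX_DIR.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "example-instance").mkdir()
    fake_env = {"INDEX_BASE_DIR": tmp}
    hit = find_prebuilt_index("example-instance", fake_env)
    miss = find_prebuilt_index("unknown-instance", fake_env)
    hit_name = hit.name if hit is not None else None

print(hit_name, miss)
```

Returning `None` instead of raising keeps the slow path (building the index on demand) available for repositories that have no pre-generated data.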

czlll avatar May 19 '25 07:05 czlll

Hey @czlll, thanks for looking into it. I just did a small run and it seems to work great now. I'll try to do some final cleanup and get the PR merged.

ryanhoangt avatar May 20 '25 14:05 ryanhoangt

Hi guys, what a great PR 🚀! I have a question: how do we use LocAgent? Will OpenHands use it automatically whenever possible, or do we need to configure something else? @czlll @ryanhoangt @xingyaoww

KennyDizi avatar Jun 02 '25 00:06 KennyDizi