OpenHands icon indicating copy to clipboard operation
OpenHands copied to clipboard

[Experimental] Integrate repomap

Open ryanhoangt opened this issue 1 year ago • 3 comments

End-user friendly description of the problem this fixes or functionality that this introduces

  • [ ] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

This PR is to:

  • [x] Integrate aider's RepoMap implementation, at max 3072 tokens. I put it into a separate repo here to make it easier to play around, experiment and test more thoroughly.
  • [ ] Run evaluation to see how it affects swe-bench score compared to baseline (@xingyaoww not sure if we can use the nightly evaluation pipeline for this, to try on a small subset, my api key easily reaches token rate limit with 1 worker only so it's a bit slow to do on my side)

When adding repomap, there're 2 phases to execute for every iteration in the agentic loop:

  • first is to get repomap content from the repo, which I'm trying to implement as an agentskill
  • second is to put the content inside the message history, before the environment reminder

I played around with it for a bit and my first impression is the personalization for mentioned files/identifiers still doesn't work really well yet. I'm trying to do a bit of of tuning on the ranking, and perhaps combining with static ranking instead of the built-in pagerank support. If you have some ideas/alternatives regarding this, it'd be great to discuss here. Also I'm not sure if there're any relevant benchmark so we can test various strategies of localization. (Cc: @neubig if you have some ideas and suggestions to share on these)

Some notes:

  • Aider has a concept of "files added to chat", which can help the agent to focus on particular files in a session. These files are weighted more in the pagerank score calculation, while the current integration only parses exact relative filepaths users mention from message history. If there're no files/identifiers (e.g. class names, method names) mentioned, the repomap will likely retrieve irrelevant context.

Link of any specific issues this addresses

Fixes https://github.com/All-Hands-AI/OpenHands/issues/2185

Cc: @enyst if you can have a look and suggest some better ways to do integration.

ryanhoangt avatar Oct 26 '24 15:10 ryanhoangt

PR Reviewer Comments

After analyzing the pull request, here are my findings:

Overall Feedback:

The integration of the RepoMap functionality appears to be a significant enhancement to the project, allowing for better context retrieval based on user interactions. However, there are several areas where the code can be improved for better clarity, performance, and error handling.

Score: 85/100

Labels: Enhancement, Bug fix

Code Suggestions:

  1. File: openhands/agenthub/codeact_agent/codeact_agent.py

    • Suggestion Content: Ensure that the action variable is always initialized before being returned to avoid potential reference errors.
    • Relevant Line: + action = IPythonRunCellAction(...)
    • Existing Code:
      action = IPythonRunCellAction(
          code=get_repomap_code,
          thought='',
          kernel_init_code='from agentskills import *',
      )
      
    • Improved Code:
      action = IPythonRunCellAction(
          code=get_repomap_code,
          thought='',
          kernel_init_code='from agentskills import *',
      ) if action is not None else DefaultAction()
      
  2. File: openhands/runtime/plugins/agent_skills/file_ops/file_ops.py

    • Suggestion Content: Consider adding error handling in the get_repomap function to manage potential issues when accessing the repository or generating the map.
    • Relevant Line: + map_result = REPO_MAP.get_map(message_history=message_history)
    • Existing Code:
      map_result = REPO_MAP.get_map(message_history=message_history)
      
    • Improved Code:
      try:
          map_result = REPO_MAP.get_map(message_history=message_history)
      except Exception as e:
          print(f"Error retrieving RepoMap: {e}")
          return None
      
  3. File: openhands/runtime/plugins/agent_skills/file_ops/file_ops.py

    • Suggestion Content: Use a more descriptive name for the get_repomap function to clarify its purpose.
    • Relevant Line: +def get_repomap(repo_path: str = './', message_history='') -> None:
    • Existing Code:
      def get_repomap(repo_path: str = './', message_history='') -> None:
      
    • Improved Code:
      def retrieve_repo_map(repo_path: str = './', message_history='') -> None:
      

The integration of RepoMap is a valuable enhancement! Here are some suggestions:

  1. Ensure the action variable is always initialized before being returned to avoid potential reference errors.
  2. Add error handling in get_repomap to manage potential issues when accessing the repository.
  3. Consider renaming get_repomap to retrieve_repo_map for clarity.

ngduyanhece avatar Oct 26 '24 16:10 ngduyanhece

This is awesome! I especially like the fact you made it a separate repo - maybe we can simply install it inside runtime and make it available to the agent via Bash 😀

Regarding better ways to use these - we could also introduce some RAG mechanism somewhere, so we can rerank files and display them based on previous user instruction / histories.

I'm trying to get https://github.com/All-Hands-AI/OpenHands/pull/4537 in first (so we have a really strong baseline for a capable LLM). Once that's finished- I'd be happy to help here to run eval ;)

xingyaoww avatar Oct 27 '24 00:10 xingyaoww

Yep sounds good! Thanks for the idea, I'll have a closer look.

ryanhoangt avatar Oct 27 '24 09:10 ryanhoangt

Let's close this one for now, since we will play around with this in the https://github.com/All-Hands-AI/openhands-aci repo.

xingyaoww avatar Nov 09 '24 20:11 xingyaoww

RepoMap is now in openhands-aci, which is included in OH. But does OH actually use/query the RepoMap to maintain existing software? I always see OH running poor man's grep commands to find relevant code. This critical TreeSitter feature seems a little abandoned. Can you please update?

kripper avatar Aug 10 '25 10:08 kripper