[GAIA] Add prompt improvement to alleviate solution parsing issue & support Tavily search tools

Open • ryanhoangt opened this issue 6 months ago • 1 comment

  • [ ] This change is worth documenting at https://docs.all-hands.dev/
  • [ ] Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

End-user friendly description of the problem this fixes or functionality this introduces.


Summarize what the PR does, explaining any non-trivial design decisions.

Ported from https://github.com/All-Hands-AI/OpenHands/pull/9015/files for baseline eval:

  • Prompt improvement

Eval results on the 2023_all validation set:

  • Without Tavily search: 10 correct / 30 instances (33%)
  • With Tavily search: 18 correct / 30 instances (60%)
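The percentages above follow directly from the counts. A minimal sketch of that arithmetic, assuming results are rounded to the nearest whole percent (the helper name `accuracy_percent` is hypothetical and not part of the eval harness):

```python
def accuracy_percent(correct: int, total: int) -> int:
    """Return accuracy as a whole-number percentage, rounded to nearest."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * correct / total)


# Counts taken from the eval results reported above.
results = {
    "Without Tavily search": (10, 30),
    "With Tavily search": (18, 30),
}

for name, (correct, total) in results.items():
    print(f"{name}: {correct} / {total} = {accuracy_percent(correct, total)}%")
```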

Link of any specific issues this addresses:

ryanhoangt, Jun 11 '25 06:06

I'm facing an issue when running the eval with the Tavily MCP server enabled: the run gets stuck and errors out due to a timeout, which doesn't happen when running via the UI. @xingyaoww, do you have any idea what might be causing this?

  • Eval log:
10:05:34 - openhands:DEBUG: action_execution_client.py:391 - [runtime 5ccbc3ed-2b7e-4f0b-bdc0-f20c53ad1001-313a930d7bfe0582] adding 2 new stdio servers to MCP config: [MCPStdioServerConfig(name='tavily', command='npx', args=['-y', '[email protected]'], env={'TAVILY_API_KEY': 'tvly-*****'}), MCPStdioServerConfig(name='fetch', command='uvx', args=['mcp-server-fetch'], env={})]
10:05:34 - openhands:DEBUG: action_execution_client.py:410 - [runtime 5ccbc3ed-2b7e-4f0b-bdc0-f20c53ad1001-313a930d7bfe0582] Updating MCP server with 2 new stdio servers (total: 2)
10:05:34 - openhands:DEBUG: action_execution_client.py:432 - [runtime 5ccbc3ed-2b7e-4f0b-bdc0-f20c53ad1001-313a930d7bfe0582] Successfully updated MCP stdio servers, now tracking 2 servers
10:05:34 - openhands:INFO: action_execution_client.py:436 - [runtime 5ccbc3ed-2b7e-4f0b-bdc0-f20c53ad1001-313a930d7bfe0582] Updated MCP config: []
10:05:34 - openhands:DEBUG: utils.py:117 - Creating MCP clients with config: sse_servers=[MCPSSEServerConfig(url='http://localhost:32989/mcp/sse', api_key='******')] stdio_servers=[MCPStdioServerConfig(name='tavily', command='npx', args=['-y', '[email protected]'], env={'TAVILY_API_KEY': 'tvly-*****'})] shttp_servers=[]
10:05:34 - openhands:INFO: utils.py:77 - Initializing MCP agent for url='http://localhost:32989/mcp/sse' api_key='******' with SSE connection...
.
... STUCK HERE AND FAILED DUE TO TIMEOUT -> LOAD 0 TOOLS EVENTUALLY.....
  • Runtime log:
[06/11/25 10:05:34] DEBUG    Inferred transport:               transports.py:889
                             <MCPConfig(config='mcpServers={'f                  
                             etch':                                             
                             StdioMCPServer(command='uvx',                      
                             args=['mcp-server-fetch'],                         
                             env={}, cwd=None,                                  
                             transport='stdio'), 'tavily':                      
                             StdioMCPServer(command='npx',                      
                             args=['-y', '[email protected]'],                   
                             env={'TAVILY_API_KEY':                             
                             'tvly-*****'}, cwd=None,                             
                             transport='stdio')}')>                             
DEBUG:FastMCP.fastmcp.client.transports:Inferred transport: <MCPConfig(config='mcpServers={'fetch': StdioMCPServer(command='uvx', args=['mcp-server-fetch'], env={}, cwd=None, transport='stdio'), 'tavily': StdioMCPServer(command='npx', args=['-y', '[email protected]'], env={'TAVILY_API_KEY': 'tvly-*****'}, cwd=None, transport='stdio')}')>
10:05:34 - openhands.runtime.mcp.proxy.manager:INFO: manager.py:73 - FastMCP Proxy initialized successfully
10:05:34 - openhands.runtime.mcp.proxy.manager:INFO: manager.py:104 - Mounted FastMCP Proxy app at /mcp
10:05:34 - openhands:INFO: action_execution_server.py:827 - MCP Proxy Manager updated and remounted successfully
INFO:     172.17.0.1:45068 - "POST /update_mcp_server HTTP/1.1" 200 OK
INFO:     172.17.0.1:41432 - "GET /mcp/sse HTTP/1.1" 200 OK
INFO:     172.17.0.1:41444 - "POST /mcp/messages/?session_id=a72e7098828c45d3a63b6419a86f21a4 HTTP/1.1" 202 Accepted
INFO:     172.17.0.1:41444 - "POST /mcp/messages/?session_id=a72e7098828c45d3a63b6419a86f21a4 HTTP/1.1" 202 Accepted
INFO:     172.17.0.1:41444 - "POST /mcp/messages/?session_id=a72e7098828c45d3a63b6419a86f21a4 HTTP/1.1" 202 Accepted
[06/11/25 10:05:34] DEBUG    Stdio transport connected         transports.py:366
DEBUG:FastMCP.fastmcp.client.transports:Stdio transport connected
Downloading lxml (4.7MiB)
Downloading pydantic-core (1.9MiB)
 Downloading pydantic-core
 Downloading lxml
Installed 35 packages in 22ms
[06/11/25 10:05:37] DEBUG    Stdio transport has               transports.py:344
                             keep_alive=True, not                               
                             disconnecting                                      
DEBUG:FastMCP.fastmcp.client.transports:Stdio transport has keep_alive=True, not disconnecting
                    DEBUG    Stdio transport disconnected      transports.py:374
DEBUG:FastMCP.fastmcp.client.transports:Stdio transport disconnected
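
The eval log above ends with zero tools loaded after a timeout. As a rough illustration of that failure mode only (not OpenHands' code, and not a diagnosis of this issue), the sketch below uses plain `asyncio` to show how a client that bounds tool discovery with a timeout ends up with an empty tool list when the server is slow to respond. The functions `list_tools_slow` and `load_tools`, the tool names, and the delay values are all hypothetical.

```python
import asyncio


async def list_tools_slow(startup_delay: float) -> list[str]:
    """Hypothetical stand-in for a tool server that is slow to answer,
    for example while its launcher is still downloading packages."""
    await asyncio.sleep(startup_delay)
    return ["tool_a", "tool_b"]  # placeholder tool names


async def load_tools(startup_delay: float, timeout: float) -> list[str]:
    """Ask for the tool list, falling back to no tools on timeout."""
    try:
        return await asyncio.wait_for(
            list_tools_slow(startup_delay), timeout=timeout
        )
    except asyncio.TimeoutError:
        # The caller keeps running, but with zero tools available.
        return []


# Slow start with a short timeout: the request is cancelled.
print(asyncio.run(load_tools(startup_delay=0.2, timeout=0.05)))
# Fast start with a generous timeout: the tools load normally.
print(asyncio.run(load_tools(startup_delay=0.0, timeout=0.5)))
```

If something like this were happening, raising the timeout or pre-installing the server packages would be two things to try; whether either applies here is an assumption that the logs alone do not confirm.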

ryanhoangt, Jun 11 '25 10:06