verifiers icon indicating copy to clipboard operation
verifiers copied to clipboard

Proposal for default fallback for invalid tool call args in `env_response`

Open 13point5 opened this issue 1 month ago • 2 comments

I'm working on an env for training a SWE-grep style model and found that sometimes willcb/Qwen3-4B produces malformed tool inputs with double-encoded strings causing json.loads to produce a str instead of a dict which is the assumption.

This causes an error in update_tool_args because it assumes tool_args is a dict. Here's an example error during my prime-rl run:

Generating rollouts (val):   5%|▌         | 2/40 [01:18<24:53, 39.29s/it][A2025-11-17 23:22:45 - verifiers.envs.SWEGrepEnv - ERROR - Error in rollout: dictionary update sequence element #0 has length 1; 2 is required
2025-11-17 23:22:45 - verifiers.envs.SWEGrepEnv - ERROR - Traceback: Traceback (most recent call last):
  File "/root/agentic-code-search-oss/swe_grep_oss_env.py", line 111, in rollout
    return await super().rollout(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/root/agentic-code-search-oss/prime-rl/.venv/lib/python3.12/site-packages/verifiers/envs/multiturn_env.py", line 140, in rollout
    env_msgs, state = await maybe_await(
                      ^^^^^^^^^^^^^^^^^^
  File "/root/agentic-code-search-oss/prime-rl/.venv/lib/python3.12/site-packages/verifiers/utils/async_utils.py", line 9, in maybe_await
    return await result
           ^^^^^^^^^^^^
  File "/root/agentic-code-search-oss/swe_grep_oss_env.py", line 59, in env_response
    tool_args = self.update_tool_args(
                ^^^^^^^^^^^^^^^^^^^^^^
  File "/root/agentic-code-search-oss/swe_grep_oss_env.py", line 145, in update_tool_args
    updated_tool_args = dict(tool_args)
                        ^^^^^^^^^^^^^^^
ValueError: dictionary update sequence element #0 has length 1; 2 is required 

The specific tool input that caused this issue is: '"{\\"command\\": \\"rg \\"BinaryField\\" -t py\\"}"'

I propose a default fallback in env_response of tool_env.py like this:

    async def env_response(
        self, messages: Messages, state: State, **kwargs
    ) -> tuple[Messages, State]:
        assert isinstance(messages, list)
        assert "tool_calls" in messages[-1]
        tool_messages = []
        for tool_call in messages[-1]["tool_calls"]:
            tool_name: str = tool_call.get("function", {}).get("name", "")
            tool_call_id: str = tool_call.get("id", "")
            tool_args = json.loads(
                tool_call.get("function", {}).get("arguments", "")
            )

            if isinstance(tool_args, dict):
                tool_message: Message = await self.call_tool(
                    tool_name, tool_args, tool_call_id
                )
            else:
                tool_message = {
                    "role": "tool",
                    "content": f"Error: Invalid tool arguments - {repr(tool_args)}",
                    "tool_call_id": tool_call_id,
                }
            
            tool_messages.append(tool_message)
        return tool_messages, state

Thoughts @willccbb ? I realise that people would want to customize the behavior here like a custom result depending on the tool or custom parsing logic but to avoid a crash of the RL run having a default fallback like this would be good imo.

13point5 avatar Nov 18 '25 04:11 13point5

+1 on this -- making this high prio internally to have more graceful handling of invalid tool call formats

mikasenghaas avatar Nov 27 '25 20:11 mikasenghaas

Are there other situations where the team has found this to cause an issue?

13point5 avatar Nov 27 '25 20:11 13point5