Proposal for default fallback for invalid tool call args in `env_response`
I'm working on an env for training a SWE-grep style model and found that sometimes willcb/Qwen3-4B produces malformed tool inputs with double-encoded strings causing json.loads to produce a str instead of a dict which is the assumption.
This causes an error in update_tool_args because it assumes tool_args is a dict. Here's an example error during my prime-rl run:
Generating rollouts (val): 5%|▌ | 2/40 [01:18<24:53, 39.29s/it][A2025-11-17 23:22:45 - verifiers.envs.SWEGrepEnv - ERROR - Error in rollout: dictionary update sequence element #0 has length 1; 2 is required
2025-11-17 23:22:45 - verifiers.envs.SWEGrepEnv - ERROR - Traceback: Traceback (most recent call last):
File "/root/agentic-code-search-oss/swe_grep_oss_env.py", line 111, in rollout
return await super().rollout(
^^^^^^^^^^^^^^^^^^^^^^
File "/root/agentic-code-search-oss/prime-rl/.venv/lib/python3.12/site-packages/verifiers/envs/multiturn_env.py", line 140, in rollout
env_msgs, state = await maybe_await(
^^^^^^^^^^^^^^^^^^
File "/root/agentic-code-search-oss/prime-rl/.venv/lib/python3.12/site-packages/verifiers/utils/async_utils.py", line 9, in maybe_await
return await result
^^^^^^^^^^^^
File "/root/agentic-code-search-oss/swe_grep_oss_env.py", line 59, in env_response
tool_args = self.update_tool_args(
^^^^^^^^^^^^^^^^^^^^^^
File "/root/agentic-code-search-oss/swe_grep_oss_env.py", line 145, in update_tool_args
updated_tool_args = dict(tool_args)
^^^^^^^^^^^^^^^
ValueError: dictionary update sequence element #0 has length 1; 2 is required
The specific tool input that caused this issue is: '"{\\"command\\": \\"rg \\"BinaryField\\" -t py\\"}"'
I propose a default fallback in env_response of tool_env.py like this:
async def env_response(
self, messages: Messages, state: State, **kwargs
) -> tuple[Messages, State]:
assert isinstance(messages, list)
assert "tool_calls" in messages[-1]
tool_messages = []
for tool_call in messages[-1]["tool_calls"]:
tool_name: str = tool_call.get("function", {}).get("name", "")
tool_call_id: str = tool_call.get("id", "")
tool_args = json.loads(
tool_call.get("function", {}).get("arguments", "")
)
if isinstance(tool_args, dict):
tool_message: Message = await self.call_tool(
tool_name, tool_args, tool_call_id
)
else:
tool_message = {
"role": "tool",
"content": f"Error: Invalid tool arguments - {repr(tool_args)}",
"tool_call_id": tool_call_id,
}
tool_messages.append(tool_message)
return tool_messages, state
Thoughts @willccbb ? I realise that people would want to customize the behavior here like a custom result depending on the tool or custom parsing logic but to avoid a crash of the RL run having a default fallback like this would be good imo.
+1 on this -- making this high prio internally to have more graceful handling of invalid tool call formats
Are there other situations where the team has found this to cause an issue?