AgentBench Fixed hanging bash commands from agent in os-task


Open rjmoss opened this issue 6 months ago • 0 comments

If the agent emits a command like 'while true; do ls /root; sleep 1; done', the command loops forever while continuously producing output, so the socket read never times out. To handle this I've added a 30s cutoff. Without it, the task eventually fails, but it is counted as an incomplete task rather than a plain fail, which stops the overall stats from working. With this fix, the model receives whatever output the terminal produced up to the cutoff.

There are other ways to address this problem (such as raising/catching an error instead of allowing the conversation to continue). Please let me know if you think a different approach is better.
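For reviewers, here is a minimal sketch of the cutoff idea described above. It is not the code from this change. The function name `read_with_cutoff`, its parameters, and the use of a plain `socket.socket` are assumptions made for illustration. The point it shows is that a per-read socket timeout alone is not enough when the command keeps printing, so the read loop also needs an overall wall-clock deadline.

```python
import socket
import time


def read_with_cutoff(sock: socket.socket, cutoff_s: float = 30.0,
                     idle_timeout_s: float = 5.0) -> tuple[bytes, bool]:
    """Collect output until the stream goes idle, closes, or cutoff_s elapses.

    Hypothetical helper, not AgentBench's API. Returns (output, timed_out),
    where timed_out is True only when the overall cutoff was reached.
    """
    deadline = time.monotonic() + cutoff_s
    chunks = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Overall cutoff reached: hand back whatever was collected so far.
            return b"".join(chunks), True
        # Never wait on a single read for longer than the time left overall.
        waiting_for_idle = idle_timeout_s <= remaining
        sock.settimeout(idle_timeout_s if waiting_for_idle else remaining)
        try:
            data = sock.recv(4096)
        except socket.timeout:
            # A quiet stream means the command has presumably finished;
            # otherwise the shortened wait expired because of the cutoff.
            return b"".join(chunks), not waiting_for_idle
        if not data:
            # Peer closed the connection.
            return b"".join(chunks), False
        chunks.append(data)
```

The returned `timed_out` flag leaves the policy to the caller: it can pass the partial output back to the model and let the conversation continue, or raise an error and mark the task as failed.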

Aug 11 '24 11:08 rjmoss
