AgentBench
Fixed hanging bash commands from agent in os-task
If the agent emits a command such as `while true; do ls /root; sleep 1; done`, the command loops forever while continuously producing output, so the socket never times out. I've added a 30 s cutoff to handle this. Without it, the task eventually fails but is recorded as incomplete rather than as a plain failure, which breaks the overall statistics. With this fix, the model receives whatever terminal output was produced up to the cutoff.
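The idea behind the cutoff can be sketched roughly as follows. This is a minimal illustration, not the actual AgentBench change. It assumes the shell output is read from a plain `socket.socket` and that the existing behaviour is a per-read idle timeout, which a continuously printing command never triggers. The function name `read_until_deadline` and the `idle_timeout` parameter are hypothetical. The 30 s default mirrors the cutoff described above.

```python
import socket
import time


def read_until_deadline(sock: socket.socket, total_timeout: float = 30.0,
                        idle_timeout: float = 1.0, bufsize: int = 4096) -> bytes:
    """Hypothetical helper: collect shell output from `sock` until the peer
    goes quiet for `idle_timeout` seconds, closes the connection, or the
    overall `total_timeout` deadline passes, whichever happens first."""
    deadline = time.monotonic() + total_timeout
    chunks = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # Hard cutoff: a command that never stops printing cannot hang the task.
            break
        # Wait for the next chunk, but never past the overall deadline.
        sock.settimeout(min(idle_timeout, remaining))
        try:
            data = sock.recv(bufsize)
        except socket.timeout:
            # No output for a while (or the deadline hit mid-wait): stop reading.
            break
        if not data:
            # Peer closed the connection.
            break
        chunks.append(data)
    # Whatever was produced before the cutoff is returned to the model.
    return b"".join(chunks)
```

With a short idle timeout, ordinary commands return as soon as they go quiet. The overall deadline only comes into play for commands that keep printing, such as the loop above, and the partial output is still passed back to the model.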
There are other ways to address this (for example, raising and catching an error instead of letting the conversation continue). Please let me know if you think a different approach would be better.