Tool calls hang indefinitely when confirmations arrive out of order
Problem
Tool calls in goosed hang permanently when multiple concurrent requests receive their confirmations out of order. Recovery requires a manual pod restart.
Reproduction
Trigger 3 or more concurrent tool calls in quick succession in the same session (e.g., rapid Slack mentions tagging a Goose-powered Slackbot). Because of network timing, confirmations may arrive in a different order than the requests were issued, and the affected tool calls hang.
Root Cause
Code location: crates/goose/src/agents/tool_execution.rs lines 81-126
    let mut rx = self.confirmation_rx.lock().await;
    while let Some((req_id, confirmation)) = rx.recv().await {
        if req_id == request.id {
            break; // Found matching confirmation
        }
        // Bug: Non-matching confirmation is silently discarded
    }
When confirmations arrive out of order, non-matching confirmations are discarded instead of being queued. Tool requests waiting for those discarded confirmations hang forever.
Race Condition
1. Request #1 locks confirmation channel, starts waiting
2. Request #2 queued for lock
3. Confirmation #2 arrives first (network timing)
4. Request #1 receives Confirmation #2, discards it (ID mismatch)
5. Request #1 gets its confirmation and completes
6. Request #2 acquires lock but its confirmation was already discarded
7. Request #2 hangs forever
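One possible direction for a fix is to park non-matching confirmations instead of dropping them, so the request that owns them can claim them once it acquires the lock. The sketch below is only an illustration under stated assumptions: it uses `std::sync::mpsc` in place of the async channel so it stays self-contained, and `Confirmation`, `ConfirmationInbox`, `wait_for`, and `replay_out_of_order` are hypothetical names, not goose APIs.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver};

// Hypothetical stand-in for the real confirmation payload.
#[derive(Debug, Clone, PartialEq)]
pub struct Confirmation {
    pub approved: bool,
}

/// Wraps the shared receiver and parks confirmations that belong to other requests.
pub struct ConfirmationInbox {
    rx: Receiver<(String, Confirmation)>,
    pending: HashMap<String, Confirmation>,
}

impl ConfirmationInbox {
    pub fn new(rx: Receiver<(String, Confirmation)>) -> Self {
        Self { rx, pending: HashMap::new() }
    }

    /// Blocks until the confirmation for `request_id` is available.
    /// Returns `None` if every sender is dropped first.
    pub fn wait_for(&mut self, request_id: &str) -> Option<Confirmation> {
        // An earlier waiter may already have received and parked our confirmation.
        if let Some(found) = self.pending.remove(request_id) {
            return Some(found);
        }
        while let Ok((req_id, confirmation)) = self.rx.recv() {
            if req_id == request_id {
                return Some(confirmation);
            }
            // Park instead of discarding so the owning request can claim it later.
            self.pending.insert(req_id, confirmation);
        }
        None
    }
}

/// Replays the race described above: confirmation #2 arrives before confirmation #1.
pub fn replay_out_of_order() -> (Option<Confirmation>, Option<Confirmation>) {
    let (tx, rx) = channel();
    let mut inbox = ConfirmationInbox::new(rx);
    tx.send(("2".to_string(), Confirmation { approved: false })).unwrap();
    tx.send(("1".to_string(), Confirmation { approved: true })).unwrap();
    let first = inbox.wait_for("1"); // parks #2, then returns #1
    let second = inbox.wait_for("2"); // served from the parked map, no hang
    (first, second)
}
```

In the real code the inbox would presumably live behind the same mutex that currently guards `confirmation_rx`, so the parked map is shared by all waiters. This still serializes waiters on one lock; giving each request its own one-shot channel would be an alternative design.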
Good find. We should have a timeout on this either way: it looks like if the other side never replies, we're also dead.
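A rough sketch of what a bounded wait could look like, assuming a plain `std::sync::mpsc` channel instead of the async one and a `bool` as a placeholder payload. `WaitError`, `wait_with_deadline`, and `silent_peer_times_out` are made-up names for illustration, not existing goose code.

```rust
use std::sync::mpsc::{channel, Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
pub enum WaitError {
    TimedOut,
    ChannelClosed,
}

/// Waits for the confirmation matching `request_id`, giving up after `limit`.
pub fn wait_with_deadline(
    rx: &Receiver<(String, bool)>,
    request_id: &str,
    limit: Duration,
) -> Result<bool, WaitError> {
    let deadline = Instant::now() + limit;
    loop {
        // Use one fixed deadline so non-matching messages cannot extend the total wait.
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok((req_id, approved)) if req_id == request_id => return Ok(approved),
            // Not ours. Real code should park this for its owner rather than drop it.
            Ok(_) => continue,
            Err(RecvTimeoutError::Timeout) => return Err(WaitError::TimedOut),
            Err(RecvTimeoutError::Disconnected) => return Err(WaitError::ChannelClosed),
        }
    }
}

/// Simulates a peer that never replies: the sender stays alive but sends nothing.
pub fn silent_peer_times_out() -> Result<bool, WaitError> {
    let (_tx, rx) = channel::<(String, bool)>();
    wait_with_deadline(&rx, "1", Duration::from_millis(50))
}
```

With this shape the caller gets an error back instead of blocking forever, and can fail the tool call or surface it to the user. In the async code, the analogous approach would be wrapping the receive loop in `tokio::time::timeout`.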