
Claude Sonnet Models Stop Responding After Initial Message with GitHub Copilot Provider

blackgirlbytes opened this issue 2 months ago

See the Discord thread here.

Summary

When using Claude Sonnet models (claude-3.7-sonnet, claude-sonnet-4) through the GitHub Copilot provider, conversations that involve tool usage stop responding after the initial "Let me start..." message. GPT models work perfectly with the same setup.

Environment

  • OS: Windows 11 (Enterprise environment)
  • Provider: GitHub Copilot (only available provider in enterprise setup)
  • Working Models: gpt-4o, gpt-4o-mini
  • Broken Models: claude-3.7-sonnet, claude-sonnet-4
  • Context: Same Claude models work perfectly with tools in other applications (e.g., Emacs with gptel)

Expected Behavior

Claude Sonnet models should continue the conversation and execute tool calls just like GPT models do.

Actual Behavior

  1. User asks a question requiring tool usage
  2. Claude responds with "Let me start [something]..."
  3. The conversation stops completely; no tool calls are executed
  4. No error messages are shown to the user

Root Cause Analysis

Investigating the codebase points to /crates/goose/src/providers/githubcopilot.rs:

1. Forced Streaming for Claude Models

// Lines 32-33
pub const GITHUB_COPILOT_STREAM_MODELS: &[&str] = 
    &["gpt-4.1", "claude-3.7-sonnet", "claude-sonnet-4"];

// Lines 122-127 - Forces streaming mode for Claude
let stream_only_model = GITHUB_COPILOT_STREAM_MODELS
    .iter()
    .any(|prefix| model_name.starts_with(prefix));
if stream_only_model {
    payload.as_object_mut().unwrap()
        .insert("stream".to_string(), serde_json::Value::Bool(true));
}

2. Silent Error Handling in Stream Parser

// Lines 137-158 - Silently ignores parsing errors
match serde_json::from_str::<OAIStreamChunk>(payload) {
    Ok(ch) => collector.add_chunk(&ch),
    Err(_) => continue,  // ⚠️ SILENTLY IGNORES ERRORS!
}

3. The Problem

  • GitHub Copilot's Claude streaming format differs slightly from OpenAI's format
  • When stream parsing fails, errors are silently ignored
  • Tool calls get lost in the parsing failure
  • User sees the conversation "stop" with no indication of what went wrong
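For illustration, here is a minimal, self-contained sketch of that failure mode. SimplifiedChunk is a hypothetical stand-in for the struct the stream parser expects (the real OAIStreamChunk has more fields), and the second payload below is invented purely to show a shape mismatch, not what GitHub Copilot actually sends for Claude:

// Hypothetical, simplified stand-in for the expected chunk type
use serde::Deserialize;

#[derive(Deserialize)]
#[allow(dead_code)]
struct SimplifiedChunk {
    choices: Vec<SimplifiedChoice>,
}

#[derive(Deserialize)]
#[allow(dead_code)]
struct SimplifiedChoice {
    delta: serde_json::Value,
}

fn main() {
    // A chunk shaped like OpenAI's streaming format deserializes fine
    let openai_style = r#"{"choices":[{"delta":{"content":"Let me start..."}}]}"#;
    assert!(serde_json::from_str::<SimplifiedChunk>(openai_style).is_ok());

    // A chunk with a different top-level shape (invented for illustration)
    // fails to deserialize...
    let different_shape = r#"{"delta":{"tool_use":{"name":"shell"}}}"#;
    assert!(serde_json::from_str::<SimplifiedChunk>(different_shape).is_err());

    // ...and with `Err(_) => continue`, that chunk and any tool-call data
    // inside it are dropped without a trace.
}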

Why GPT Models Work

GPT models through GitHub Copilot use OpenAI-compatible streaming format, so parsing succeeds.

Why Other Tools Work

Tools like Emacs/gptel likely use non-streaming requests or have different parsing logic that handles GitHub Copilot's Claude format correctly.

Proposed Fix

  1. Add logging to reveal the silent failures:

    Err(e) => {
        tracing::warn!("Failed to parse streaming chunk for {}: {} | payload: {}", model_name, e, payload);
        continue;
    }
    
  2. Add fallback to non-streaming mode when streaming fails for Claude models

  3. Consider making Claude models non-streaming by default until GitHub Copilot's Claude streaming format is fully compatible (a rough sketch of this follows below)
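As a minimal sketch of option 3 (option 2's retry logic would wrap the HTTP call itself and is omitted here), one could simply stop forcing streaming for Claude prefixes. The trimmed model list and the payload shape below are assumptions for illustration, not the provider's actual code:

use serde_json::{json, Value};

// Hypothetical trimmed list: only models known to stream correctly through
// GitHub Copilot are forced into streaming mode
const STREAM_MODELS: &[&str] = &["gpt-4.1"];

fn build_payload(model_name: &str, messages: Value) -> Value {
    let mut payload = json!({ "model": model_name, "messages": messages });
    let stream = STREAM_MODELS.iter().any(|p| model_name.starts_with(p));
    payload["stream"] = json!(stream);
    payload
}

fn main() {
    let msgs = json!([{ "role": "user", "content": "What files are in the current directory?" }]);
    // Claude stays on non-streaming until the format mismatch is resolved;
    // GPT models keep streaming as before
    assert_eq!(build_payload("claude-sonnet-4", msgs.clone())["stream"], json!(false));
    assert_eq!(build_payload("gpt-4.1", msgs)["stream"], json!(true));
}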

Reproduction Steps

  1. Set up Goose with GitHub Copilot provider in enterprise environment
  2. Use any Claude Sonnet model (claude-3.7-sonnet or claude-sonnet-4)
  3. Ask a question that requires tool usage (e.g., "What files are in the current directory?")
  4. Observe that the conversation stops after the initial response

Workaround

Use GPT models (gpt-4o, gpt-4o-mini) instead of Claude models when tool usage is required.

Impact

  • Blocks enterprise users from using Claude models with Goose
  • Silent failure provides no debugging information
  • Reduces model choice for users in GitHub Copilot-only environments

blackgirlbytes · Oct 31 '25

Hi all, it was me who found this problem and posted it on Discord. One small correction to the above: my recipes work well with gpt-5 and gpt-5-mini but fail with claude-sonnet-4 under the github_copilot provider. Everything else is correct.

I would really appreciate having this fixed soon, as my workflows work better with Claude models.

vlebedev · Nov 01 '25

Probably a duplicate of #2768.

vlebedev · Nov 01 '25

Prompts like

  • "Ping apple"
  • "Ping github and google"
  • "Summarize the README"
  • "Now rewrite the README to make every line a joke"

are all working for me with GitHub Copilot as the provider and claude-sonnet-4 as the model.

Is there any consistent way to reproduce the state where it has trouble initiating tool-calling loops?

alexhancock · Nov 04 '25

It is hard to support malfunctioning gateways (which Copilot is, in this case), so I guess we have to do what others do: for Claude models (or for all of Copilot), only use non-streaming, I think, as that would at least be functional. That would be my thought, @alexhancock.

michaelneale · Nov 05 '25

I am not able to repro this. I posted https://github.com/block/goose/issues/5510#issuecomment-3488027242 as well as some more context in Discord here. I am going to close this out unless we can identify a specific failure with reliable repro steps.

alexhancock · Nov 06 '25