PR #2613 broke goose cli developer shell tool calls
I mentioned this in the Discord but am bringing over all the info here.
This commit (which was large and changed a lot) has broken the github copilot provider in goose cli (and maybe other providers). Tools like developer shell commands output gets cutoff and only the first line or two gets sent back to the LLM. See my screenshot for example when trying to do an ls command. The notifications / "loader" or whatever you call it in the CLI you see repeated ("total 400") in numerous ways.
Here is the commit: https://github.com/block/goose/commit/03e5549b5466e754df15c4d5370c7514c628341f Here is the PR: https://github.com/block/goose/pull/2613
Here for example is what it looks like running the ls shell command:
Here is the response from the LLM stating that all that returned was "total 400" instead of the ls command output:
If I run the ls command in my shell you see the full real response starts with "total 400" which is all that is being returned to the LLM after the tool call (with the actual ls results being cutoff):
Since the github copilot provider using models like gpt4.1 requires streaming, I wonder if that was not taken into account or tested for?
Should be easy to repro in main branch with using github copilot provider, gpt 4.1 model, using the goose cli and having the LLM do basically any developer shell command tool call that expects output (ls, cat etc.).
Thank you for creating the issue here on GitHub for the team to look over, @GitMurf !
If this is a github copilot provider specific issue, it is worth checking out this commit (https://github.com/block/goose/commit/4ae5e4264c86be1112a263eaef286640b42aa80d) which implemented streaming for gpt 4.1 and o4-mini models for the github copilot provider since it requires streaming. So maybe the PR did not take into account these requirements for the copilot provider or something?
Hey @GitMurf thanks for raising the issue -- I'm not able to reproduce it on main with copilot + gpt-4.1
goose % ./target/debug/goose run -t "run ls -al and summarize the result"
starting session | provider: github_copilot model: gpt-4.1
logging to /Users/jackamadeo/.local/share/goose/sessions/20250604_055625.jsonl
working directory: /Users/jackamadeo/development/goose
─── shell | developer ──────────────────────────
command: ls -al
⠚ drwxr-xr-x 29 jackamadeo staff 928 Jun 4 05:31 ui-v2
### Summary of `ls -al` Output
- **Total entries:** 42 (including hidden files and directories)
- **Directories:** Includes `.git`, `.github`, `.husky`, `.idea`, `.intersect`, `.pytest_cache`, `bindings`, `crates`, `documentation`, `examples`, `logs`, `scripts`, `target`, `tokenizer_files`, `ui`, and `ui-v2`.
- **Key files:**
- **Configuration/Metadata:** `.env`, `.gitignore`, `Cargo.toml`, `Cross.toml`, `Justfile`, `rust-toolchain.toml`
- **Documentation:** `README.md`, `CONTRIBUTING.md`, `SECURITY.md`, `ACCEPTABLE_USAGE.md`, `ARCHITECTURE.md`, `run_cross_local.md`
- **Shell scripts:** `count.sh`, `download_cli.sh`
- **Other files:** `Cargo.lock`, `hello_world.txt`, `logs.txt`, `msg.json`, `LICENSE`, `recipe.yaml`, `recipe-back.yaml`
- **Special/system files:** `.DS_Store`
- **Permissions:** Most files are readable and writable by the owner; directories are traversable.
- **Recent activity:** Many files modified on Jun 4, indicating recent development.
#### Notable Project Structure Points
- The project contains multiple Rust crates (`crates/`), a desktop UI (`ui/desktop`), and additional UI assets (`ui-v2/`, `ui/`).
- Includes scripts and configuration for Rust and likely for cross-platform support.
- Several directories and files for CI/CD, versioning, caching, and editor configs.
is what I get when I run it in the project root. Similar results with o4-mini.
In your session log, does the toolResponse show only the "total 400" line?
In your session log, does the toolResponse show only the "total 400" line?
@jamadeo I will run through a test this afternoon and check out the logs to get a more exact look at what is being sent and received.
@jamadeo I noticed you are running from target/debug/...? I am doing a cargo build --release. Could this somehow be part of the problem?
btw @jamadeo here is my session info running the same goose run -t "run ls -al and summarize the result" as you:
{"working_dir":"/home/archXXXXXXX","description":"List directory contents summary","schedule_id":null,"message_count":4,"total_tokens":null,"input_tokens":null,"output_tokens":null,"accumulated_total_tokens":null,"accumulated_input_tokens":null,"accumulated_output_tokens":null}
{"role":"user","created":1748984571,"content":[{"type":"text","text":"run ls -al and summarize the result"}]}
{"role":"assistant","created":1748984574,"content":[{"type":"toolRequest","id":"call_13zyRl3gi8lVyvz14aYk5AHI","toolCall":{"status":"success","value":{"name":"developer__shell","arguments":{"command":"ls -al"}}}}]}
{"role":"user","created":1748984574,"content":[{"type":"toolResponse","id":"call_13zyRl3gi8lVyvz14aYk5AHI","toolResult":{"status":"success","value":[{"type":"text","text":"total 400\n","annotations":{"audience":["assistant"]}},{"type":"text","text":"total 400\n","annotations":{"audience":["user"],"priority":0.0}}]}}]}
{"role":"assistant","created":1748984581,"content":[{"type":"text","text":"The ls -al command returned a summary line indicating the total number of blocks (400) in the current directory, but no details about individual files or folders were included in the output.\n\nWould you like me to try running it again or investigate the contents of this directory further?"}]}
And just in case, my environment is Windows 11 running goose in Windows Terminal WSL2 arch linux using zsh.
@jamadeo I noticed you are running from target/debug/...? I am doing a
cargo build --release. Could this somehow be part of the problem?
Ok, tested it with a debug build and same result. So we can eliminate that from the list:
Here is the session for this one:
{"working_dir":"/home/archXXXXXX","description":"Listing directory contents","schedule_id":null,"message_count":4,"total_tokens":null,"input_tokens":null,"output_tokens":null,"accumulated_total_tokens":null,"accumulated_input_tokens":null,"accumulated_output_tokens":null}
{"role":"user","created":1748987446,"content":[{"type":"text","text":"run the ls -al command to show me files and dirs"}]}
{"role":"assistant","created":1748987449,"content":[{"type":"toolRequest","id":"call_FB33C5F3c89Rh8fj7VLKp6N0","toolCall":{"status":"success","value":{"name":"developer__shell","arguments":{"command":"ls -al"}}}}]}
{"role":"user","created":1748987449,"content":[{"type":"toolResponse","id":"call_FB33C5F3c89Rh8fj7VLKp6N0","toolResult":{"status":"success","value":[{"type":"text","text":"total 392\n","annotations":{"audience":["assistant"]}},{"type":"text","text":"total 392\n","annotations":{"audience":["user"],"priority":0.0}}]}}]}
{"role":"assistant","created":1748987455,"content":[{"type":"text","text":"Here is the result of running ls -al in your current directory:\n\n```\ntotal 392\n```\n\nThis output indicates the total block size in the directory, but it seems the detailed files and directories list was not included. Would you like me to try again to show the full listing of files and directories in the current folder?"}]}
@jamadeo I suspect you are in full auto mode. Can you turn on approval mode so that it asks you to approve the shell command?
@GitMurf aha, I have an idea of what might be happening. I don't think it's related to the approval mode, I think it's because we're closing the output stream too soon in the shell tool. For some reason I cannot reproduce this at all on my mac, but I could see how the updated shell output handling might be broken.
Are you able to test out with this possible fix? https://github.com/block/goose/pull/2771
@jamadeo I will give it a try and let you know. This sounds precisely like what could be happening as sometimes I get no output returned and other times a couple lines. I was going to mention that it felt like a timing issue / race condition of some sort so this seems like the right path. I'll let you know when I'm able to test.
@jamadeo your fix seems to have worked! 👍
Although there is one thing that is still "off" a bit and I want to make sure it is not another timing issue or sign of potential race condition / timing issue. When a tool call runs, there is this new "spinner" animation it seems. It seems to duplicate whatever the last line from the tool call output is and has the spinner prepended to the start of the new duplicated line. I tried to show it with the arrow here. Notice the two red boxes show the duplicated output line.
Here is another example of it (easier to see the green spinner animation thing in this one):
Is this an expected quirk/glitch in the terminal animation? If not, probably worth looking into since it "seems" related to the timing stuff that you previously fixed for the shell command output.
Maybe it is now protected and "deterministic" with regards to waiting and finishing the tool call stuff, but it still feels very "quick" and fast like that it is just barely finishing in time to get the output finished for the respond back to the LLM. Maybe this is just the nature of the feature / changes and expected, but wanted to let you know as in the past it felt pretty clear and deliberate and a "pause" between tool call outputs and LLM responses, and now it is very fast back and forth which is great if all is properly being awaited and gathered, but I just want to make sure we aren't still risking a potential "fragile" timing issue here.
Thanks for the quick fix! Let me know if you need me to test anything else or have any questions for me.
@jamadeo I tested your commit (https://github.com/block/goose/pull/2771/commits/0933598cd38c7f506984431f98cc8ed669cf16b9) you added on fixing / hiding the spinner and it seems to all be working nicely now FYI!
@GitMurf thank you for testing out the fix!