False Rate Limit Error Messages
Describe the bug
Description
Goose is displaying rate limit error messages from the Anthropic API despite normal network traffic and request sizes. The error appears to be a false positive, as network monitoring shows no signs of excessive requests or payload sizes.
To Reproduce
- Normal interaction with Goose
- Error appears in responses despite:
  - Normal network traffic (verified in browser dev tools)
  - No excessive request sizes
  - No actual rate limiting from the API
Expected behavior
- Rate limit errors should only appear when actually hitting API rate limits
- Error messages should accurately reflect the actual state of API interactions
Please provide the following information:
- OS & Arch: macOS 14.1.1 (ARM64)
- Interface: UI
- Version: 1.0.12
- Extensions enabled: Developer
- Provider & Model: Anthropic claude-3-5-sonnet-latest
Additional context
Actual Behavior
- Rate limit errors are being shown when network traffic appears normal
- Messages suggest rate limiting but network monitoring shows no signs of excessive requests
Error Message
```
Failed to generate session description: RateLimitExceeded("Some(Object {"error": Object {"message": String("This request would exceed the rate limit for your organization (fe514f84-91b5-4ee2-b5aa-53e0b5820662) of 8,000 output tokens per minute..."), "type": String("rate_limit_error")}, "type": String("error")}")
```
Additional Notes
- The error appears in logs at: ~/Library/Application Support/Goose/logs/main.log
- Browser network monitoring shows normal request patterns
- Error persists across multiple interactions
Hey @benvenker, those `RateLimitExceeded` errors come directly from the API. In the Anthropic case the logic is:
https://github.com/block/goose/blob/main/crates/goose/src/providers/anthropic.rs#L103-L105
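That mapping is roughly of the following shape. This is a hedged sketch only; the `ProviderError` and `map_status` names are illustrative stand-ins, not the actual code at that link:

```rust
// Simplified sketch: map the HTTP status of a provider response to a
// typed error. Names here are illustrative, not goose's real API.
#[derive(Debug, PartialEq)]
enum ProviderError {
    RateLimitExceeded(String),
    Other(u16),
}

fn map_status(status: u16, body: &str) -> Result<(), ProviderError> {
    match status {
        200 => Ok(()),
        // Anthropic returns 429 with a JSON body describing which limit
        // was hit; that body is carried into the error message verbatim,
        // which is why the full JSON shows up in Goose's error output.
        429 => Err(ProviderError::RateLimitExceeded(body.to_string())),
        other => Err(ProviderError::Other(other)),
    }
}
```

So the message isn't generated by Goose itself; it is a pass-through of whatever the API returned with the 429.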
In this specific case, a feature was recently added that calls the provider to generate a session description by passing in the conversation so far, and that was the API call that got rate limited: https://github.com/block/goose/blob/main/crates/goose/src/session/storage.rs#L244
It doesn't look like there's a way to opt out of that feature at the moment. Is that something you'd want to be able to disable?
+1 on disabling this or trimming the tokens in some way. It's making Anthropic really hard to use with Goose, as the rate limit hits are nonstop once you've got a bit of a conversation going. I'm just doing some light website editing and running into
```
ERROR goose::agents::truncate: Error: Rate limit exceeded: Some(Object {"error": Object {"message": String("This request would exceed the rate limit for your organization (...) of 40,000 input tokens per minute...
```

with every request.
Hit this as well last night. A single request in a fresh session iterated back and forth a little, and it hit the rate limit. It was late, so I took it as an indicator to go to bed. Coming back in the morning and entering "retry" immediately results in the rate limit error on its first request.
I looked into it a little deeper: that one request exceeds the model's 40k input-token limit, so all future requests and retries will fail to make any progress.
I also exceeded the input-token rate limit. Is it possible that some form of compact command is needed?
> Hey @benvenker, those `RateLimitExceeded` errors come directly from the API. In the Anthropic case the logic is: https://github.com/block/goose/blob/main/crates/goose/src/providers/anthropic.rs#L103-L105
>
> In this specific case, a feature was recently added that calls the provider to generate a session description by passing in the conversation so far, and that was the API call that got rate limited: https://github.com/block/goose/blob/main/crates/goose/src/session/storage.rs#L244
>
> It doesn't look like there's a way to opt out of that feature at the moment. Is that something you'd want to be able to disable?
I'm having the same issue. From what I can see of the source code you've referenced (with zero familiarity with the code base apart from this), this message seems almost to be a self-fulfilling prophecy once it happens once. I've basically just been starting a new session, giving a prompt to build some context of where things got to, then telling it to go do something substantial. I'm now at two user messages and have a huge amount of data in the log.
My next prompt will be user prompt number 3, which will trigger the conversation summary, and boom, rate limited.
A workaround for now would appear to be making sure to write 4 user prompts for context about the conversation before asking goose to do anything substantial.
A possible fix would be to check the size of the conversation and progressively summarise it if needed, before asking for the 4 word summary based on the summarised version of the conversation. @kalvinnchau does this seem reasonable? If so, seems like a good first issue for me to contribute to goose.
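The progressive summarisation idea above could look roughly like this. This is a hedged sketch under stated assumptions: `summarise` stands in for an LLM call (here it just truncates), and none of these names exist in goose:

```rust
// Illustrative sketch of progressive summarisation: split an oversized
// conversation into fixed-size chunks, summarise each chunk, then build
// the description prompt from the concatenated summaries instead of the
// full conversation.

fn chunk(text: &str, size: usize) -> Vec<String> {
    text.as_bytes()
        .chunks(size)
        .map(|c| String::from_utf8_lossy(c).into_owned())
        .collect()
}

fn summarise(chunk: &str) -> String {
    // Placeholder for a provider call; real code would hit the API and
    // each call would itself consume input tokens.
    chunk.chars().take(20).collect()
}

fn progressive_summary(convo: &str, chunk_size: usize) -> String {
    chunk(convo, chunk_size)
        .iter()
        .map(|c| summarise(c))
        .collect::<Vec<_>>()
        .join(" ")
}
```

One trade-off worth noting: each chunk summary is an extra API call, so this reduces tokens per request but increases request count.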
> A possible fix would be to check the size of the conversation and progressively summarise it if needed, before asking for the 4 word summary based on the summarised version of the conversation. @kalvinnchau does this seem reasonable? If so, seems like a good first issue for me to contribute to goose.
Hey @bertlebee! This could be an interesting approach, though given that we'd ask the LLM to summarize the conversation, there might still be an issue with token usage (since we're passing up the full convo), and with rate limits, since you'd be adding another API call?
Hey @kalvinnchau, see #1820. My thought was to check if the message exceeds a certain size and, if so, split it into chunks and have each chunk summarised before grouping them up and asking for the keywords. It turns out this wasn't needed, though.
After looking at the code more closely, I figured out there's already a working version of the summarisation functionality in a function that isn't used anywhere. Comments indicate that only user messages were ever intended to be sent, which makes sense since the topic of the conversation is dictated by the user.
I used a locally built version of Goose all day yesterday with this change and only had rate limit issues when it attempted to read a large source file generated by wit-bindgen (which I believe is irrelevant to this issue, even if the symptom looks the same).
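The user-messages-only approach can be sketched as follows. This is a minimal illustration, not goose's actual implementation; the `Message`/`Role` types and the character budget parameter are hypothetical stand-ins:

```rust
// Sketch of the fix described above: build the session-description
// prompt from user messages only, capped at a character budget, so the
// description call can't balloon with large assistant/tool output.
#[derive(PartialEq)]
enum Role {
    User,
    Assistant,
}

struct Message {
    role: Role,
    text: String,
}

fn user_context(messages: &[Message], char_budget: usize) -> String {
    let mut ctx = String::new();
    // Skip assistant messages entirely; they dominate token counts but
    // the session topic is dictated by what the user asked for.
    for m in messages.iter().filter(|m| m.role == Role::User) {
        for ch in m.text.chars() {
            if ctx.len() >= char_budget {
                return ctx;
            }
            ctx.push(ch);
        }
        ctx.push('\n');
    }
    ctx
}
```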
Hi @bertlebee, nice contribution! It looks close; I added a comment.
I haven't run into the rate limit before, but does #1820 help fix what you were experiencing? I added a 300-character limit to your change; wondering if you could test and verify that you no longer hit the rate limit.