False Rate Limit Error Messages

Open · benvenker opened this issue 9 months ago · 4 comments

Describe the bug

Description

Goose is displaying rate limit error messages from the Anthropic API despite normal network traffic and request sizes. The error appears to be a false positive, as network monitoring shows no signs of excessive requests or oversized payloads.

To Reproduce

  1. Normal interaction with Goose
  2. Error appears in responses despite:
    • Normal network traffic (verified in browser dev tools)
    • No excessive request sizes
    • No actual rate limiting from the API

Expected behavior

  • Rate limit errors should only appear when actually hitting API rate limits
  • Error messages should accurately reflect the actual state of API interactions

Screenshots

Please provide the following information:

  • OS & Arch: macOS 14.1.1 (ARM64)
  • Interface: UI
  • Version: 1.0.12
  • Extensions enabled: Developer
  • Provider & Model: Anthropic claude-3-5-sonnet-latest

Additional context

Actual Behavior

  • Rate limit errors are being shown when network traffic appears normal
  • Messages suggest rate limiting but network monitoring shows no signs of excessive requests

Error Message

Failed to generate session description: RateLimitExceeded("Some(Object {"error": Object {"message": String("This request would exceed the rate limit for your organization (fe514f84-91b5-4ee2-b5aa-53e0b5820662) of 8,000 output tokens per minute..."), "type": String("rate_limit_error")}, "type": String("error")}")

Additional Notes

  • The error appears in logs at: ~/Library/Application Support/Goose/logs/main.log
  • Browser network monitoring shows normal request patterns
  • Error persists across multiple interactions

(benvenker, Mar 05 '25 05:03)

Hey @benvenker, those RateLimitExceeded errors come directly from the API. For Anthropic, the logic is here: https://github.com/block/goose/blob/main/crates/goose/src/providers/anthropic.rs#L103-L105
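
For readers unfamiliar with the linked code, here is a minimal sketch of the general pattern: the provider inspects the HTTP status and wraps a 429 ("Too Many Requests", which Anthropic uses for `rate_limit_error`) in a `RateLimitExceeded` error, passing the API's own response body through. The `ProviderError` enum, its other variants, and `map_response` are hypothetical stand-ins rather than goose's actual types. The point is only that the error text originates from the API response, not from goose.

```rust
/// Hypothetical provider error type; goose's real enum may differ.
#[derive(Debug, PartialEq)]
enum ProviderError {
    RateLimitExceeded(String),
    ServerError(String),
    RequestFailed(String),
}

/// Map an HTTP status code and response body to a result.
/// A 429 is surfaced as RateLimitExceeded, carrying the API's error body verbatim,
/// which is why the message quotes the organization's token-per-minute limit.
fn map_response(status: u16, body: &str) -> Result<String, ProviderError> {
    match status {
        200..=299 => Ok(body.to_string()),
        429 => Err(ProviderError::RateLimitExceeded(body.to_string())),
        500..=599 => Err(ProviderError::ServerError(body.to_string())),
        _ => Err(ProviderError::RequestFailed(format!("status {status}: {body}"))),
    }
}
```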

In this specific case, a new feature was added that calls the provider to generate a session description by passing in the conversation so far, and that was the API call that was rate limited: https://github.com/block/goose/blob/main/crates/goose/src/session/storage.rs#L244

It doesn't look like there's a way to opt out of that feature at the moment. Is that something you'd want to be able to disable?

(kalvinnchau, Mar 05 '25 17:03)

+1 on disabling this or trimming the tokens in some way. It seems to be making Anthropic really hard to use with goose, as the rate limit hits are non-stop once you've got a bit of a conversation going. I'm just doing some light website editing and running into

ERROR goose::agents::truncate: Error: Rate limit exceeded: Some(Object {"error": Object {"message": String("This request would exceed the rate limit for your organization (...) of 40,000 input tokens per minute...

with every request

(bigethan, Mar 09 '25 00:03)

Hit this as well last night. A single request to a fresh session iterated back and forth a little bit, and it hit the rate limit. It was late so I took it as an indicator to go to bed. Coming back in the morning and entering "retry" immediately results in the rate limit error for its first request.

I looked a little deeper, and that one request on its own exceeds the 40k input tokens per minute limit, so all future requests and attempts will fail to make any progress.
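
The reasoning here can be made concrete: a per-minute token budget refills over time, but a single request larger than the whole budget can never fit, no matter how long you wait. A minimal sketch, assuming a simplified model in which the full per-minute budget (40,000 input tokens, the figure in the error quoted earlier) becomes available again after waiting; Anthropic's actual limiter may behave differently. `RetryAdvice` and `advise` are hypothetical names, and the request's token count is assumed to be known already.

```rust
/// Per-minute input token limit quoted in the rate limit error earlier in the thread.
const INPUT_TOKENS_PER_MINUTE: u64 = 40_000;

#[derive(Debug, PartialEq)]
enum RetryAdvice {
    /// The request fits within one minute's budget, so waiting can help.
    WaitAndRetry,
    /// The request alone is larger than the budget, so no amount of waiting helps.
    ShrinkRequest,
}

/// Hypothetical helper: classify a rate-limited request by comparing its
/// input size against the per-minute budget (simplified model, see lead-in).
fn advise(request_tokens: u64, limit_per_minute: u64) -> RetryAdvice {
    if request_tokens <= limit_per_minute {
        RetryAdvice::WaitAndRetry
    } else {
        RetryAdvice::ShrinkRequest
    }
}
```

Under this model, typing "retry" the next morning fails for the same reason it failed the night before: the oversized request is resent unchanged.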

(sstelfox, Mar 14 '25 13:03)

I also exceeded the input token rate limit. Is it possible that some form of compact command is needed?

(Straffern, Mar 17 '25 12:03)

I'm having the same issue. From what I'm seeing of the source code you've referenced (with zero familiarity with the codebase apart from this), this message seems almost to be a self-fulfilling prophecy once it happens. I've basically just been starting a new session, giving a prompt to build some context of where things got to, then telling it to go do something substantial. I'm now at 2 user messages, and have a huge amount of data in the log.

My next prompt will be user prompt number 3, which will trigger the conversation summary, and boom, rate limited.

A workaround for now would appear to be making sure to write 4 user prompts for context about the conversation before asking goose to do anything substantial.
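
Why the workaround helps can be sketched with a toy model: if a description request is made after each of the first few user turns, and each one re-sends the whole conversation so far, then substantial output produced early is re-sent, while output produced after the cutoff never reaches the description call. The cutoff is a parameter because goose's exact value is not stated in this thread; `should_generate_description`, `description_traffic`, and the one-turn-per-user-message assumption are all hypothetical.

```rust
/// Hypothetical gate: request a description only while the conversation
/// has between 1 and `max_user_messages` user messages.
fn should_generate_description(user_message_count: usize, max_user_messages: usize) -> bool {
    (1..=max_user_messages).contains(&user_message_count)
}

/// Total size sent to description requests over a session, given the size of
/// each turn (a user message plus whatever output it produced). Each description
/// request re-sends the entire conversation accumulated so far.
fn description_traffic(turn_sizes: &[usize], max_user_messages: usize) -> usize {
    let mut conversation_so_far = 0;
    let mut sent = 0;
    for (i, size) in turn_sizes.iter().enumerate() {
        conversation_so_far += *size;
        let user_message_count = i + 1; // assumption: one user message per turn
        if should_generate_description(user_message_count, max_user_messages) {
            sent += conversation_so_far;
        }
    }
    sent
}
```

With a cutoff of 3, a session whose third turn produces 50,000 characters of output sends 50,500 characters to description requests; moving that work to the fourth turn sends only 600.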

A possible fix would be to check the size of the conversation and progressively summarise it if needed, before asking for the 4-word summary based on the summarised version of the conversation. @kalvinnchau does this seem reasonable? If so, it seems like a good first issue for me to contribute to goose.

(bertlebee, Mar 23 '25 20:03)

Hey @bertlebee! This could be an interesting approach, though since we'd ask the LLM to summarize the conversation, there might still be an issue with token usage (since we're passing up the full convo), and with rate limits, since you'd add another API call?

(kalvinnchau, Mar 24 '25 21:03)

Hey @kalvinnchau, see #1820. My thought was to check whether the message exceeds a certain size and, if so, split it into chunks and have each chunk summarised before grouping the chunks and asking for the keywords. It turns out this wasn't needed, though.
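
The chunk-then-summarise idea can be sketched as follows. The summariser is passed in as a closure so the sketch runs without a provider; in practice each call would be a separate LLM request, which is exactly the extra token and rate limit cost raised in the previous comment (every chunk is still sent in full). `chunk_by_chars`, `summarize_in_chunks`, and the chunk size are hypothetical; the joined partial summaries would then be what the keyword request sees.

```rust
/// Split `text` into pieces of at most `max_chars` characters,
/// never cutting a multi-byte UTF-8 character in half.
fn chunk_by_chars(text: &str, max_chars: usize) -> Vec<String> {
    assert!(max_chars > 0, "chunk size must be positive");
    let chars: Vec<char> = text.chars().collect();
    chars
        .chunks(max_chars)
        .map(|c| c.iter().collect::<String>())
        .collect()
}

/// If `text` fits within `max_chars`, return it unchanged; otherwise summarise
/// each chunk separately (one call per chunk) and join the partial summaries.
fn summarize_in_chunks<F>(text: &str, max_chars: usize, mut summarize: F) -> String
where
    F: FnMut(&str) -> String,
{
    if text.chars().count() <= max_chars {
        return text.to_string();
    }
    chunk_by_chars(text, max_chars)
        .iter()
        .map(|chunk| summarize(chunk.as_str()))
        .collect::<Vec<_>>()
        .join("\n")
}
```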

After looking at the code more closely, I found there's already a working version of the summarisation functionality in a function that just isn't used anywhere. Comments indicate that only user messages were ever intended to be sent, which makes sense since the topic of the conversation is dictated by the user.
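
Sending only the user's messages to the description request amounts to a filter over the conversation before the prompt is built. A minimal sketch with a hypothetical `Role`/`Message` pair; goose's real message type carries richer content (tool requests and responses, for instance) than a plain string, so this only illustrates the idea, not the code in #1820.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    User,
    Assistant,
}

/// Hypothetical, simplified message type.
struct Message {
    role: Role,
    text: String,
}

/// Collect only non-empty user messages: they set the topic of the session,
/// while assistant replies and tool output, which can be far larger, are left out.
fn user_text_for_description(messages: &[Message]) -> String {
    messages
        .iter()
        .filter(|m| m.role == Role::User)
        .map(|m| m.text.trim())
        .filter(|t| !t.is_empty())
        .collect::<Vec<_>>()
        .join("\n")
}
```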

I used a locally built version of goose with this change all day yesterday and only had rate limit issues when it attempted to read a large source file generated by wit-bindgen (which I believe is unrelated to this issue, even if the symptom looks the same).

(bertlebee, Mar 25 '25 00:03)

Hi @bertlebee, nice contribution! It looks close; I added a comment.

(wendytang, Mar 25 '25 23:03)

I haven't run into the rate limit before, but does #1820 help fix what you were experiencing? I added a 300-character limit to your change; could you test and verify that you no longer hit the rate limit?
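
A 300-character cap needs a little care in Rust: slicing a string by byte index (`&s[..300]`) panics if the cut falls inside a multi-byte UTF-8 character. A minimal sketch that truncates on character boundaries instead. `truncate_chars` and `MAX_DESCRIPTION_INPUT_CHARS` are hypothetical names, and this is not necessarily how #1820 applies its limit.

```rust
/// Character cap for the description input (300, the figure from the comment above).
const MAX_DESCRIPTION_INPUT_CHARS: usize = 300;

/// Return at most `max_chars` characters of `text`, cutting only on
/// character boundaries so multi-byte UTF-8 input cannot cause a panic.
fn truncate_chars(text: &str, max_chars: usize) -> &str {
    match text.char_indices().nth(max_chars) {
        // Byte offset where the first character past the limit begins.
        Some((byte_idx, _)) => &text[..byte_idx],
        // The text already fits within the limit.
        None => text,
    }
}
```

Here "characters" means Unicode scalar values (`char`), not user-perceived grapheme clusters, which is usually close enough for a size cap.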

(wendytang, Mar 25 '25 23:03)