zed
Wait/retry when agent hits rate limits (don't just error)
Summary
When a user bringing their own API key (Anthropic/OpenAI) encounters a recoverable rate-limit error, the agent should wait and retry instead of erroring out, using the hints in the response headers.
Description
For example, the current error message for OpenAI is:
```
Failed to connect to OpenAI API: Rate limit reached for gpt4.1 in organization org-
on tokens per min (TPM): Limit 30000, Used 25292, Requested 10584. Please try again in 11.75s. Visit https://platform.openai.com/account/rate-limits to learn more
```
- Anthropic's header is `retry-after` (in seconds, e.g. `59`)
- OpenAI's headers are `x-ratelimit-reset-requests` / `x-ratelimit-reset-tokens` (durations, e.g. `1m30s`)
Docs:
I have not been able to find reference to response headers in the Google Gemini Rate Limit docs or for Copilot Chat which specify rate limits.
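As a rough sketch of how these header hints could be consumed (assuming a dict-like, lower-cased header map; this is not Zed's actual implementation), the two formats above can be normalized to a single delay in seconds:

```python
import re

def retry_delay_seconds(headers):
    """Best-effort parse of rate-limit hints from provider response headers.

    Anthropic sends retry-after as whole seconds; OpenAI sends durations
    like "1m30s" or "11.75s" in its x-ratelimit-reset-* headers.
    Returns None if no recognized hint is present.
    """
    retry_after = headers.get("retry-after")
    if retry_after is not None:
        return float(retry_after)  # e.g. "59" -> 59.0
    for name in ("x-ratelimit-reset-requests", "x-ratelimit-reset-tokens"):
        value = headers.get(name)
        if value:
            # Sum duration components such as "1m30s", "11.75s", "250ms".
            total = 0.0
            for amount, unit in re.findall(r"([\d.]+)(ms|h|m|s)", value):
                total += float(amount) * {"h": 3600, "m": 60, "s": 1, "ms": 0.001}[unit]
            return total
    return None
```

A caller would then sleep for that many seconds (plus perhaps some jitter) before retrying, falling back to a default back-off when the function returns `None`.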
This is a HUGE problem! It makes Anthropic (through my own keys) COMPLETELY unusable for me. Even with not-terribly-long prompts and not a lot of contextual files added I can still hit this in a brief conversation using a newer Anthropic model.
Anthropic BYO API key users can increase their rate limits by purchasing more credits up front. The per-minute (cumulative) token limits are: tier 1: 20k tokens, tier 2: 40k tokens, tier 3: 80k tokens, tier 4: 200k tokens. E.g. if you wish to make three successive 40k-token Anthropic API requests within a minute, you will need to pre-purchase $400 of credit first; otherwise you will be rate limited.
OpenAI has similar rate limits, see your OpenAI organization rate limits for more.
Note, these limits do not apply for models via Zed Pro, see Zed Pro Pricing for more.
Workarounds for this:
Anthropic gives you a 20K rate limit at the $20/month subscription rate, and the prices to get higher rate limits directly from Anthropic are very steep, but there are alternatives.
- GitHub Copilot Pro allows you to use Anthropic models (though not necessarily the latest ones†) with a 90K rate limit. The cost is only $10 for Pro (or $39 for Pro+).
- Zed Pro appears to have a very high rate limit, which I have not hit yet even a single time, and the cost is only $20/month.
I've found Copilot with 3.7 Sonnet Thinking to be much more productive than the Anthropic API with 4.0 Sonnet Thinking because of the very low rate limit Anthropic imposes.
But going directly to Zed Pro is the best option yet, if you're willing to burn through credits very quickly (and therefore rack up overage charges) rather than prepaying to join a higher tier. You get high rate limits (I've never hit an error that revealed what the limit actually is) AND you get the newest models.
It may be that Anthropic is most economical for some usage patterns—I'm not willing to pay the money to experiment with this. But I can recommend Zed Pro and Copilot as good options to avoid the extreme frustration of constantly hitting the Anthropic rate limit without a $200/month up-front bill.
The Zed Pro trial is quite generous, so you can experiment freely.
† Actually, it looks like you might be able to get Sonnet 4.0 if you go here https://github.com/settings/copilot/features and enable it manually. I haven't tested this yet.
This is an issue with Amazon Bedrock as well. Under high load, the system will refuse to process requests until it's less busy, responding with HTTP 429.
The Claude desktop client, using Bedrock, does a staggered back-off, retrying up to 10 times, and then lets you ask it to continue processing the request if it still fails after the 10th attempt.
It would be great to see some way to do this in zed.
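The staggered back-off described above can be sketched roughly as follows; `send_request` and the 429 check are placeholders for illustration, not Zed's or the Claude client's actual code:

```python
import random
import time

def send_with_backoff(send_request, max_attempts=10):
    """Retry a request with exponential back-off plus jitter on HTTP 429.

    `send_request` is a hypothetical callable returning an object with a
    `status_code` attribute. After `max_attempts` failures the last
    response is returned, so the caller can decide whether to continue
    (as the Claude desktop client lets the user do).
    """
    for attempt in range(max_attempts):
        response = send_request()
        if response.status_code != 429:
            return response
        # Double the base delay each attempt (1s, 2s, 4s, ...) with jitter
        # so simultaneous clients don't all retry at the same instant.
        delay = (2 ** attempt) + random.uniform(0, 1)
        time.sleep(delay)
    return response
```

Honoring a `Retry-After` header when present, instead of the computed delay, would make this friendlier to the provider.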
It'd be great to have some nicer UI/feedback around what's going on, but adding these settings to the AWS profile I have configured for Zed is enough to get things working with Bedrock without having to hit the retry button all the time:
```ini
max_attempts = 10
retry_mode = standard
```
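For context, these are standard AWS SDK retry settings rather than Zed settings; they would typically go in the profile section of `~/.aws/config` (the profile name and region below are just examples):

```ini
# ~/.aws/config
[profile zed]
region = us-east-1
max_attempts = 10
retry_mode = standard
```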
@jayzes Can you comment on where and how you have created or edited this AWS profile? I don't see anything in Zed's LLM profiles (Write, Ask, Minimal) where such a setting could be added, so I'm puzzled by your comment.
Is there any improvement planned for this? I'm working on a very small personal Rails project and getting rate limited on my first or second prompt each day. I've tried with both my Gemini and OpenAI API keys – same issue every time. BYO-API-Key is completely unusable in my experience. This seems to have been an issue for quite a long time. Any hope for this issue getting resolved someday?
Hi there! 👋 We're working to clean up our issue tracker by closing older bugs that might not be relevant anymore. If you are able to reproduce this issue in the latest version of Zed, please let us know by commenting on this issue, and it will be kept open. If you can't reproduce it, feel free to close the issue yourself. Otherwise, it will close automatically in 14 days. Thanks for your help!
Here's an update. My PR #37891 for this has been merged to the main branch and is in v0.214.0-pre. There is now some fairly aggressive retrying in the Zed agent to deal with flaky providers.
All providers have a basic level of retry logic, and some of them also have logic to detect and react to Retry-After. I just tested the following providers and confirmed that the Zed agent now handles 429s and observes the Retry-After header:
- anthropic
- openai
- openai_compatible
I believe xAI may have Retry-After working too but I haven't tested it.
Assuming there are no bugs, this may be good enough for the issue to be closed?
cc @bennetbo