Wait/retry when agent hits rate limits (don't just error)

Open notpeter opened this issue 6 months ago • 3 comments

Summary

When a user who brings their own API key (Anthropic/OpenAI) hits a recoverable rate-limit error, wait and retry instead of erroring. Use the hints in the response headers.

Description

For example, the current error message for OpenAI is:

Failed to connect to OpenAI API: Rate limit reached for gpt4.1 in organization org- on tokens per min (TPM): Limit 30000, Used 25292, Requested 10584. Please try again in 11.75s. Visit https://platform.openai.com/account/rate-limits to learn more

Anthropic's header is retry-after (in seconds, e.g. 59). OpenAI's headers are x-ratelimit-reset-requests / x-ratelimit-reset-tokens (a duration, e.g. 1m30s).
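As a rough illustration of reading those hints, here is a minimal Python sketch (not Zed's code). It assumes the OpenAI values are duration strings made of number-plus-unit parts such as `1m30s` or `11.75s`, and that `retry-after` is a plain number of seconds. The function names `parse_duration` and `retry_delay_seconds` are hypothetical, and taking the larger of the two OpenAI reset values is a conservative assumption, since the headers alone do not say which limit was hit.

```python
import re
from typing import Mapping, Optional

# Seconds per unit for duration strings like "1m30s" (assumed format).
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
# "ms" is listed before "m" so that "6ms" is not read as 6 minutes.
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")


def parse_duration(text: str) -> Optional[float]:
    """Convert a string such as '1m30s' to seconds, or None if malformed."""
    text = text.strip()
    pos = 0
    total = 0.0
    while pos < len(text):
        match = _PART.match(text, pos)
        if match is None:
            return None
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    return total if pos > 0 else None


def retry_delay_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Pick a wait time from rate-limit headers, or None if there is no hint."""
    lowered = {name.lower(): value for name, value in headers.items()}

    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Non-numeric value: fall through to the other headers.

    names = ("x-ratelimit-reset-requests", "x-ratelimit-reset-tokens")
    resets = [parse_duration(lowered[n]) for n in names if n in lowered]
    resets = [r for r in resets if r is not None]
    # Assumption: wait for the slower of the two limits to reset.
    return max(resets) if resets else None
```

A caller would sleep for the returned number of seconds before retrying, and fall back to its own backoff schedule when the result is `None`.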

Docs:

I have not been able to find any reference to rate-limit response headers in the Google Gemini Rate Limit docs or for Copilot Chat.

notpeter avatar May 27 '25 16:05 notpeter

This is a HUGE problem! It makes Anthropic (through my own keys) COMPLETELY unusable for me. Even with not-terribly-long prompts and not a lot of contextual files added I can still hit this in a brief conversation using a newer Anthropic model.

brandonzylstra avatar May 31 '25 21:05 brandonzylstra

Anthropic BYO API key users can increase their rate limits by purchasing more credits up front. The limits are: tier 1: 20k tokens, tier 2: 40k tokens, tier 3: 80k tokens, tier 4: 200k tokens (these are per-minute, cumulative limits). E.g. if you wish to make three successive 40k-token Anthropic API requests, you will need to pre-purchase $400 of credit first; otherwise you will be rate limited.
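The arithmetic behind that example can be sketched in a few lines of Python. The tier limits are the ones quoted above; the helper name `lowest_tier_for` is made up for illustration, and the sketch assumes all requests land within the same minute.

```python
from typing import Optional

# Per-minute token limits by tier, as quoted in the comment above.
TIER_LIMITS = {1: 20_000, 2: 40_000, 3: 80_000, 4: 200_000}


def lowest_tier_for(tokens_per_minute: int) -> Optional[int]:
    """Return the lowest tier whose per-minute limit covers the usage."""
    for tier, limit in sorted(TIER_LIMITS.items()):
        if tokens_per_minute <= limit:
            return tier
    return None  # Exceeds every listed tier.


# Three 40k-token requests in one minute add up to 120k tokens.
needed = 3 * 40_000
print(needed, lowest_tier_for(needed))
```

Since 120k tokens exceeds the 80k limit of tier 3, only tier 4 covers that usage.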

OpenAI has similar rate limits; see your OpenAI organization rate limits for more.

Note: these limits do not apply to models used via Zed Pro; see Zed Pro Pricing for more.

notpeter avatar Jun 02 '25 14:06 notpeter

Workarounds for this:

Anthropic gives you a 20K rate limit at the $20/month subscription rate, and the prices to get higher rate limits directly from Anthropic are very steep, but there are alternatives.

  • GitHub Copilot Pro allows you to use Anthropic models (though not necessarily the latest ones†) with a 90K rate limit. The cost is only $10 for Pro (or $39 for Pro+).
  • Zed Pro appears to have a very high rate limit, which I have not hit yet even a single time, and the cost is only $20/month.

I've found Copilot with 3.7 Sonnet Thinking to be much more productive than the Anthropic API with 4.0 Sonnet Thinking because of the very low rate limit Anthropic imposes.

But going directly to Zed Pro is the best option yet, if you're willing to burn through credits very quickly (and therefore end up with overage charges) without needing to prepay to join a higher tier. You get high rate limits (I've never had an error that exposed to me what that was) AND you get the newest models.

It might be possible that Anthropic will be most economical for some usage patterns—I'm not willing to pay the money to experiment with this. But I can recommend Zed Pro and Copilot as good options to avoid the extreme frustration of always hitting the Anthropic rate limit without a $200/month up-front bill.

The Zed Pro trial is quite generous, so you can experiment freely.


† Actually, it looks like you might be able to get Sonnet 4.0 if you go here https://github.com/settings/copilot/features and enable it manually. I haven't tested this yet.

brandonzylstra avatar Jun 02 '25 18:06 brandonzylstra

This is an issue with Amazon Bedrock as well. Under high load, the system will refuse to process requests until it's less busy, responding with HTTP 429.

The Claude desktop client, using Bedrock, does a staggered backoff, retrying up to 10 times, and then lets you ask it to continue processing the request if it still fails after the 10th attempt.

It would be great to see some way to do this in Zed.
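For readers unfamiliar with the pattern, a staggered backoff with a cap of 10 attempts might look roughly like the Python sketch below. This is not the Claude desktop client's or Zed's implementation. The `send` callback, the delay constants, and the use of exponential backoff with random jitter are all assumptions chosen for illustration.

```python
import random
import time
from typing import Callable, Optional, Tuple

# Hypothetical callback: performs one request and returns
# (HTTP status, server-suggested wait in seconds or None).
SendFn = Callable[[], Tuple[int, Optional[float]]]


def call_with_backoff(
    send: SendFn,
    max_attempts: int = 10,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> int:
    """Retry on HTTP 429 with exponential backoff; return the last status."""
    for attempt in range(max_attempts):
        status, retry_after = send()
        if status != 429:
            return status
        if attempt == max_attempts - 1:
            break  # Out of attempts: report the 429 to the caller.
        # Exponential ceiling: base, 2x, 4x, ... capped at max_delay.
        ceiling = min(max_delay, base_delay * 2 ** attempt)
        # Prefer the server's hint; otherwise wait a random fraction
        # of the ceiling so that clients do not retry in lockstep.
        delay = retry_after if retry_after is not None else ceiling * rng()
        sleep(delay)
    return 429
```

When the function returns 429 after the final attempt, the caller can surface a "continue" or "retry" prompt to the user, as described above. The `sleep` and `rng` parameters exist only so the behavior can be exercised without real waiting.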

tkennedy1-godaddy avatar Jun 10 '25 22:06 tkennedy1-godaddy

It'd be great to have some nicer UI/feedback around what's going on, but adding these settings to the AWS profile I have configured for Zed is enough to get things working with Bedrock without having to hit the retry button all the time:

max_attempts = 10
retry_mode  = standard

jayzes avatar Aug 03 '25 21:08 jayzes

@jayzes Can you comment on where and how you have created or edited this AWS profile? I don't see anything in Zed's LLM profiles (Write, Ask, Minimal) where such a setting could be added, so I'm puzzled by your comment.

brandonzylstra avatar Aug 04 '25 18:08 brandonzylstra

Is there any improvement planned for this? I'm working on a very small personal Rails project and getting rate limited on my first or second prompt each day. I've tried with both my Gemini and OpenAI API keys – same issue every time. BYO-API-Key is completely unusable in my experience. This seems to have been an issue for quite a long time. Any hope for this issue getting resolved someday?

codepilotsf avatar Aug 12 '25 19:08 codepilotsf

Hi there! 👋 We're working to clean up our issue tracker by closing older bugs that might not be relevant anymore. If you are able to reproduce this issue in the latest version of Zed, please let us know by commenting on this issue, and it will be kept open. If you can't reproduce it, feel free to close the issue yourself. Otherwise, it will close automatically in 14 days. Thanks for your help!

github-actions[bot] avatar Nov 19 '25 11:11 github-actions[bot]

Here's an update. My PR #37891 for this has been merged to the main branch and is in v0.214.0-pre. There is now some fairly aggressive retrying in the Zed agent to deal with flaky providers.

All providers have a basic level of retry logic, and some of them also have logic to detect and react to Retry-After. I just tested the following providers and confirmed that the Zed agent now handles 429s and observes the Retry-After header:

  • anthropic
  • openai
  • openai_compatible

I believe xAI may have Retry-After working too but I haven't tested it.
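For context on what observing Retry-After involves: the HTTP specification allows the header value to be either a number of seconds or an HTTP date. The Python sketch below handles both forms. It is an illustration only and says nothing about how the merged PR implements this; the function name `parse_retry_after` and the injectable `now` parameter are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: str, now: Optional[datetime] = None) -> Optional[float]:
    """Return the seconds to wait per a Retry-After value, or None if unparseable."""
    value = value.strip()
    # Form 1: delay in whole seconds, e.g. "59".
    if value.isascii() and value.isdigit():
        return float(value)
    # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Assumption: treat a date without zone information as UTC.
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately.
    return max(0.0, (when - now).total_seconds())
```

A retry loop would call this on the 429 response and sleep for the result, falling back to its default backoff when the value is `None`.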

Assuming there are no bugs, this may be good enough for the issue to be closed?

cc @bennetbo

timmclean avatar Nov 21 '25 06:11 timmclean