Rate Limit
Which API Provider are you using?
Anthropic
Which Model are you using?
Claude 3.5 Sonnet
What happened?
429 {"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your organization’s rate limit of 80,000 input tokens per minute. For details, refer to: https://docs.anthropic.com/en/api/rate-limits; see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}}
Steps to reproduce
1. Have Cline rewrite the code. It seems that 600 lines of code is all it can handle; beyond that it mangles the code, making it unusable.
2. It seems like paying for a service that can only handle 600 lines of code is a waste of money.
3. How do I get around this limit so I can use the tool that I pay for?
Relevant API REQUEST output
429 {"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your organization’s rate limit of 80,000 input tokens per minute. For details, refer to: https://docs.anthropic.com/en/api/rate-limits; see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}}
This issue seems to be related to the amount of context being sent to the API every time. The further you get into a conversation, the more context there is, and by default Anthropic imposes a limit of 40,000 input tokens per minute (though it seems like you have 80,000 somehow).
One potential fix would be to have Cline auto-retry when it gets a rate limit error. Another would be to make Cline heavily truncate the amount of context being sent back to the API (I believe Cursor and Windsurf do this). Neither solution is perfect, though: truncating the context leads to far more bad outcomes, while an auto-retry means you're going to be sitting there waiting a while for your response.
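For illustration, the truncation approach could be as simple as a sliding window over the conversation. The sketch below assumes a generic message shape and a rough characters-per-token estimate; it is not what Cursor or Windsurf actually do:

```typescript
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

// Very rough token estimate: ~4 characters per token for English text.
const estimateTokens = (text: string) => Math.ceil(text.length / 4);

// Keep the first message (the original task) plus as many of the most
// recent messages as fit in the budget, dropping the middle of the chat.
function truncateContext(messages: ChatMessage[], budgetTokens: number): ChatMessage[] {
  const [first, ...rest] = messages;
  let used = estimateTokens(first.content);
  const kept: ChatMessage[] = [];
  for (const msg of [...rest].reverse()) {
    used += estimateTokens(msg.content);
    if (used > budgetTokens) break;
    kept.unshift(msg); // keep chronological order
  }
  return [first, ...kept];
}
```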
How do I implement this? I like the app, but when it breaks the code midstream it is sometimes difficult to fix the problem because the code is incomplete.
I am also running into this problem on every call. I asked Cline what it receives in every prompt and got this answer: I cannot directly show you the full prompt as that would be a form of prompt injection. However, I can explain what I receive in each message:
System Instructions:
- Tool usage guidelines
- Rules for operation
- Available tools and their parameters
- MCP server information
- System information (OS, shell, directories)
- My role and capabilities

User Message:
- The actual message/task from the user
- Any feedback from previous tool uses

Environment Details:
- VSCode Visible Files
- VSCode Open Tabs
- Current Working Directory Files (when included)
This seems to be a lot of information for every call.
Is there a way to show the sent prompt?
I am not hitting rate limit issues with https://github.com/RooVetGit/Roo-Cline
You can instruct it to split the code into multiple short code modules.
You can use the OpenRouter version of Claude 3.5 Sonnet. The limit is gone.
One potential fix would be to have Cline auto-retry when it gets a rate limit error. Another would be to make Cline heavily truncate the amount of context being sent back to the API (I believe Cursor and Windsurf do this). Neither solution is perfect, though: truncating the context leads to far more bad outcomes, while an auto-retry means you're going to be sitting there waiting a while for your response.
An auto-retry option can still be really beneficial in some use cases. I'm having Cline analyze and document the flow of data in a legacy app. I don't really care if it takes 8 hours to do this but manually clicking Retry every 5 minutes to deal with rate limiting is incredibly tedious.
I am not hitting rate limit issues with https://github.com/RooVetGit/Roo-Cline
Thanks @watzon, I'll try that out.
You can use the OpenRouter version of Claude 3.5 Sonnet. The limit is gone.
@donghaozhang How do you do this? Is it something I can configure in the Anthropic API console, or in Cline?
EDIT: TIL about https://openrouter.ai. Thanks!
Glad you guys could help each other out; thank you for stepping in to give advice.
You can use the OpenRouter version of Claude 3.5 Sonnet. The limit is gone.
How do you use the credits you have with Anthropic on OpenRouter? Otherwise you have credits sitting there that will never be used.
I am not hitting rate limit issues with https://github.com/RooVetGit/Roo-Cline
Tried Roo Cline today and initially didn't hit the rate limit, but just now I did. Why does it hit a rate limit when only editing a very short file? It's obviously not actually hitting any limits.
For how long, and with how many repeated attempts to edit it? It is collecting context the whole time. If you ask it to change something 30 times in a small file, it can be hundreds of thousands of tokens because of the previous context. If I just cut the task into smaller ones, I do not hit any limit, of course. It looks like a lot of people have absolutely no idea how LLM AI works.
For how long, and with how many repeated attempts to edit it? It is collecting context the whole time. If you ask it to change something 30 times in a small file, it can be hundreds of thousands of tokens because of the previous context. If I just cut the task into smaller ones, I do not hit any limit, of course. It looks like a lot of people have absolutely no idea how LLM AI works.
No, nothing like that. I only do very short requests 'cos none of it is free. Normally it's just a single task or perhaps two, and normally it fails on the first edit. One time it failed, I retried two or three times, and then it managed to do it. Most of the time it fails no matter how many times I retry. Roo Cline certainly works better, but I still gave up on it since it fails a lot too.
@hasen6 oh, thanks for the follow-up. Interesting. For me it fails only when I go over 2 USD in total price for the current task. For several steps it is OK and then it starts limiting, but really, if I wait for 5 minutes it is doable (just a few times I hit the 200k token limit).
@saoudrizwan why was this closed as completed? It seems like there's a missing feature here to configure handling of rate limits, and a response of "use a provider that doesn't have limits" isn't really helpful. It sounds like Roo Cline doesn't handle this much better (perhaps their instructions are shorter, so it takes longer to hit the limits?), and not everyone is authorized to use third parties like OpenRouter.
From the message:
429 {"type":"error","error":{"type":"rate_limit_error","message":"This request would exceed your organization’s rate limit of 80,000 input tokens per minute. For details, refer to: https://docs.anthropic.com/en/api/rate-limits; see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}}
I think the best solution is to create a wait/sleep option in Cline so that when it receives this error, it waits for 60 seconds and tries again.
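As a sketch of what that wait-and-retry option could look like (hypothetical wrapper code, not Cline's actual internals; it assumes the @anthropic-ai/sdk error types):

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Hypothetical retry wrapper, not an existing Cline setting: on a 429
// rate_limit_error, sleep for 60 seconds and try again, up to maxRetries.
async function withRateLimitRetry<T>(
  request: () => Promise<T>,
  maxRetries = 5,
  waitMs = 60_000,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err) {
      const rateLimited = err instanceof Anthropic.APIError && err.status === 429;
      if (!rateLimited || attempt >= maxRetries) throw err;
      await new Promise((resolve) => setTimeout(resolve, waitMs));
    }
  }
}

// Usage: wrap any call, e.g.
// const msg = await withRateLimitRetry(() => client.messages.create(params));
```

(For what it's worth, the @anthropic-ai/sdk client also accepts a maxRetries option and already retries 429s with backoff on its own, which may cover part of this.)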
I recently hit this with the pretty strict limits on 3.7. I would like a configuration that lets me fall back to a different model if the limit is hit.
For example, 3.7 has a 20k tokens/minute threshold. But 3.5 has a 40k tokens/minute threshold.
It would be great to switch to 3.5 when I hit the limit rather than have the service fail with the API error for a minute.
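A rough sketch of that fallback with the @anthropic-ai/sdk (the model IDs are real Anthropic identifiers; the wrapper itself is hypothetical, not an existing Cline option):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
const PRIMARY = "claude-3-7-sonnet-20250219";  // stricter tokens/minute tier
const FALLBACK = "claude-3-5-sonnet-20241022"; // higher tokens/minute tier

async function createWithFallback(
  params: Omit<Anthropic.MessageCreateParamsNonStreaming, "model">,
): Promise<Anthropic.Message> {
  try {
    return await client.messages.create({ ...params, model: PRIMARY });
  } catch (err) {
    // Only fall back on a rate limit; re-throw every other error.
    if (!(err instanceof Anthropic.APIError) || err.status !== 429) throw err;
    // Each model has its own tokens-per-minute budget, so the fallback
    // request counts against a separate limit.
    return await client.messages.create({ ...params, model: FALLBACK });
  }
}
```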
You can use the OpenRouter version of Claude 3.5 Sonnet. The limit is gone.
@donghaozhang How do you do this? Is it something I can configure in the Anthropic API console, or in Cline?
OpenRouter is a third-party API rather than the Anthropic API.
Just adding to this, as this is still an issue (for me): why can the context not just be minimized? An older, more cost-effective model could be queried to summarize the fresh context. Certain chat bots do this, for example, to keep the past context smaller, faster, and cheaper. So every time you ask something and get a response, you query the cheap model with "compress the following into a smaller version without losing any information: {past interaction}" and then add that to the context, and so forth on every exchange. That keeps the context tokens lower.
Is there any reason this isn't done?
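For illustration, the compression step described above could look roughly like this with the @anthropic-ai/sdk. This is a sketch of the proposal, not something Cline does, and the Haiku model choice is an assumption:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Sketch of the proposal above: after each exchange, ask a cheap model to
// compress the past interaction, and carry the summary forward instead.
async function compressContext(pastInteraction: string): Promise<string> {
  const response = await client.messages.create({
    model: "claude-3-5-haiku-20241022", // cheaper summarizer model
    max_tokens: 1024,
    messages: [
      {
        role: "user",
        content:
          "Compress the following into a smaller version without losing any " +
          "information:\n\n" + pastInteraction,
      },
    ],
  });
  // Keep only the text blocks of the reply as the compressed context.
  return response.content
    .flatMap((block) => (block.type === "text" ? [block.text] : []))
    .join("\n");
}
```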
-- Looking at the context export a bunch more, it also saves the whole mode instructions and workspace details on every step... this seems extremely inefficient...
For example, it would not have to attach the whole <environment_details></environment_details> block every time, adding it to the ever-growing context. I just removed this and the context length halved.
Replacing all the comments that accumulate in the code with the regexes \/\*(.|\n)*?\*\/ (block comments) and \/\/.+ (line comments) also helps to drastically shrink the tokens; you don't really need all the documentation and comments on everything.
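In code, that stripping step might look like the sketch below (assuming the second regex was meant to match // line comments; note that these naive patterns will also eat comment-like text inside string literals):

```typescript
// Naive comment stripping using the regexes from the comment above.
function stripComments(source: string): string {
  return source
    .replace(/\/\*(.|\n)*?\*\//g, "") // block comments: /* ... */
    .replace(/\/\/.+/g, ""); // line comments: // ...
}

console.log(stripComments("const x = 1; // answer\n/* doc */ const y = 2;"));
// => "const x = 1; \n const y = 2;"
```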
429s are a huge issue in Cline in July 2025.
Cline is sending a huge wall of text every time, and there are no smarts in place about what to send when. It is a crazy amount of text to send each time.
@saoudrizwan should never have closed this.