Error while using Gemini 2.5 Pro
What happened?
When using the Cline VSCode extension configured with the Google Gemini 2.5 Pro model, the error message "Provider returned error" frequently occurs without providing additional details or context. The issue temporarily resolves after clicking "Retry" several times, enabling successful processing of one or two requests, after which the error reappears.
Steps to reproduce
- Configure the Cline VSCode extension with the Google Gemini 2.5 Pro model.
- Initiate a chat or request via the Cline extension interface.
- Observe the frequent occurrence of the "Provider returned error" message.
- Click the "Retry" button several times until the request eventually succeeds.
- Attempt additional requests; the error reoccurs regularly after one or two successful interactions.
Relevant API REQUEST output
Operating System
Windows 10
Cline Version
v3.8.0
Additional context
No response
Thanks for reporting this issue and providing the detailed steps to reproduce.
Based on the behavior you described (frequent errors requiring retries, working for 1-2 requests then failing again) and the model you're using (Gemini 2.5 Pro Experimental), this strongly suggests you're encountering the rate limits imposed by Google on this specific experimental model tier via your API key.
As shown in the model details from Google AI Studio (like the image provided):
The model gemini-2.5-pro-exp-03-25 is clearly marked as Experimental. Experimental models often have stricter limitations and less stability than generally available ones. It has specific Rate limits. The image shows a general limit of 5 RPM, but importantly, it also shows a Free tier limit of 2 RPM (Requests Per Minute) and 50 requests per day.
It's highly likely that your usage pattern, even with just a few rapid requests or retries, is exceeding the 2 RPM limit associated with your Google API key for this free, experimental model. When you exceed this limit, Google's API returns an error. Cline receives this generic "Provider returned error" because the underlying API call failed due to the rate limit. Clicking "Retry" might eventually work once enough time has passed (e.g., > 30 seconds) for the rate limit window to allow another request.
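To make the arithmetic behind that "> 30 seconds" concrete: a limit of N requests per minute means requests need to be spaced, on average, at least 60/N seconds apart. Below is a minimal sketch. The function name is made up for illustration and is not part of Cline or any Google SDK, and it assumes evenly spaced requests; how Google actually measures the window may differ.

```python
def min_spacing_seconds(requests_per_minute: float) -> float:
    """Average spacing (in seconds) needed to stay under an RPM limit."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return 60.0 / requests_per_minute


# The two limits quoted above: 2 RPM (free tier) and 5 RPM (general).
print(min_spacing_seconds(2))  # 30.0
print(min_spacing_seconds(5))  # 12.0
```

So a couple of quick "Retry" clicks within the same half minute would already be enough to exceed the 2 RPM limit.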
This isn't an issue with Cline's code itself or a shared API key, but rather a limitation imposed by the provider (Google) on the specific model tier you've chosen to use with your personal API key.
Recommendations:
- Switch to a More Stable Model: Try configuring Cline to use a more stable, generally available Gemini model (like Gemini 1.0 Pro or Gemini 1.5 Pro, depending on availability and your needs). These often have higher rate limits.
- Check Limits in Google AI Studio: Explore the different models available in Google AI Studio and check their specific rate limits to find one that better aligns with your expected usage.
- Consider Other Providers: If Google's rate limits are too restrictive for your workflow, you could configure Cline to use a different LLM provider.
Let me know if switching to a different, non-experimental Gemini model resolves the persistent errors for you.
All that being said, the error display could be done in a better way, so that's something that needs to be looked at.
- I was able to replicate this by hitting the API rate limits myself, and it looked like this: image.png https://github.com/user-attachments/assets/d241622f-61aa-422a-a24e-3e95421d6bd2
- I hit my API rate limits on purpose with a Python script, and the error from Gemini looked like this:
*** Rate limit error encountered on request 1292! Stopping. ***
Error details: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. [violations {
}
, links {
description: "Learn more about Gemini API quotas"
url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, retry_delay {
seconds: 19
}
]
*** Rate limit error encountered on request 1282! Stopping. ***
Error details: 429 You exceeded your current quota, please check your plan and billing details. For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits. [violations {
}
, links {
description: "Learn more about Gemini API quotas"
url: "https://ai.google.dev/gemini-api/docs/rate-limits"
}
, retry_delay {
seconds: 19
}
]
Hi,
thanks for your reply.
I suspected rate limits, but I was missing something like a "Show details" button. It wasn't apparent what had caused the failed API request.
I also should have said that I used the model via OpenRouter and not directly.
Thanks
It would be neat to implement some sort of automatic rate limiting feature, pulling the time to wait from the API response.
This is what my typical rate limit response from the Google API looks like: (Line breaks added for readability)
[GoogleGenerativeAI Error]: Error fetching from https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro-exp-03-25:streamGenerateContent?alt=sse: [429 Too Many Requests]
You exceeded your current quota, please check your plan and billing details.
For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.
[{"@type":"type.googleapis.com/google.rpc.QuotaFailure",
"violations":[{"quotaMetric":"generativelanguage.googleapis.com/generate_content_free_tier_requests",
"quotaId":"GenerateRequestsPerDayPerProjectPerModel-FreeTier","quotaDimensions":
{"location":"global","model":"gemini-2.0-pro-exp"},"quotaValue":"50"}]},
{"@type":"type.googleapis.com/google.rpc.Help","links":
[{"description":"Learn more about Gemini API quotas",
"url":"https://ai.google.dev/gemini-api/docs/rate-limits"}]},
{"@type":"type.googleapis.com/google.rpc.RetryInfo","retryDelay":"4s"}]
The retryDelay field at the end gives the time left to wait.
We could grab that, set a timer, and automatically retry the request after the time has elapsed,
possibly with an extra second added on just to be sure.
It could probably be adapted to other models/APIs as well. I'm sure most APIs have a similar response to rate limiting. A new input box could be added to the model settings, allowing the user to specify the field that holds the time to wait.
If I have enough time this week, I might give it a whirl (unless someone beats me to it). I'm not that great with TypeScript though, so no promises! haha.
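For anyone who wants to play with the idea first, here is a rough Python sketch of it (not Cline code, and not TypeScript). Assumptions: the JSON details array has already been separated from the rest of the error string, the delay always looks like "4s", and the names `parse_retry_delay` and `wait_before_retry`, the 1-second padding, and the 30-second fallback are all made up for illustration.

```python
import json
import re
import time
from typing import Callable, Optional

RETRY_INFO_TYPE = "type.googleapis.com/google.rpc.RetryInfo"


def parse_retry_delay(details_json: str) -> Optional[float]:
    """Return the RetryInfo delay in seconds, or None if it is missing."""
    try:
        details = json.loads(details_json)
    except json.JSONDecodeError:
        return None
    if not isinstance(details, list):
        return None
    for entry in details:
        if isinstance(entry, dict) and entry.get("@type") == RETRY_INFO_TYPE:
            # Assumed format: a number followed by "s", e.g. "4s" or "1.5s".
            raw = str(entry.get("retryDelay", ""))
            match = re.fullmatch(r"(\d+(?:\.\d+)?)s", raw)
            if match:
                return float(match.group(1))
    return None


def wait_before_retry(
    details_json: str,
    padding: float = 1.0,    # the "extra second just to be sure"
    default: float = 30.0,   # assumed fallback when no delay is present
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Sleep for the suggested delay plus padding; return the time waited."""
    delay = parse_retry_delay(details_json)
    wait = (default if delay is None else delay) + padding
    sleep(wait)
    return wait


example = (
    '[{"@type":"type.googleapis.com/google.rpc.Help","links":[]},'
    '{"@type":"type.googleapis.com/google.rpc.RetryInfo","retryDelay":"4s"}]'
)
print(parse_retry_delay(example))  # 4.0
```

The `sleep` parameter is only there so the wait can be swapped out in tests; in real use the caller would retry the request once `wait_before_retry` returns.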
An RPM config option would be more scalable: Cline would do the waiting itself. For example, I might want to cap usage at X requests per minute without caring about which model is in use.
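Something like the following sliding-window limiter is what I have in mind, sketched in Python rather than in Cline's TypeScript. Everything here is hypothetical: the `RpmThrottle` class and its `max_rpm` setting are illustrative names, not an existing Cline option, and the sketch is single-threaded only.

```python
import time
from collections import deque
from typing import Callable, Deque


class RpmThrottle:
    """Allow at most max_rpm requests in any 60-second window."""

    def __init__(
        self,
        max_rpm: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        if max_rpm < 1:
            raise ValueError("max_rpm must be at least 1")
        self.max_rpm = max_rpm
        self._clock = clock
        self._sleep = sleep
        self._sent: Deque[float] = deque()  # timestamps of recent requests

    def _drop_expired(self, now: float) -> None:
        # Forget requests that are 60 seconds old or older.
        while self._sent and now - self._sent[0] >= 60.0:
            self._sent.popleft()

    def acquire(self) -> float:
        """Block until a request may be sent; return the seconds waited."""
        now = self._clock()
        self._drop_expired(now)
        waited = 0.0
        if len(self._sent) >= self.max_rpm:
            # Wait until the oldest request leaves the 60-second window.
            waited = self._sent[0] + 60.0 - now
            self._sleep(waited)
            now = self._clock()
            self._drop_expired(now)
        self._sent.append(now)
        return waited
```

Usage would be to call `throttle.acquire()` right before every API request, with `throttle = RpmThrottle(max_rpm=2)` for the free tier discussed above. Because it only counts requests, it works the same for any model or provider.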
Same error. I tried a different API key and still got the same error.
Rate limits are based on the project, not on the API key.
I ran into the same 429 error when using Gemini 2.5 Pro, but not while using Gemini 2.5 Flash.
The issue was that the API key I created at https://aistudio.google.com/apikey was linked to a Google Cloud Console project which was not linked to a billing account.
To link it, open Cloud Console, select the project, and click the "Billing" section. If it's not yet linked, it will ask you to link it to an existing billing account.
I get this error (line breaks added for readability):
GoogleGenerativeAIFetchError: [GoogleGenerativeAI Error]: Error fetching from https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-pro-exp-03-25:generateContent: [429 ]
You exceeded your current quota, please check your plan and billing details.
For more information on this error, head to: https://ai.google.dev/gemini-api/docs/rate-limits.
[{"@type":"type.googleapis.com/google.rpc.QuotaFailure","violations":[
{"quotaMetric":"generativelanguage.googleapis.com/generate_content_free_tier_input_token_count","quotaId":"GenerateContentInputTokensPerModelPerMinute-FreeTier","quotaDimensions":{"location":"global","model":"gemini-2.0-pro-exp"}},
{"quotaMetric":"generativelanguage.googleapis.com/generate_requests_per_model_per_day","quotaId":"GenerateRequestsPerDayPerProjectPerModel"},
{"quotaMetric":"generativelanguage.googleapis.com/generate_content_free_tier_requests","quotaId":"GenerateRequestsPerMinutePerProjectPerModel-FreeTier","quotaDimensions":{"model":"gemini-2.0-pro-exp","location":"global"}},
{"quotaMetric":"generativelanguage.googleapis.com/generate_content_free_tier_requests","quotaId":"GenerateRequestsPerDayPerProjectPerModel-FreeTier","quotaDimensions":{"location":"global","model":"gemini-2.0-pro-exp"}},
{"quotaMetric":"generativelanguage.googleapis.com/generate_content_free_tier_input_token_count","quotaId":"GenerateContentInputTokensPerModelPerDay-FreeTier","quotaDimensions":{"location":"global","model":"gemini-2.0-pro-exp"}}]},
{"@type":"type.googleapis.com/google.rpc.Help","links":[{"description":"Learn more about Gemini API quotas","url":"https://ai.google.dev/gemini-api/docs/rate-limits"}]},
{"@type":"type.googleapis.com/google.rpc.RetryInfo","retryDelay":"9s"}]
at handleResponseNotOk (@google_generative-ai.js?v=e8b28834:226:9)
at async makeRequest (@google_generative-ai.js?v=e8b28834:199:5)
at async generateContent (@google_generative-ai.js?v=e8b28834:544:20)
at async ChatSession.sendMessage (@google_generative-ai.js?v=e8b28834:802:5)
at async OnGenerateTrip (index.jsx:102:22)
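Since a response like this can name several quotas at once, it may help to pull out just the quotaId values to see which limits are involved. Here is a small Python sketch under some assumptions: the details array has already been cut out of the error string, and checking for "PerDay" in the ID is only a heuristic based on the IDs seen in this thread, not a documented contract. The function names are made up for illustration.

```python
import json
from typing import List

QUOTA_FAILURE_TYPE = "type.googleapis.com/google.rpc.QuotaFailure"


def violated_quota_ids(details_json: str) -> List[str]:
    """Collect every quotaId listed under QuotaFailure violations."""
    ids: List[str] = []
    for entry in json.loads(details_json):
        if entry.get("@type") != QUOTA_FAILURE_TYPE:
            continue
        for violation in entry.get("violations", []):
            quota_id = violation.get("quotaId")
            if quota_id:
                ids.append(quota_id)
    return ids


def mentions_daily_quota(quota_ids: List[str]) -> bool:
    """Heuristic: a per-day quota appears in the list (assumed naming)."""
    return any("PerDay" in quota_id for quota_id in quota_ids)


# Trimmed-down version of the details array from the error above.
example = json.dumps([
    {
        "@type": QUOTA_FAILURE_TYPE,
        "violations": [
            {"quotaId": "GenerateRequestsPerMinutePerProjectPerModel-FreeTier"},
            {"quotaId": "GenerateRequestsPerDayPerProjectPerModel-FreeTier"},
        ],
    },
    {"@type": "type.googleapis.com/google.rpc.RetryInfo", "retryDelay": "9s"},
])

ids = violated_quota_ids(example)
print(ids)
print(mentions_daily_quota(ids))
```

If a per-day quota really is used up, waiting a few seconds and retrying probably won't help, so a client could use a check like this to decide whether to retry or to tell the user to switch models or enable billing.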
I get a similar error when I switch from Gemini to Qwen and try to use Qwen:
404 The model `gemini-2.5-pro` does not exist or you do not have access to it.
Request ID: 0261327e-47b9-96a3-920f-4895030a7ce6
I'm trying to use Gemini 2.5 Pro, and at random times it fails with a 500 error:
{"@timestamp":"2025-08-15T07:03:45Z","level":"error","message":"Gemini API error response: {\"error\":{\"code\":500,\"message\":\"An internal error has occurred. Please retry or report in https:\\/\\/developers.generativeai.google\\/guide\\/troubleshooting\",\"status\":\"INTERNAL\"}}","pid":-31656,"service":"aichat"}
It works fine every time I use Gemini 2.5 Flash. Is there any solution for this? It's frustrating that Gemini 2.5 Pro fails with an error at random times.
@Dimasilham7 It's just a rate limit issue on Gemini's end, and there's not much we can do to help there.
We do have automated retries for Gemini, but that's about it.
Hi, thanks for answering. So the issue is on Gemini's side, correct?
Yes, Gemini has strange rate limits which vary widely. There's really not much we can do besides retry requests.
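For readers who call the API from their own scripts, retrying with exponential backoff is the usual way to handle this. The sketch below is a generic illustration in Python and is not Cline's actual retry logic. `ProviderError`, the set of retryable status codes, and the delay values are all assumptions chosen for the example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumed set of status codes worth retrying (rate limit / server errors).
RETRYABLE_STATUSES = {429, 500, 503}


class ProviderError(Exception):
    """Hypothetical error type that carries the HTTP status code."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(message or f"provider returned {status}")
        self.status = status


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying retryable failures with growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except ProviderError as err:
            is_last = attempt == max_attempts - 1
            if err.status not in RETRYABLE_STATUSES or is_last:
                raise
            # 1s, 2s, 4s, ... capped at max_delay, plus up to 50% jitter.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

Here `request` is any zero-argument function that performs one API call and raises `ProviderError` on failure. Errors outside the retryable set, such as the 404 mentioned earlier in the thread, are raised immediately because retrying them would not help.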