Internal Server Error (500) when using Gemini API with Inspect framework
Description of the bug:
I'm consistently encountering an Internal Server Error (HTTP 500) when using the Google Gemini API through the Inspect evaluation framework. The error occurs during the generate_content call. I'm currently on the free trial.
Steps to Reproduce
- Set up an evaluation using the Inspect framework
- Configure the evaluation to use the Gemini model (in my case, google/gemini-1.5-pro)
- Run the evaluation
Error Message
InternalServerError: 500 An internal error has occurred. Please retry or report in https://developers.generativeai.google/guide/troubleshooting
Environment
- Operating System: MacOS Sonoma 14.5
- Python version: 3.12.4
Package Versions
- google-ai-generativelanguage: 0.6.6
- google-api-core: 2.19.2
- google-api-python-client: 2.143.0
- google-auth: 2.34.0
- google-auth-httplib2: 0.2.0
- google-generativeai: 0.7.2
- googleapis-common-protos: 1.65.0
Full error traceback:
╭─ benchmarks/gpqa (78 samples): google/gemini-1.5-pro ─────────────────────────────────────────────────────────────────────╮
│ ╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮ dataset: (samples) │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/inspect_ai/… │ scorer: choice │
│ │ in task_run │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/inspect_ai/… │ │
│ │ in task_run_sample │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/inspect_ai/… │ │
│ │ in __call__ │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/inspect_ai/… │ │
│ │ in task_run_sample │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/inspect_ai/… │ │
│ │ in solve │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/inspect_ai/… │ │
│ │ in generate │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/inspect_ai/… │ │
│ │ in task_generate │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/inspect_ai/… │ │
│ │ in generate │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/inspect_ai/… │ │
│ │ in _generate │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/tenacity/as… │ │
│ │ in async_wrapped │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/tenacity/as… │ │
│ │ in __call__ │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/tenacity/as… │ │
│ │ in iter │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/tenacity/_u… │ │
│ │ in inner │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/tenacity/__… │ │
│ │ in <lambda> │ │
│ │ │ │
│ │ /opt/homebrew/Cellar/[email protected]/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.1… │ │
│ │ in result │ │
│ │ │ │
│ │ 446 │ │ │ │ if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]: │ │
│ │ 447 │ │ │ │ │ raise CancelledError() │ │
│ │ 448 │ │ │ │ elif self._state == FINISHED: │ │
│ │ ❱ 449 │ │ │ │ │ return self.__get_result() │ │
│ │ 450 │ │ │ │ │ │
│ │ 451 │ │ │ │ self._condition.wait(timeout) │ │
│ │ 452 │ │
│ │ │ │
│ │ /opt/homebrew/Cellar/[email protected]/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.1… │ │
│ │ in __get_result │ │
│ │ │ │
│ │ 398 │ def __get_result(self): │ │
│ │ 399 │ │ if self._exception: │ │
│ │ 400 │ │ │ try: │ │
│ │ ❱ 401 │ │ │ │ raise self._exception │ │
│ │ 402 │ │ │ finally: │ │
│ │ 403 │ │ │ │ # Break a reference cycle with the exception in self._exception │ │
│ │ 404 │ │ │ │ self = None │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/tenacity/as… │ │
│ │ in __call__ │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/inspect_ai/… │ │
│ │ in generate │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/inspect_ai/… │ │
│ │ in generate │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/google/gene… │ │
│ │ in generate_content_async │ │
│ │ │ │
│ │ 382 │ │ │ │ │ ) │ │
│ │ 383 │ │ │ │ return await generation_types.AsyncGenerateContentResponse.from_aiterato │ │
│ │ 384 │ │ │ else: │ │
│ │ ❱ 385 │ │ │ │ response = await self._async_client.generate_content( │ │
│ │ 386 │ │ │ │ │ request, │ │
│ │ 387 │ │ │ │ │ **request_options, │ │
│ │ 388 │ │ │ │ ) │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/google/ai/g… │ │
│ │ in generate_content │ │
│ │ │ │
│ │ 403 │ │ self._client._validate_universe_domain() │ │
│ │ 404 │ │ │ │
│ │ 405 │ │ # Send the request. │ │
│ │ ❱ 406 │ │ response = await rpc( │ │
│ │ 407 │ │ │ request, │ │
│ │ 408 │ │ │ retry=retry, │ │
│ │ 409 │ │ │ timeout=timeout, │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/google/api_… │ │
│ │ in retry_wrapped_func │ │
│ │ │ │
│ │ 227 │ │ │ sleep_generator = exponential_sleep_generator( │ │
│ │ 228 │ │ │ │ self._initial, self._maximum, multiplier=self._multiplier │ │
│ │ 229 │ │ │ ) │ │
│ │ ❱ 230 │ │ │ return await retry_target( │ │
│ │ 231 │ │ │ │ functools.partial(func, *args, **kwargs), │ │
│ │ 232 │ │ │ │ predicate=self._predicate, │ │
│ │ 233 │ │ │ │ sleep_generator=sleep_generator, │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/google/api_… │ │
│ │ in retry_target │ │
│ │ │ │
│ │ 157 │ │ # This function explicitly must deal with broad exceptions. │ │
│ │ 158 │ │ except Exception as exc: │ │
│ │ 159 │ │ │ # defer to shared logic for handling errors │ │
│ │ ❱ 160 │ │ │ _retry_error_helper( │ │
│ │ 161 │ │ │ │ exc, │ │
│ │ 162 │ │ │ │ deadline, │ │
│ │ 163 │ │ │ │ sleep, │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/google/api_… │ │
│ │ in _retry_error_helper │ │
│ │ │ │
│ │ 209 │ │ │ RetryFailureReason.NON_RETRYABLE_ERROR, │ │
│ │ 210 │ │ │ original_timeout, │ │
│ │ 211 │ │ ) │ │
│ │ ❱ 212 │ │ raise final_exc from source_exc │ │
│ │ 213 │ if on_error_fn is not None: │ │
│ │ 214 │ │ on_error_fn(exc) │ │
│ │ 215 │ if deadline is not None and time.monotonic() + next_sleep > deadline: │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/google/api_… │ │
│ │ in retry_target │ │
│ │ │ │
│ │ 152 │ │ │
│ │ 153 │ for sleep in sleep_generator: │ │
│ │ 154 │ │ try: │ │
│ │ ❱ 155 │ │ │ return await target() │ │
│ │ 156 │ │ # pylint: disable=broad-except │ │
│ │ 157 │ │ # This function explicitly must deal with broad exceptions. │ │
│ │ 158 │ │ except Exception as exc: │ │
│ │ │ │
│ │ /Users/lenni/Documents/GitHub/biology-benchmarks/.venv/lib/python3.12/site-packages/google/api_… │ │
│ │ in __await__ │ │
│ │ │ │
│ │ 85 │ │ │ response = yield from self._call.__await__() │ │
│ │ 86 │ │ │ return response │ │
│ │ 87 │ │ except grpc.RpcError as rpc_error: │ │
│ │ ❱ 88 │ │ │ raise exceptions.from_grpc_error(rpc_error) from rpc_error │ │
│ │ 89 │ │
│ │ 90 │ │
│ │ 91 class _WrappedStreamResponseMixin(Generic[P], _WrappedCall): │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ InternalServerError: 500 An internal error has occurred. Please retry or report in │
│ https://developers.generativeai.google/guide/troubleshooting
Actual vs expected behavior:
No response
Any other information you'd like to share?
No response
Hi @lennijusten,
Have you been facing this issue from the start, or were you getting responses earlier and only recently began seeing this error?
Try upgrading the google-generativeai library and then re-run your commands:
pip install -q --upgrade google-generativeai
Please let us know if you are still having the issue.
Yes. It's been a semi-persistent problem throughout my usage of Gemini. Sometimes it works, but then after some number of requests or tokens, I get the Internal Server Error (500). The issue is also discussed here and here on Reddit.
I ran pip install -q --upgrade google-generativeai which upgraded me from google-generativeai==0.6.6 to google-generativeai==0.8.2 but the issue persists.
If there is some kind of rate limit being hit, it would be useful for the error code to reflect that.
As it stands, I'm just burning tokens for nothing, since my task never completes before the error.
Hi @lennijusten. Sometimes an Internal Server Error (500) occurs because too many requests are coming in, which results in an unexpected error on Google's side. You can refer to the troubleshooting doc: https://ai.google.dev/gemini-api/docs/troubleshooting?lang=python. I recommend temporarily switching to another model (e.g. from Gemini 1.5 Pro to Gemini 1.5 Flash) to see if that works.
Internal Server Error (500) is not related to rate limits, but the free tier does have rate limits that vary by model. Please refer to https://ai.google.dev/gemini-api/docs/models/gemini or https://ai.google.dev/pricing
If you want to upgrade to the "Pay-as-you-go" tier, you need to set up a billing account.
Hi, thanks for the report.
There have been cases where bad inputs cause 500s. But it sounds like it wasn't consistent about which requests cause the error.
As @gmKeshari said, the service used to throw 500 errors when it's overloaded (I think they've since changed that to a clearer error).
For intermittent errors, the SDK supports a 'retry' argument:

import google.generativeai as genai
from google.api_core import retry

model = genai.GenerativeModel('gemini-2.0-flash')

# For convenience, a simple wrapper to let the SDK handle error retries
def generate_with_retry(model, prompt):
    return model.generate_content(prompt, request_options={'retry': retry.Retry()})

I think the default limit is 5 minutes, but it's configurable.
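In case it helps to see what that retry argument is doing, here is a stdlib-only sketch of the same exponential-backoff-with-deadline pattern. The `TransientServerError` exception, the helper name, and the parameter defaults here are illustrative stand-ins, not the SDK's actual internals:

```python
import random
import time


class TransientServerError(Exception):
    """Stand-in for a retryable 500/503 from the API (illustrative)."""


def call_with_backoff(fn, *, initial=1.0, maximum=60.0, multiplier=2.0, deadline=300.0):
    """Retry fn() on TransientServerError with exponential backoff.

    Mirrors the general shape of an SDK retry policy: the delay grows by
    `multiplier` up to `maximum`, and the whole operation gives up after
    `deadline` seconds (roughly the ~5 minute default mentioned above).
    """
    delay = initial
    start = time.monotonic()
    while True:
        try:
            return fn()
        except TransientServerError:
            if time.monotonic() - start >= deadline:
                raise  # out of time budget: surface the last error
            # Full jitter avoids synchronized retry waves from many clients.
            time.sleep(random.uniform(0, delay))
            delay = min(delay * multiplier, maximum)
```

With the real SDK you would instead pass a configured `retry.Retry()` via `request_options`, as in the snippet above; this sketch is only meant to show why intermittent 500s often disappear behind such a wrapper.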
I have a billing account, but I'm facing the same error. It appeared suddenly: it was working a few hours ago, and now it isn't. I'm making many async I/O calls, and it was functioning properly before.
I am currently facing the issue, any solution yet?
me too
Yes, it is happening for me too.
Me too
I'm getting the exact same 500 Internal Server Error on a Tier 1 account with Gemini 2.5 Pro. It works for 1-2 requests, then throws a 500. Too bad, since their models are pretty good but extremely unreliable.
A 500 error can be generated for a variety of reasons.
It feels like this issue isn't really connected to the inspect framework, and is just collecting any reports of 500 errors that come up.
I'm tempted to just close this issue, and ask you each to raise new issues including code to reproduce the problem, because I suspect that you're each getting 500s for your own unique reasons, and I need more info to debug this.
Originally I was getting HTTP 500 but in the past week it has changed to 503
ApiError: {"error":{"code":503,"message":"The model is overloaded. Please try again later.","status":"UNAVAILABLE"}}
We use the 'gemini-embedding-001' model. This error comes randomly, regardless of the input, and as far as I can observe it arrives in waves.
And no, we don't use this Python SDK, but this issue is the closest I could find. If there's a better place to track this, please let me know.
Hi @Jeanno,
This error is exactly what it says: the model is overloaded; try again in a few minutes.
The SDKs have some retry options you can use (like I mentioned here: https://github.com/google-gemini/deprecated-generative-ai-python/issues/573#issuecomment-2668824405), but they're mostly simple exponential backoff and retry.
I understand what you said. However, since gemini-embedding-001 is a "stable model" (which the Google Cloud page defines as "A publicly released version of a model that is available and supported for production use"), I expected much higher reliability from this endpoint.
https://cloud.google.com/vertex-ai/generative-ai/docs/learn/model-versions
We already have retries in place. Since our use case is search, longer or multiple retries don't really make sense, so we limit ourselves to a single retry. And since this error comes in waves, the retry is likely to fail, too.
We make about 10k calls to this endpoint per day, and about 2% of them return a 503 error. I expect a "production ready" API to have an error rate at least an order of magnitude lower. Our team is frustrated with this situation: we completed the integration and ramped up to production, only to find ourselves stuck with an unreliable API.
P.S.: I totally understand that it takes time for Google to improve the situation, and I'm certain it will. At this point, I'm just asking for a pointer to the appropriate issue before you close this one, so that I can follow it for updates.
Feel free to let me know if I had any incorrect expectations.
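For what it's worth, the single-short-retry constraint described above (where a slow search answer is as useless as no answer) can be sketched like this; `ServiceUnavailable` and `embed_fn` are hypothetical stand-ins for the real client, not any actual SDK API:

```python
import random
import time


class ServiceUnavailable(Exception):
    """Stand-in for a 503 UNAVAILABLE response (illustrative)."""


def embed_with_single_retry(embed_fn, text, *, max_wait=0.5):
    """Latency-sensitive wrapper: one jittered retry, then give up.

    For a search path, a single short retry bounds worst-case latency
    while still absorbing brief 503 blips. Jittering the wait spreads
    retries out so many clients don't re-hit the service in lockstep,
    which matters when errors arrive in waves.
    """
    try:
        return embed_fn(text)
    except ServiceUnavailable:
        time.sleep(random.uniform(0, max_wait))  # jittered pause before the one retry
        return embed_fn(text)  # a second failure propagates to the caller
```

If the error waves last longer than `max_wait`, the retry will still fail, which matches the behavior reported above; the only real mitigations then are a fallback model or endpoint.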
Marking this issue as stale since it has been open for 14 days with no activity. This issue will be closed if no further activity occurs.
I am happy to report back that the problem has gone for the past 2 weeks.
Thanks!