Handling of Google API Rate Limits (429 Error) for Batch Embeddings with Paid Gemini Plan
Hi Graphiti Team,
Problem Description:
I am encountering a 429 RESOURCE_EXHAUSTED error when making batch embedding requests to Google's Generative AI API using graphiti. This occurs even though I am on a paid Gemini plan, which I expected would provide higher or more flexible rate limits.
The specific error indicates that the BatchEmbedContentsRequestsPerMinutePerProjectPerRegion quota is being exceeded.
Observed Behavior: The application fails with the following error when making a series of batch embedding calls:
File "/Users/lostmartian/Desktop/flask/venv/lib/python3.11/site-packages/graphiti_core/embedder/gemini.py", line 62, in create
result = await self.client.aio.models.embed_content(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lostmartian/Desktop/flask/venv/lib/python3.11/site-packages/google/generativeai/models.py", line 6508, in embed_content
response_dict = await self._client.api_client.async_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lostmartian/Desktop/flask/venv/lib/python3.11/site-packages/google/generativeai/client.py", line 709, in async_request
result = await self._async_request(http_request=http_request, stream=stream)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lostmartian/Desktop/flask/venv/lib/python3.11/site-packages/google/generativeai/client.py", line 733, in _async_request
await errors.APIError.raise_for_async_response(response)
File "/Users/lostmartian/Desktop/flask/venv/lib/python3.11/site-packages/google/generativeai/errors.py", line 129, in raise_for_async_response
raise ClientError(status_code, response_json, response)
google.generativeai.errors.ClientError: 429 RESOURCE_EXHAUSTED. {"code": 429, "message": "Quota exceeded for quota metric 'Batch Embed content requests' and limit 'Batch embed contents request limit per minute for a region' of service 'generativelanguage.googleapis.com' for consumer 'project_number:4020930207197'.", "status": "RESOURCE_EXHAUSTED", "details": [{"@type": "type.googleapis.com/google.rpc.ErrorInfo", "reason": "RATE_LIMIT_EXCEEDED", "domain": "googleapis.com", "metadata": {"service": "generativelanguage.googleapis.com", "quota_location": "us-south1", "quota_metric": "generativelanguage.googleapis.com/batch_embed_content_requests", "consumer": "projects/4020930207197", "quota_limit": "BatchEmbedContentsRequestsPerMinutePerProjectPerRegion", "quota_unit": "1/min/[project]/[region]", "quota_limit_value": "15M"}}], "links": [{"description": "Request a higher quota limit.", "url": "https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota"}]}
sys:1: RuntimeWarning: coroutine 'EntityEdge.generate_embedding' was never awaited
sys:1: RuntimeWarning: coroutine 'EntityNode.generate_name_embedding' was never awaited
Expected Behavior:
Ideally, graphiti could offer some level of abstraction or built-in handling for such common API rate limits. This might include:
- Automatic retries with exponential backoff when a 429 error is encountered (see the sketch after this list).
- Client-side throttling or queueing mechanisms to stay within known limits.
- Configuration options for users to define their API call rate preferences or inform graphiti about their specific plan limits.
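As an illustration of the first point, here is a minimal sketch of retrying an async embedding call with exponential backoff and jitter. The `embed_batch` callable and the string-based 429 check are placeholders, not graphiti's actual API:

```python
# Minimal sketch, not graphiti's actual implementation: retry an async
# embedding call with exponential backoff plus jitter on 429 errors.
import asyncio
import random


async def embed_with_backoff(embed_batch, texts, max_retries=5, base_delay=1.0):
    """Call `embed_batch(texts)`, retrying on rate-limit (429) errors."""
    for attempt in range(max_retries + 1):
        try:
            return await embed_batch(texts)
        except Exception as exc:
            # In real code, catch the provider's specific error type instead
            # of string-matching; this is a placeholder check.
            if "429" not in str(exc) or attempt == max_retries:
                raise
            # Full jitter: sleep roughly 2^attempt * base_delay, randomly
            # scaled so concurrent callers don't retry in lockstep.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            await asyncio.sleep(delay)
```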
Context/Environment:
- Graphiti Version: 0.11.6
- Embedding Provider: Google Generative AI (Gemini)
- API Plan: Paid Gemini Plan
- Operation: Batch content embedding (embed_content)
Question for the Team:
- Does graphiti currently implement any specific strategies (like retries, backoff, or throttling) to manage API rate limits for embedding providers like Google Generative AI, particularly for batch operations?
- If so, how can these be configured or best utilized?
- If not, would you consider adding features to help users gracefully handle these 429 RESOURCE_EXHAUSTED errors and stay within API quotas?
This would greatly improve the robustness of applications built with graphiti that rely heavily on embedding services.
Thank you for your time and consideration.
https://github.com/getzep/graphiti/issues/290
A few users (including myself) are facing the same issue (regardless of the provider chosen).
Try: SEMAPHORE_LIMIT=1 in your env.
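For anyone wondering what that setting amounts to: SEMAPHORE_LIMIT caps how many of graphiti's operations run concurrently, so fewer API calls hit the provider per minute. A rough conceptual sketch (not graphiti's internal code) is an asyncio.Semaphore gating the calls:

```python
# Rough sketch of what a concurrency cap like SEMAPHORE_LIMIT amounts to:
# an asyncio.Semaphore limiting how many API calls are in flight at once.
import asyncio

semaphore = asyncio.Semaphore(1)  # SEMAPHORE_LIMIT=1 fully serializes calls


async def gated_call(coro_fn, *args):
    """Run `coro_fn(*args)` only when a semaphore slot is free."""
    async with semaphore:
        return await coro_fn(*args)
```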
The latest release still didn’t fix this issue. Did this limit value work for you?
I think it's working for me. I set 5 for SEMAPHORE_LIMIT
Yes, this limit worked as well. It takes a bit longer, but for now I can get by.
Thanks
The MCP Server now has a lower default concurrency limit of 10. Configurable using SEMAPHORE_LIMIT. See: https://github.com/getzep/graphiti/pull/623
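For reference, the variable can be exported in the shell before starting the server, or set from Python; since I'm not sure exactly when graphiti reads it, setting it before the import is the safe option (illustrative only):

```python
# Set the concurrency limit before graphiti reads its configuration
# (equivalent to exporting SEMAPHORE_LIMIT=10 in the shell).
import os

os.environ["SEMAPHORE_LIMIT"] = "10"
```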