Handling of Google API Rate Limits (429 Error) for Batch Embeddings with Paid Gemini Plan
Hi Graphiti Team,
Problem Description:
I am encountering a 429 RESOURCE_EXHAUSTED error when making batch embedding requests to Google's Generative AI API using graphiti. This occurs even though I am on a paid Gemini plan, which I expected would provide higher or more flexible rate limits.
The specific error indicates that the BatchEmbedContentsRequestsPerMinutePerProjectPerRegion quota is being exceeded.
Observed Behavior: The application fails with the following error when making a series of batch embedding calls:
File "/Users/lostmartian/Desktop/flask/venv/lib/python3.11/site-packages/graphiti_core/embedder/gemini.py", line 62, in create
result = await self.client.aio.models.embed_content(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lostmartian/Desktop/flask/venv/lib/python3.11/site-packages/google/generativeai/models.py", line 6508, in embed_content
response_dict = await self._client.api_client.async_request(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lostmartian/Desktop/flask/venv/lib/python3.11/site-packages/google/generativeai/client.py", line 709, in async_request
result = await self._async_request(http_request=http_request, stream=stream)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/lostmartian/Desktop/flask/venv/lib/python3.11/site-packages/google/generativeai/client.py", line 733, in _async_request
await errors.APIError.raise_for_async_response(response)
File "/Users/lostmartian/Desktop/flask/venv/lib/python3.11/site-packages/google/generativeai/errors.py", line 129, in raise_for_async_response
raise ClientError(status_code, response_json, response)
google.generativeai.errors.ClientError: 429 RESOURCE_EXHAUSTED. {"code": 429, "message": "Quota exceeded for quota metric 'Batch Embed content requests' and limit 'Batch embed contents request limit per minute for a region' of service 'generativelanguage.googleapis.com' for consumer 'project_number:4020930207197'.", "status": "RESOURCE_EXHAUSTED", "details": [{"@type": "type.googleapis.com/google.rpc.ErrorInfo", "reason": "RATE_LIMIT_EXCEEDED", "domain": "googleapis.com", "metadata": {"service": "generativelanguage.googleapis.com", "quota_location": "us-south1", "quota_metric": "generativelanguage.googleapis.com/batch_embed_content_requests", "consumer": "projects/4020930207197", "quota_limit": "BatchEmbedContentsRequestsPerMinutePerProjectPerRegion", "quota_unit": "1/min/[project]/[region]", "quota_limit_value": "15M"}}], "links": [{"description": "Request a higher quota limit.", "url": "https://cloud.google.com/docs/quotas/view-manage#requesting_higher_quota"}]}
sys:1: RuntimeWarning: coroutine 'EntityEdge.generate_embedding' was never awaited
sys:1: RuntimeWarning: coroutine 'EntityNode.generate_name_embedding' was never awaited
Expected Behavior:
Ideally, graphiti could offer some level of abstraction or built-in handling for such common API rate limits. This might include:
- Automatic retries with exponential backoff when a 429 error is encountered (see the sketch after this list).
- Client-side throttling or queueing mechanisms to stay within known limits.
- Configuration options for users to define their API call rate preferences or inform graphiti about their specific plan limits.
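As an illustration of the first point, here is a minimal sketch of retrying an async embedding call with exponential backoff and jitter. The `embed_batch` callable and the string-based 429 check are placeholders, not graphiti's actual API:

```python
# Minimal sketch, not graphiti's actual implementation: retry an async
# embedding call with exponential backoff plus jitter on 429 errors.
import asyncio
import random


async def embed_with_backoff(embed_batch, texts, max_retries=5, base_delay=1.0):
    """Call `embed_batch(texts)`, retrying on rate-limit (429) errors."""
    for attempt in range(max_retries + 1):
        try:
            return await embed_batch(texts)
        except Exception as exc:
            # In real code, catch the provider's specific error type instead
            # of string-matching; this is a placeholder check.
            if "429" not in str(exc) or attempt == max_retries:
                raise
            # Full jitter: sleep roughly 2^attempt * base_delay, randomly
            # scaled so concurrent callers don't retry in lockstep.
            delay = base_delay * (2 ** attempt) * random.uniform(0.5, 1.5)
            await asyncio.sleep(delay)
```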
Context/Environment:
- Graphiti Version: 0.11.6
- Embedding Provider: Google Generative AI (Gemini)
- API Plan: Paid Gemini Plan
- Operation: Batch content embedding (embed_content)
Question for the Team:
- Does graphiti currently implement any specific strategies (like retries, backoff, or throttling) to manage API rate limits for embedding providers like Google Generative AI, particularly for batch operations?
- If so, how can these be configured or best utilized?
- If not, would you consider adding features to help users gracefully handle these 429 RESOURCE_EXHAUSTED errors and stay within API quotas?
This would greatly improve the robustness of applications built with graphiti that rely heavily on embedding services.
Thank you for your time and consideration.
https://github.com/getzep/graphiti/issues/290
A few users (including myself) are facing the same issue (regardless of the provider chosen).
Try: SEMAPHORE_LIMIT=1 in your env.
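For anyone wondering what that setting amounts to: SEMAPHORE_LIMIT caps how many of graphiti's operations run concurrently, so fewer API calls hit the provider per minute. A rough conceptual sketch (not graphiti's internal code) is an asyncio.Semaphore gating the calls:

```python
# Rough sketch of what a concurrency cap like SEMAPHORE_LIMIT amounts to:
# an asyncio.Semaphore limiting how many API calls are in flight at once.
import asyncio

semaphore = asyncio.Semaphore(1)  # SEMAPHORE_LIMIT=1 fully serializes calls


async def gated_call(coro_fn, *args):
    """Run `coro_fn(*args)` only when a semaphore slot is free."""
    async with semaphore:
        return await coro_fn(*args)
```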
The latest release still didn’t fix this issue. Did this limit value work for you?
I think it's working for me. I set 5 for SEMAPHORE_LIMIT
Yes, this limit worked as well. It takes a bit longer, but for now I can get by.
Thanks
The MCP Server now has a lower default concurrency limit of 10. Configurable using SEMAPHORE_LIMIT. See: https://github.com/getzep/graphiti/pull/623
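For reference, the variable can be exported in the shell before starting the server, or set from Python; since I'm not sure exactly when graphiti reads it, setting it before the import is the safe option (illustrative only):

```python
# Set the concurrency limit before graphiti reads its configuration
# (equivalent to exporting SEMAPHORE_LIMIT=10 in the shell).
import os

os.environ["SEMAPHORE_LIMIT"] = "10"
```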