number of calls and rate limits

Open jvsteiner opened this issue 10 months ago • 7 comments

Hi, I recently tried to use graphiti to process about 10,000 characters' worth of text. I split it up into about five chunks, but when I tried to add them to the graph, I got rate limited by OpenAI. It turns out more than 1,000 requests were made in a couple of minutes. Is that normal?

Also, because the requests got rate limited, the whole thing failed, and now I can't tell what made it into my graph and what didn't. I don't see any obvious way to configure a rate limit in graphiti to avoid running into this problem at the inference provider. Maybe a feature request?

jvsteiner avatar Mar 13 '25 13:03 jvsteiner

Hey, that number of requests definitely sounds high for that character count. Also, make sure you are adding episodes in sequence (waiting until one chunk is done before adding the next) so that the temporal and deduplication aspects work properly. Lowering the SEMAPHORE_LIMIT environment variable (the default is 20) will reduce the number of OpenAI calls we make in parallel, so it may help with rate limiting, but currently we don't directly throttle the LLM requests against a limit. (A rough sketch of both suggestions appears below.)

prasmussen15 avatar Mar 13 '25 18:03 prasmussen15
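For anyone landing here, the two suggestions above (sequential episodes plus a lower SEMAPHORE_LIMIT) might look roughly like this in code. This is a sketch, not an official recipe: it assumes the graphiti-core Python API (`Graphiti`, `add_episode`) and that SEMAPHORE_LIMIT is read at import time; verify the signatures against your installed version.

```python
import asyncio
import os
from datetime import datetime, timezone

# Lower graphiti's concurrency before importing it; the library appears
# to read SEMAPHORE_LIMIT at import time (default 20 as noted above).
os.environ["SEMAPHORE_LIMIT"] = "5"

from graphiti_core import Graphiti  # assumed import path

async def ingest(chunks: list[str]) -> None:
    graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password")
    for i, chunk in enumerate(chunks):
        # Await each episode before starting the next so the temporal
        # and deduplication logic sees the chunks in order.
        await graphiti.add_episode(
            name=f"doc-chunk-{i}",
            episode_body=chunk,
            source_description="imported document",
            reference_time=datetime.now(timezone.utc),
        )

asyncio.run(ingest(["first chunk of text...", "second chunk of text..."]))
```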

I processed the entire docs of Payload into the Graphiti MCP tool yesterday, and in the span of about 2 hours it made 24,000 API calls and used up a whopping 41 million tokens. I hadn't specified the OpenAI model, so it defaulted to GPT-4o. Let's just say it got a liiiiittle more expensive than I would have thought.

So I am hardcoding env variables for gpt-4o-mini now to keep using it, but something like batch processing and cache utilization would be very nice to have, as it seems to be VERY token hungry and API trigger happy.

Basically, it processed my input in chat over 30-40 minutes and then continued to run queued API calls for 3-4 hours like a machine gun with infinite ammo.

blaesild avatar Apr 02 '25 10:04 blaesild
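Pinning a cheaper model explicitly, rather than relying on the default, can be done via the MCP server's environment (the mcp_server README documents a model setting) or, for library use, in code. A hedged sketch, assuming graphiti-core exposes `LLMConfig` and `OpenAIClient` under these paths; check against your version:

```python
from graphiti_core import Graphiti
from graphiti_core.llm_client import LLMConfig, OpenAIClient  # assumed paths

# Pin an inexpensive model explicitly so a heavy ingest doesn't fall
# back to a pricier default.
llm_client = OpenAIClient(
    config=LLMConfig(api_key="sk-...", model="gpt-4o-mini")
)

graphiti = Graphiti(
    "bolt://localhost:7687",
    "neo4j",
    "password",
    llm_client=llm_client,
)
```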

at least it wasn't gpt-4.5

jvsteiner avatar Apr 02 '25 10:04 jvsteiner

@blaesild Thanks for your feedback. We're planning some work to reduce token usage. Additionally, I'm going to make gpt-4o-mini the default for the MCP server.

danielchalef avatar Apr 02 '25 18:04 danielchalef

I hope this gets fixed soon. SEMAPHORE_LIMIT doesn't seem like it should be the only tool for solving this issue.

Nana-Kwame-bot avatar Apr 02 '25 22:04 Nana-Kwame-bot

There's a community-contributed PR here looking to address this issue: https://github.com/getzep/graphiti/pull/311

danielchalef avatar Apr 02 '25 23:04 danielchalef

Could an open-source model that supports structured output be deployed locally, and then used for inference? (A sketch of one possible setup appears below.)

guifou avatar May 18 '25 08:05 guifou
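Running against a locally served model is plausible wherever the server speaks the OpenAI-compatible API and the model handles structured/JSON output reliably. A sketch, assuming `LLMConfig` accepts a `base_url` as in recent graphiti-core versions; the model name and Ollama endpoint below are illustrative:

```python
from graphiti_core import Graphiti
from graphiti_core.llm_client import LLMConfig, OpenAIClient  # assumed paths

# Point the OpenAI-style client at a local OpenAI-compatible server
# (an Ollama endpoint is shown); the model must support structured output.
local_llm = OpenAIClient(
    config=LLMConfig(
        api_key="ollama",  # placeholder; local servers typically ignore it
        model="llama3.1:8b",
        base_url="http://localhost:11434/v1",
    )
)

graphiti = Graphiti(
    "bolt://localhost:7687",
    "neo4j",
    "password",
    llm_client=local_llm,
)
```

Note that the embedder would likely need the same treatment, since graphiti also calls an embedding endpoint during ingestion.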

Any progress on this front?

codematrix avatar Jun 12 '25 22:06 codematrix

Hi, did you make any progress on reducing token usage for adding episodes, or on reducing the number of API calls it takes to add an episode?

abdullahxbrainy avatar Jun 19 '25 12:06 abdullahxbrainy

Also, for rate limits there doesn't seem to be any backoff; increasing MAX_RETRIES also isn't helping in some cases. (A workaround sketch appears below.)

jainkunal avatar Jul 04 '25 15:07 jainkunal
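Until backoff lands in the library itself, one workaround is wrapping your own LLM calls in exponential backoff. The sketch below uses the tenacity package around a raw OpenAI call; `call_llm` is illustrative, and this is not graphiti's internal retry path:

```python
import openai
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

client = openai.OpenAI()  # reads OPENAI_API_KEY from the environment

# Retry only on 429s, backing off exponentially between attempts
# (2s, 4s, 8s, ... capped at 60s), giving up after 8 tries.
@retry(
    retry=retry_if_exception_type(openai.RateLimitError),
    wait=wait_exponential(multiplier=1, min=2, max=60),
    stop=stop_after_attempt(8),
)
def call_llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```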

> @blaesild Thanks for your feedback. We're planning some work to reduce token usage. Additionally, I'm going to make gpt-4o-mini the default for the MCP server.

@danielchalef the default model should rather be gpt-4.1-mini now, which is currently also the default model at OpenAI. That'll save people even more on API costs :)

snsAIHub avatar Jul 07 '25 12:07 snsAIHub

I am new to the graphiti MCP and finally got it deployed in WSL. I am able to use it via Cline. However, I tested the MCP with a simple request: "Add an episode about my current project setup: I'm using WSL, have Python expertise, work with PyTorch Lightning and Ray for deep learning, and use Dask for parallel computing with pandas-style operations."

When I click on "Cline" -> "MCP servers" -> "Installed", under the graphiti MCP it shows me "rate limit exceeded".

I have never used the OpenAI API before; I just created an API key to deploy the MCP. I have zero credits in the account.

May I ask what could be causing the rate limit?

IntuitionQuant avatar Jul 13 '25 07:07 IntuitionQuant

Please set the SEMAPHORE_LIMIT environment variable to a lower number. See the instructions here: https://github.com/getzep/graphiti/tree/main/mcp_server#concurrency-and-llm-provider-429-rate-limit-errors

danielchalef avatar Jul 13 '25 20:07 danielchalef