
LLM inference expenses are pretty high

Open n-sviridenko opened this issue 8 months ago • 3 comments

For ~40 chats (each 150-250 words in total), without custom entities (I excluded them), it costs nearly $0.80 with the default OpenAI models. In our case we have thousands of chats. What is taking up most of the tokens? We may not need invalidation, for example. Maybe there are other things we can remove to reduce the cost per chat by at least 5-10x.
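
For reference, a minimal sketch of the kind of ingestion described here, assuming graphiti-core's public API (`Graphiti`, `add_episode`, `EpisodeType.message`); the Neo4j credentials and chat contents are placeholders, not part of the report above:

```python
import asyncio
from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType

async def ingest_chats(chats: list[str]) -> None:
    graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password")
    try:
        for i, chat in enumerate(chats):
            # No entity_types passed, so only the built-in Entity schema is used
            # (custom entities excluded, as in the setup described above).
            await graphiti.add_episode(
                name=f"chat-{i}",
                episode_body=chat,  # each chat is ~150-250 words
                source=EpisodeType.message,
                source_description="chat transcript",
                reference_time=datetime.now(timezone.utc),
            )
    finally:
        await graphiti.close()

# asyncio.run(ingest_chats(["user: hi\nassistant: hello ..."]))
```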

n-sviridenko · May 10 '25 12:05

The number of Episodic (blue) and Entity (brown) nodes (no Community nodes generated yet):

[Image: chart of Episodic and Entity node counts]

n-sviridenko · May 12 '25 08:05

Did some analysis - the biggest share is taken by the entity properties prompt.

LLM Expense Analysis

Summary

Total Episodes: 40
Average Cost per Episode: $0.018139

Breakdown per Prompt Type per Episode:

| Prompt Type | Avg Count | Cost per Call | Total Cost | % of Episode |
| --- | --- | --- | --- | --- |
| You are a helpful assistant that extracts entity properties from the provided... | 4.53 | $0.002990 | $0.013541 | 74.7% |
| You are a helpful assistant that determines whether or not a NEW ENTITY is a ... | 4.53 | $0.000803 | $0.003636 | 20.0% |
| You are an expert fact extractor that extracts fact triples from text. 1. Ext... | 1.00 | $0.002892 | $0.002892 | 15.9% |
| You are an AI assistant that extracts entity nodes from text. | 1.00 | $0.000593 | $0.000593 | 3.3% |
| You are a helpful assistant that de-duplicates edges from edge lists. | 7.00 | $0.000023 | $0.000163 | 0.9% |
| You are an AI assistant that determines which facts contradict each other. | 4.00 | $0.000017 | $0.000069 | 0.4% |

Cost Analysis Notes

  • The most expensive prompt type ('You are a helpful assistant that extracts entity p...') accounts for 74.7% of total episode cost
  • The most frequent prompt type appears 7.00 times per episode on average
  • The top two prompt types account for 94.7% of total episode cost
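
For illustration, a hypothetical sketch of how a per-prompt-type breakdown like the table above can be computed, assuming each LLM call was logged with its system prompt and token counts; the log tuple format and the per-token prices are assumptions, not Graphiti APIs:

```python
from collections import defaultdict

# Example per-token rates (USD per 1M tokens); purely illustrative.
PRICE_PER_M_INPUT = 2.50
PRICE_PER_M_OUTPUT = 10.00

def call_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return (prompt_tokens * PRICE_PER_M_INPUT
            + completion_tokens * PRICE_PER_M_OUTPUT) / 1_000_000

def breakdown(calls: list[tuple[str, str, int, int]], n_episodes: int) -> None:
    """calls: (episode_id, system_prompt, prompt_tokens, completion_tokens) tuples."""
    by_prompt = defaultdict(lambda: {"count": 0, "cost": 0.0})
    total = 0.0
    for _episode_id, system_prompt, p_tok, c_tok in calls:
        key = system_prompt[:80]  # group calls by the start of the system prompt
        c = call_cost(p_tok, c_tok)
        by_prompt[key]["count"] += 1
        by_prompt[key]["cost"] += c
        total += c
    for key, agg in sorted(by_prompt.items(), key=lambda kv: -kv[1]["cost"]):
        print(f"{key[:50]:<50}"
              f"  avg count/episode={agg['count'] / n_episodes:.2f}"
              f"  cost/episode=${agg['cost'] / n_episodes:.6f}"
              f"  share={100 * agg['cost'] / total:.1f}%")
```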

n-sviridenko · May 12 '25 11:05

We've recently made significant improvements to Graphiti's token usage. We've also introduced the concept of a "small model", which handles classifier-style prompts instead of the more expensive main model. In Zep's implementation, this is gpt-4.1-nano.
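
A minimal sketch of opting into a cheaper small model with graphiti-core, assuming the `LLMConfig.small_model` option mentioned above is available in your release; model names and connection details are placeholders:

```python
from graphiti_core import Graphiti
from graphiti_core.llm_client import LLMConfig, OpenAIClient

llm_client = OpenAIClient(
    config=LLMConfig(
        model="gpt-4.1-mini",        # main model for extraction prompts
        small_model="gpt-4.1-nano",  # cheap model for classifier-style prompts
    )
)

graphiti = Graphiti("bolt://localhost:7687", "neo4j", "password", llm_client=llm_client)
```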

Let me know your thoughts on these improvements.

danielchalef · May 22 '25 04:05

@n-sviridenko Is this still relevant? Please confirm within 14 days or this issue will be closed.

claude[bot] · Oct 05 '25 00:10

@n-sviridenko Is this still an issue? Please confirm within 14 days or this issue will be closed.

claude[bot] · Oct 22 '25 00:10

@n-sviridenko Is this still an issue? Please confirm within 14 days or this issue will be closed.

claude[bot] · Oct 29 '25 00:10