LLM inference expenses are pretty high
For ~40 chats (each 150-250 words in total), with custom entities excluded, ingestion costs nearly $0.80 with the default OpenAI models. In our case we have thousands of chats. What is consuming most of the tokens? For example, we may not need edge invalidation. Are there other steps we could remove to reduce the cost per chat by at least 5-10x?
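To make the target concrete, here is a back-of-the-envelope sketch in plain Python (no Graphiti code involved; the 10k-chat volume is a made-up figure for illustration):

```python
# Back-of-the-envelope cost math for the numbers above.
observed_total = 0.80   # USD for the test batch
num_chats = 40

cost_per_chat = observed_total / num_chats   # ~$0.02 per chat
monthly_chats = 10_000                       # hypothetical volume
print(f"current:  ${cost_per_chat:.4f}/chat -> ${cost_per_chat * monthly_chats:,.2f} per 10k chats")

for reduction in (5, 10):
    target = cost_per_chat / reduction
    print(f"{reduction}x cut:   ${target:.4f}/chat -> ${target * monthly_chats:,.2f} per 10k chats")
```

At $0.02 per chat today, a 5-10x reduction means getting into the $0.002-$0.004 per chat range.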
Number of Episodic (blue) and Entity (brown) nodes (no Community nodes generated yet): [graph screenshot]
Did some analysis: the biggest share is taken by the entity-properties prompt (a sketch of the aggregation script is included after the notes below).
LLM Expense Analysis
Summary
Total Episodes: 40
Average Cost per Episode: $0.018139
Breakdown per Prompt Type per Episode:
| Prompt Type | Avg Calls per Episode | Cost per Call | Cost per Episode | % of Episode Cost |
|---|---|---|---|---|
| You are a helpful assistant that extracts entity properties from the provided... | 4.53 | $0.002990 | $0.013541 | 74.7% |
| You are a helpful assistant that determines whether or not a NEW ENTITY is a ... | 4.53 | $0.000803 | $0.003636 | 20.0% |
| You are an expert fact extractor that extracts fact triples from text. 1. Ext... | 1.00 | $0.002892 | $0.002892 | 15.9% |
| You are an AI assistant that extracts entity nodes from text. | 1.00 | $0.000593 | $0.000593 | 3.3% |
| You are a helpful assistant that de-duplicates edges from edge lists. | 7.00 | $0.000023 | $0.000163 | 0.9% |
| You are an AI assistant that determines which facts contradict each other. | 4.00 | $0.000017 | $0.000069 | 0.4% |
Cost Analysis Notes
- The most expensive prompt type, 'You are a helpful assistant that extracts entity p...', accounts for 74.7% of the total episode cost
- The most frequently called prompt (edge de-duplication) appears 7.00 times per episode on average
- The top two prompt types account for 94.7% of the total episode cost
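For reference, a minimal sketch of how a breakdown like this can be reproduced from logged LLM calls. It assumes you have captured, per call, the system prompt text and the OpenAI usage counts (e.g. via a logging wrapper around the client); the record field names, the episode count, and the gpt-4o pricing constants are assumptions, not Graphiti internals.

```python
from collections import defaultdict

# Assumed per-call log records:
#   {"prompt": <system prompt text>, "prompt_tokens": int, "completion_tokens": int}
# Pricing is an assumption (gpt-4o, USD per token); adjust to your model.
INPUT_PRICE = 2.50 / 1_000_000
OUTPUT_PRICE = 10.00 / 1_000_000
NUM_EPISODES = 40

def breakdown(calls: list[dict]) -> None:
    """Group logged calls by the start of their system prompt and report cost."""
    groups: dict[str, dict] = defaultdict(lambda: {"count": 0, "cost": 0.0})
    total_cost = 0.0
    for call in calls:
        key = call["prompt"][:60]  # prompt prefix identifies the prompt type
        cost = (call["prompt_tokens"] * INPUT_PRICE
                + call["completion_tokens"] * OUTPUT_PRICE)
        groups[key]["count"] += 1
        groups[key]["cost"] += cost
        total_cost += cost

    avg_episode_cost = total_cost / NUM_EPISODES
    print(f"Total episodes: {NUM_EPISODES}  Avg cost/episode: ${avg_episode_cost:.6f}")
    for key, g in sorted(groups.items(), key=lambda kv: kv[1]["cost"], reverse=True):
        per_episode = g["cost"] / NUM_EPISODES
        print(f"{key!r}: {g['count'] / NUM_EPISODES:.2f} calls/episode, "
              f"${per_episode:.6f}/episode ({per_episode / avg_episode_cost:.1%})")
```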
We've recently made significant improvements to Graphiti token usage. We've also introduced the concept of a "small model", which is used for classifier-style calls instead of the more expensive default model. In Zep's implementation, this is gpt-4.1-nano.
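A configuration along these lines should route the cheaper model to the classifier-style prompts. This is a sketch only: I'm assuming the `LLMConfig` fields are named `model` and `small_model` in the current graphiti-core release, and the connection details and model choices are placeholders, so verify the exact parameter names against your version.

```python
from graphiti_core import Graphiti
from graphiti_core.llm_client import LLMConfig, OpenAIClient

# Sketch: cheap model for classifier-style prompts, stronger model for extraction.
# Field names (`model`, `small_model`) are assumed from the description above.
llm_config = LLMConfig(
    api_key="...",                # your OpenAI key
    model="gpt-4.1-mini",         # main extraction model (placeholder choice)
    small_model="gpt-4.1-nano",   # cheaper classifier model mentioned above
)

graphiti = Graphiti(
    "bolt://localhost:7687",      # Neo4j connection details are placeholders
    "neo4j",
    "password",
    llm_client=OpenAIClient(config=llm_config),
)
```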
Let me know your thoughts on these improvements.
@n-sviridenko Is this still relevant? Please confirm within 14 days or this issue will be closed.
@n-sviridenko Is this still an issue? Please confirm within 14 days or this issue will be closed.