
[BUG] Hallucinations

Open gdornic opened this issue 5 months ago • 10 comments

Bug Description

Using Graphiti with the default OpenAI models sometimes fails with "Output length exceeded max tokens 8192" and usage like "completion_tokens=8192, prompt_tokens=2825". I checked the OpenAI logs, and it appears the model hallucinates: it fills the entire completion budget with repeated output.

Steps to Reproduce

from datetime import datetime, timezone

from graphiti_core.nodes import EpisodeType

await graphiti.add_episode(
    name=f'{category} - {id}',
    episode_body=fact,
    source=EpisodeType.text,
    source_description=f"Fact from {task['description']}",
    reference_time=datetime.now(timezone.utc),
)

Where 'fact' is a paragraph of no more than 5 sentences.

Expected Behavior

No hallucination.

Actual Behavior

"Output length exceeded max tokens 8192" with something like "completion_tokens=8192, prompt_tokens=2825".

Environment

  • Graphiti Version: 0.15.1
  • Python Version: 3.11.5
  • Operating System: Ubuntu 24.04
  • Database Backend: Neo4j 5.26
  • LLM Provider & Model: OpenAI gpt-4.1

Installation Method

  • [x] pip install

Error Messages/Traceback

Retrying after application error (attempt 1/2): Output length exceeded max tokens 8192: Could not parse response content as the length limit was reached - CompletionUsage(completion_tokens=8192, prompt_tokens=2825, total_tokens=11017, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))

OpenAI logs (prompt + response): PROMPT: system You are a helpful assistant that determines whether or not ENTITIES extracted from a conversation are duplicatesof existing entities. Do not escape unicode characters. Any extracted information should be returned in the same language as it was written in. user <PREVIOUS MESSAGES> [ "Milaris Partners accompagne les dirigeants, actionnaires et investisseurs de PME et ETI dans la cession, l\u2019acquisition ou le financement d\u2019entreprises.\nMilaris Partners est un cabinet parisien au regard r\u00e9solument europ\u00e9en.\nMilaris Partners accompagne les TPE, PME et ETI dans leurs op\u00e9rations de cession ou de lev\u00e9e de fonds.\nMilaris Partners est pr\u00e9sent en France, Italie et en Allemagne.\nMilaris Partners est pionnier de l\u2019int\u00e9gration de l\u2019intelligence artificielle en M&A.\n", "Milaris Partners r\u00e9v\u00e8le la valeur cach\u00e9e dans le paysage fragment\u00e9 des fusions-acquisitions de petites et moyennes capitalisations.\nMilaris Partners identifie des acqu\u00e9reurs hors radars.\nMilaris Partners qualifie rapidement des cibles pertinentes.\nMilaris Partners r\u00e9duit significativement le temps de sourcing.\nMilaris Partners fournit une valorisation objective des cibles.\n", "Milaris Partners propose des solutions sur mesure pour les dirigeants d\u2019entreprises de 5 \u00e0 150M\u20ac de chiffre d\u2019affaires.\nMilaris Partners conseille en cession et accompagne les entreprises dans la transmission.\nEntreprises accompagn\u00e9es par Milaris Partners en cession ont un chiffre d\u2019affaires entre 5 et 150M\u20ac.\nLe processus structur\u00e9 propos\u00e9 par Milaris Partners en cession dure entre 6 et 9 mois.\nMilaris Partners conseille en acquisition et identifie les meilleures opportunit\u00e9s.\n", "Milaris Partners propose une n\u00e9gociation optimis\u00e9e en acquisition.\nMilaris Partners conseille en financement et structure le financement optimal.\nMilaris Partners propose une lev\u00e9e de fonds growth.\nMilaris Partners propose des financements hybrides.\nMilaris Partners poss\u00e8de un r\u00e9seau d\u2019investisseurs qualifi\u00e9s.\n", "Milaris Partners poss\u00e8de des experts natifs int\u00e9gr\u00e9s \u00e0 leurs \u00e9quipes.\nMilaris Partners est pr\u00e9sent dans les principales capitales europ\u00e9ennes.\nMilaris Partners poss\u00e8de une expertise sectorielle dans six secteurs cl\u00e9s.\nMilaris Partners poss\u00e8de une expertise approfondie dans le secteur des biens de consommation.\nMilaris Partners accompagne les entreprises innovantes dans leur transformation digitale.\n", "Milaris Partners propose un support strat\u00e9gique pour les entreprises industrielles et manufacturi\u00e8res.\nMilaris Partners poss\u00e8de une expertise dans l\u2019optimisation et la valorisation des services B2B.\nMilaris Partners accompagne dans la transition \u00e9nerg\u00e9tique et les projets durables.\nFranck Johanny est pr\u00e9sident directeur g\u00e9n\u00e9ral.\nFranck Johanny estime que Milaris Partners l\u2019a accompagn\u00e9 dans la cession de son entreprise avec un professionnalisme remarquable.\n" ] </PREVIOUS MESSAGES> <CURRENT MESSAGE> Milaris Partners accompagne les dirigeants, actionnaires et investisseurs de PME et ETI dans la cession, l’acquisition ou le financement d’entreprises. Milaris Partners est un cabinet parisien au regard résolument européen. Milaris Partners accompagne les TPE, PME et ETI dans leurs opérations de cession ou de levée de fonds. 
Milaris Partners est présent en France, Italie et en Allemagne. Milaris Partners est pionnier de l’intégration de l’intelligence artificielle en M&A. </CURRENT MESSAGE> Each of the following ENTITIES were extracted from the CURRENT MESSAGE. Each entity in ENTITIES is represented as a JSON object with the following structure: { id: integer id of the entity, name: "name of the entity", entity_type: "ontological classification of the entity", entity_type_description: "Description of what the entity type represents", duplication_candidates: [ { idx: integer index of the candidate entity, name: "name of the candidate entity", entity_type: "ontological classification of the candidate entity", ... } ] } <ENTITIES> [ { "id": 0, "name": "Milaris Partners", "entity_type": [ "Entity" ], "entity_type_description": "Default Entity Type" }, { "id": 1, "name": "dirigeants", "entity_type": [ "Entity" ], "entity_type_description": "Default Entity Type" }, { "id": 2, "name": "actionnaires", "entity_type": [ "Entity" ], "entity_type_description": "Default Entity Type" }, { "id": 3, "name": "investisseurs de PME et ETI", "entity_type": [ "Entity" ], "entity_type_description": "Default Entity Type" }, { "id": 4, "name": "PME", "entity_type": [ "Entity" ], "entity_type_description": "Default Entity Type" }, { "id": 5, "name": "ETI", "entity_type": [ "Entity" ], "entity_type_description": "Default Entity Type" }, { "id": 6, "name": "cabinet parisien", "entity_type": [ "Entity" ], "entity_type_description": "Default Entity Type" }, { "id": 7, "name": "TPE", "entity_type": [ "Entity" ], "entity_type_description": "Default Entity Type" }, { "id": 8, "name": "France", "entity_type": [ "Entity" ], "entity_type_description": "Default Entity Type" }, { "id": 9, "name": "Italie", "entity_type": [ "Entity" ], "entity_type_description": "Default Entity Type" }, { "id": 10, "name": "Allemagne", "entity_type": [ "Entity" ], "entity_type_description": "Default Entity Type" }, { "id": 11, "name": "intelligence artificielle en M&A", "entity_type": [ "Entity" ], "entity_type_description": "Default Entity Type" } ] </ENTITIES> <EXISTING ENTITIES> [ [ { "idx": 0, "name": "Milaris Partners", "entity_types": [ "Entity" ], "labels": [ "Entity" ] }, { "idx": 1, "name": "ETI", "entity_types": [ "Entity" ], "labels": [ "Entity" ] }, { "idx": 2, "name": "entreprises en cession avec un chiffre d\u2019affaires entre 5 et 150M\u20ac", "entity_types": [ "Entity" ], "labels": [ "Entity" ] }, { "idx": 3, "name": "dirigeants d\u2019entreprises de 5 \u00e0 150M\u20ac de chiffre d\u2019affaires", "entity_types": [ "Entity" ], "labels": [ "Entity" ] }, { "idx": 4, "name": "Franck Johanny", "entity_types": [ "Entity" ], "labels": [ "Entity" ] }, { "idx": 5, "name": "PME", "entity_types": [ "Entity" ], "labels": [ "Entity" ] }, { "idx": 6, "name": "TPE", "entity_types": [ "Entity" ], "labels": [ "Entity" ] }, { "idx": 7, "name": "entreprises innovantes", "entity_types": [ "Entity" ], "labels": [ "Entity" ] }, { "idx": 8, "name": "Allemagne", "entity_types": [ "Entity" ], "labels": [ "Entity" ] }, { "idx": 9, "name": "principales capitales europ\u00e9ennes", "entity_types": [ "Entity" ], "labels": [ "Entity" ] }, { "idx": 10, "name": "France", "entity_types": [ "Entity" ], "labels": [ "Entity" ] }, { "idx": 11, "name": "experts natifs", "entity_types": [ "Entity" ], "labels": [ "Entity" ] }, { "idx": 12, "name": "secteurs cl\u00e9s", "entity_types": [ "Entity" ], "labels": [ "Entity" ] }, { "idx": 13, "name": "transformation digitale", 
"entity_types": [ "Entity" ], "labels": [ "Entity" ] }, { "idx": 14, "name": "Italie", "entity_types": [ "Entity" ], "labels": [ "Entity" ] }, { "idx": 15, "name": "secteur des biens de consommation", "entity_types": [ "Entity" ], "labels": [ "Entity" ] } ] ] </EXISTING ENTITIES> For each of the above ENTITIES, determine if the entity is a duplicate of any of the EXISTING ENTITIES. Entities should only be considered duplicates if they refer to the same real-world object or concept. Do NOT mark entities as duplicates if: - They are related but distinct. - They have similar names or purposes but refer to separate instances or concepts. Task: Your response will be a list called entity_resolutions which contains one entry for each entity. For each entity, return the id of the entity as id, the name of the entity as name, and the duplicate_idx as an integer. - If an entity is a duplicate of one of the EXISTING ENTITIES, return the idx of the candidate it is a duplicate of. - If an entity is not a duplicate of one of the EXISTING ENTITIES, return the -1 as the duplication_idx

RESPONSE: {"entity_resolutions":[{"id":0,"duplicate_idx":0,"name":"Milaris Partners},{" ,"duplicates":[0]},{"id":1,"duplicate_idx":-1,"name":"dirigeants},{" ,"duplicates":[]},{"id":2,"duplicate_idx":-1,"name":"actionnaires},{" ,"duplicates":[]},{"id":3,"duplicate_idx":-1,"name":"investisseurs de PME et ETI},{" ,"duplicates":[]},{"id":4,"duplicate_idx":5,"name":"PME},{" ,"duplicates":[5]},{"id":5,"duplicate_idx":1,"name":"ETI},{" ,"duplicates":[1]},{"id":6,"duplicate_idx":-1,"name":"cabinet parisien},{" ,"duplicates":[]},{"id":7,"duplicate_idx":6,"name":"TPE},{" ,"duplicates":[6]},{"id":8,"duplicate_idx":10,"name":"France},{" ,"duplicates":[10]},{"id":9,"duplicate_idx":14,"name":"Italie},{" ,"duplicates":[14]},{"id":10,"duplicate_idx":8,"name":"Allemagne},{" ,"duplicates":[8]},{"id":11,"duplicate_idx":-1,"name":"intelligence artificielle en M&A}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]}]... (continue until 8192 tokens is reached)

Configuration

The default configuration; I only provide the OpenAI API key.

Additional Context

  • This happens consistently.
  • Component: core library
  • I encounter the same behavior with add_episode_bulk().

Possible Solution

Maybe limit the prompt size?

gdornic avatar Jul 24 '25 10:07 gdornic

If it helps, it often happens during duplicate checks.

Image

gdornic avatar Jul 24 '25 13:07 gdornic

gpt-4.1-nano is used as the default deduplication model. You could try using a larger model as the SMALL model when configuring Zep's LLMProvider.
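
For reference, a minimal sketch of what that override might look like via graphiti-core's LLMConfig. The small_model and max_tokens fields are assumptions to verify against your installed release, and the connection details are placeholders:

from graphiti_core import Graphiti
from graphiti_core.llm_client import OpenAIClient
from graphiti_core.llm_client.config import LLMConfig

llm_config = LLMConfig(
    api_key="sk-...",        # your OpenAI API key
    model="gpt-4.1",         # main extraction model
    small_model="gpt-4.1",   # use a larger model for dedup instead of gpt-4.1-nano
    max_tokens=8192,         # cap on completion length
)
graphiti = Graphiti(
    "bolt://localhost:7687", "neo4j", "password",
    llm_client=OpenAIClient(config=llm_config),
)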

danielchalef avatar Jul 27 '25 01:07 danielchalef

I checked the OpenAI logs, and I'm fairly certain that gpt-4.1-nano is only used in this specific context:

Image

I found a workaround by using Outlines as the backend for guided decoding with a local model via vLLM. However, I often encounter "index out of range" errors when Graphiti checks the answers for duplicate verification.
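
For anyone reproducing that setup, a rough sketch of pointing graphiti-core at a local OpenAI-compatible vLLM server. The model name, endpoint, and vLLM flag are illustrative assumptions, and flag names vary across vLLM versions:

# Serve a local model with Outlines as the guided-decoding backend
# (CLI flag name varies across vLLM versions):
#   vllm serve Qwen/Qwen2.5-7B-Instruct --guided-decoding-backend outlines

from graphiti_core import Graphiti
from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.llm_client.openai_generic_client import OpenAIGenericClient

llm_config = LLMConfig(
    api_key="EMPTY",                      # vLLM does not check the key by default
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    model="Qwen/Qwen2.5-7B-Instruct",
)
graphiti = Graphiti(
    "bolt://localhost:7687", "neo4j", "password",
    llm_client=OpenAIGenericClient(config=llm_config),
)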

gdornic avatar Jul 28 '25 08:07 gdornic

This looks like a textbook case of semantic hallucination: the model generates output that overruns the expected token limit and produces invalid or unexpected completions. Based on your description and traceback, it closely matches Problem No.8 ("Concurrent Ingestion & State Drift") on my diagnostic checklist.

The key failure mode is that when state tracking or token boundaries go out of sync (often due to async/parallel queries or misaligned state management), LLM outputs can easily overrun buffers, hallucinate content, or fail parsing with cryptic errors.

If you need a minimal test or want to reproduce similar failures, let me know; I've catalogued many of these failure modes in live systems, and the symptoms in your traceback are a near-perfect match.

onestardao avatar Aug 18 '25 11:08 onestardao

2025-08-14 14:12:20 - WARNING - Retrying after application error (attempt 1/2): Output length exceeded max tokens 16000: Could not parse response content as the length limit was reached - CompletionUsage(completion_tokens=16000, prompt_tokens=4662, total_tokens=20662, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))

I am encountering the "Output length exceeded max tokens" error when trying to add documents. Despite increasing the token limit to 32,000 and reducing the text in each episode to just 200 characters, the problem persists. Can you please investigate this issue? Are there other limitations or settings that might be causing this, or is there an alternative approach we should try to ensure documents are uploaded successfully?

@danielchalef @onestardao

yashvrdhn1105-dev avatar Aug 19 '25 11:08 yashvrdhn1105-dev

This looks like Problem No.8 in our ProblemMap: concurrent ingestion and state drift. Quick fixes:

  1. Set a conservative max_output_tokens on the model (e.g. 1000).
  2. Preflight the token count: if prompt + buffer > model_limit, shorten or split the request (see the sketch after this list).
  3. On parse failures, retry with a smaller max_output_tokens and a stricter stop sequence.
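
A minimal sketch of step 2 using tiktoken; the context limit, encoding name, and buffer size are assumptions to adjust for your model:

import tiktoken

MODEL_CONTEXT_LIMIT = 128_000  # assumed context window; check your model's docs
MAX_OUTPUT_TOKENS = 1000       # conservative cap, per step 1
SAFETY_BUFFER = 512            # headroom for message framing overhead

def preflight_ok(prompt: str) -> bool:
    # o200k_base is the tokenizer for recent OpenAI models; swap if needed
    enc = tiktoken.get_encoding("o200k_base")
    prompt_tokens = len(enc.encode(prompt))
    return prompt_tokens + MAX_OUTPUT_TOKENS + SAFETY_BUFFER <= MODEL_CONTEXT_LIMIT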

full checklist here: https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md

Want me to paste a 3-step comment you can post directly to the issue?

onestardao avatar Aug 22 '25 10:08 onestardao

Hi, we worked around this problem by setting EPISODE_WINDOW_LEN=1, but if the root cause is hallucination, that is not guaranteed to help. May I ask why the last n episodes are pulled and provided as context?

Right now we get these two errors while adding episodes (sections of text):

  • graphiti_core.llm_client.openai_base_client - ERROR - Error in generating LLM response: 1 validation error for NodeResolutions Invalid JSON: EOF while parsing a string at line 1 column 76330 [type=json_invalid, input_value='{"entity_resolutions":[{...ype name type name type', input_type=str] For further information visit https://errors.pydantic.dev/2.11/v/json_invalid
  • list index out of range.

I make sure to pass valid JSON, but I believe somewhere in the add_episode function it fails to get structured output. That error does not break the run; it continues to add nodes and edges. The "list index out of range" error, however, does break the run, and I do not know how to trace why it is raised. I will create a separate issue for this.

cengover avatar Aug 27 '25 14:08 cengover

@cengover

This pattern isn't just worker count. It's a mix of No.11 (concurrent ingestion & state drift) and No.1 (semantic hallucination) when a NodeAddAction writes back without a contract check. The fix is a small "write-gate" in front of your graph writer.

Minimal guard (drop-in; see the sketch after this list):

  1. Receipt token + mutex per job: attach a receipt_token to the LLM task and acquire a short-lived mutex on the write path. This prevents cross-job collisions when retries or timeouts happen.

  2. Source-bound contract: for every node/edge, require source_doc_id, span_start/end, raw_text_checksum, and a vector+span match against the exact chunk. Reject if similarity < τ or the span can't be validated. This blocks "fabricated" entities.

  3. Semantic firewall before write: normalize embeddings, run the contract check, then dry-run the write. If the projected graph violates constraints, roll back and send the write to a quarantine queue instead of committing.

  4. Bad-API-response quarantine: when upstream responses are partial or late, do not promote them to valid writes. Enqueue them to a retry bucket with the same receipt_token so duplicates collapse.
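
A minimal sketch of such a write-gate in Python; every name here (receipt_token, NodeWrite, the checksum contract, the 0.8 threshold) is illustrative, not a Graphiti API:

import hashlib
import threading
from dataclasses import dataclass

_write_lock = threading.Lock()  # step 1: mutex on the write path
SIM_THRESHOLD = 0.8             # τ from step 2; tune per embedding model

@dataclass
class NodeWrite:
    receipt_token: str
    source_doc_id: str
    span_start: int
    span_end: int
    raw_text_checksum: str
    similarity: float           # vector match score against the source chunk

def gate_write(write: NodeWrite, chunk_text: str, quarantine: list) -> bool:
    """Commit only if the write passes the source-bound contract."""
    span = chunk_text[write.span_start:write.span_end]
    checksum = hashlib.sha256(span.encode()).hexdigest()
    contract_ok = (
        checksum == write.raw_text_checksum   # span validates against the chunk
        and write.similarity >= SIM_THRESHOLD  # reject if sim < τ
    )
    if not contract_ok:
        quarantine.append(write)  # steps 3/4: quarantine instead of committing
        return False
    with _write_lock:  # step 1: serialize graph writes across jobs
        # commit to the graph here, keyed on receipt_token so retries collapse
        pass
    return True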

Full checklist with the numbered items: WFGY Problem Map → https://github.com/onestardao/WFGY/tree/main/ProblemMap/README.md

onestardao avatar Aug 28 '25 02:08 onestardao

@gdornic Is this still an issue? Please confirm within 14 days or this issue will be closed.

claude[bot] avatar Oct 06 '25 00:10 claude[bot]

@gdornic Is this still an issue? Please confirm within 14 days or this issue will be closed.

claude[bot] avatar Nov 17 '25 00:11 claude[bot]