
[BUG] Add Episodes function throws Invalid JSON and index out of range errors

Open cengover opened this issue 4 months ago • 7 comments

Bug Description

We have a corpus of text that we split into sections, treating each section as an episode. We use the add_episode function to add the sections/episodes one by one to the graph. The process stalls with invalid-JSON and "index out of range" errors. We verified that the episode bodies we feed in are valid JSON and that their token counts are below the models' input limits (see the token-check sketch after the reproduction code below).

Steps to Reproduce

  • Set-up the client
  • Prepare data
  • Add episodes
from datetime import datetime, timezone
import json
from graphiti_core.nodes import EpisodeType

failed_chunks = {}
episodes = {}
for i, episode in enumerate(sections_with_stages):
    chunk = {
        "condition": episode['condition'],
        "title": episode['chunk']['title'],
        "content": episode['chunk']['content'],
        "Relevant stage": f"{episode['most_relevant_stage']} stage",
        "Second relevant stage": f"{episode['secondary_relevant_stage']} stage",
    }
    episode_name = f'{episode["condition"]}-guideline-section-{i}'
    episodes[episode_name] = chunk

for name, episode in episodes.items():
    try:
        await client.add_episode(
            name=name,
            episode_body=json.dumps(episode),
            source_description=f"Title - {episode['title']}",
            source=EpisodeType.json,
            entity_types=entity_types,
            edge_types=edge_types,
            edge_type_map=edge_type_map,
            reference_time=datetime.now(timezone.utc),  # tz-aware timestamp
            group_id='guidelines',  # Group id is a namespace for the data
        )
    except Exception as e:
        print(f"Error adding episode {name}: {e}")
        # Record the failed episode under its name so it can be retried later
        failed_chunks[name] = episode
        continue
    print(f'Added episode: {name} ({EpisodeType.json.value})')
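
For reference, this is roughly how we verified the token-count claim above (a sketch; tiktoken is not a graphiti dependency, and o200k_base is our assumption about the encoding the gpt-4o/gpt-4.1 families use):

import json
import tiktoken

# o200k_base is the tokenizer used by the gpt-4o/gpt-4.1 model families
enc = tiktoken.get_encoding("o200k_base")
for name, episode in episodes.items():
    n_tokens = len(enc.encode(json.dumps(episode)))
    print(f"{name}: {n_tokens} tokens")  # all well below the models' context windows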

Expected Behavior

I would expect the process to run through all episodes, extracting nodes and edges, without stalling or erroring.

Actual Behavior

The "index out of range" error breaks the run; the invalid-JSON error triggers repeated retries (the stall) but eventually completes.

Environment

  • Graphiti Version: 0.18.9
  • Python Version: 3.12
  • Operating System: macOS 15.6.1
  • Database Backend: Neo4j Desktop 2.0.3
  • LLM Provider & Model: OpenAI gpt-4.1-mini and gpt-4o-mini (small_model)

Installation Method

  • [ ] pip install

Error Messages/Traceback

graphiti_core.llm_client.openai_base_client - ERROR - Error in generating LLM response: 1 validation error for NodeResolutions
  Invalid JSON: EOF while parsing a string at line 1 column 33267 [type=json_invalid, input_value='{"entity_resolutions":[{...}]}]}]}]}]}]}]}]}]}]}]}', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/json_invalid
OR 
index out of range

Configuration

# Configure Graphiti
from graphiti_core import Graphiti
from graphiti_core.edges import EntityEdge
from graphiti_core.nodes import EpisodeType
from graphiti_core.utils.maintenance.graph_data_operations import clear_data
import os

neo4j_uri = os.environ['NEO4J_URI']
neo4j_user = os.environ['NEO4J_USER']
neo4j_password = os.environ['NEO4J_PASSWORD']
DEFAULT_DATABASE = os.environ['DEFAULT_DATABASE']

client = Graphiti(
    neo4j_uri,
    neo4j_user,
    neo4j_password,
)
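# _database is a private driver attribute; set directly here to point the
# driver at a non-default Neo4j database (no public setter in this version)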
client.driver._database = DEFAULT_DATABASE
client.llm_client.model = "gpt-4.1-mini"
client.llm_client.small_model = "gpt-4o-mini"
client.llm_client.max_tokens = 16000
print(client.llm_client.model, client.driver._database)

Additional Context

  • This happens constantly.

Possible Solution

Not sure. The invalid JSON may be a structured-output issue in the intermediary steps; "EOF while parsing a string" indicates the completion was cut off mid-string, which is consistent with the response hitting the output token limit rather than the model deliberately emitting malformed JSON. I have no theory for the "index out of range" error.
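
One way to test that truncation hypothesis outside graphiti (a sketch against the raw OpenAI SDK; the prompt is a placeholder for graphiti's extraction prompt):

from openai import OpenAI

oai = OpenAI()
resp = oai.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "..."}],  # stand-in prompt
    response_format={"type": "json_object"},
    max_tokens=16000,
)
# finish_reason == "length" means the output hit max_tokens and the JSON tail was dropped
if resp.choices[0].finish_reason == "length":
    print("Truncated completion; raise max_tokens or shrink the episode.")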

cengover avatar Aug 27 '25 15:08 cengover

I'm hitting the same problem. I think it's because the OpenAI implementation uses response_format json_object rather than json_schema. The latter actually enforces the schema; the former merely tells the model that JSON output is expected, with no constraint preventing it from returning invalid JSON. A sketch of the difference follows.
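
For illustration, a minimal sketch of the two modes (assuming the OpenAI Python SDK; the schema is a simplified stand-in for graphiti's NodeResolutions model):

from openai import OpenAI

oai = OpenAI()
messages = [{"role": "user", "content": "Resolve the entities and reply in JSON."}]

# json_object: only a hint that the reply should be JSON; truncated or
# schema-violating output can still come back.
loose = oai.chat.completions.create(
    model="gpt-4.1-mini",
    messages=messages,
    response_format={"type": "json_object"},
)

# json_schema (structured outputs): decoding is constrained to the schema,
# so the reply conforms unless it is cut off at max_tokens.
strict = oai.chat.completions.create(
    model="gpt-4.1-mini",
    messages=messages,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "node_resolutions",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "entity_resolutions": {"type": "array", "items": {"type": "string"}},
                },
                "required": ["entity_resolutions"],
                "additionalProperties": False,
            },
        },
    },
)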

Edit: hmm, this seems to happen only with the GenericOpenAIClient, which is what I'm using and where I see the problem. You don't specify which client you're using; could you add that information?

Baukebrenninkmeijer avatar Aug 29 '25 14:08 Baukebrenninkmeijer

We used the default out-of-the-box OpenAI client.

cengover avatar Sep 02 '25 13:09 cengover

Do you hit this problem consistently, or only sometimes?

Baukebrenninkmeijer avatar Sep 02 '25 13:09 Baukebrenninkmeijer

I hit the problem constantly when adding a large corpus of documents with hundreds of episodes. The JSON invalidation does not break the run, but the index error does.
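
Until the root cause is fixed, a workaround sketch I would try is capping retries per episode at the call site, so one bad chunk fails fast instead of stalling the batch (this reuses the add_episode call from the reproduction above, with entity/edge typing omitted for brevity):

import asyncio
import json
from datetime import datetime, timezone
from graphiti_core.nodes import EpisodeType

async def add_with_retries(client, name, episode, attempts=3, backoff=2.0):
    # Bounded retry loop: give up on an episode after `attempts` failures.
    for attempt in range(1, attempts + 1):
        try:
            await client.add_episode(
                name=name,
                episode_body=json.dumps(episode),
                source_description=f"Title - {episode['title']}",
                source=EpisodeType.json,
                reference_time=datetime.now(timezone.utc),
                group_id='guidelines',
            )
            return True
        except Exception as e:
            print(f"Attempt {attempt}/{attempts} failed for {name}: {e}")
            await asyncio.sleep(backoff * attempt)  # linear backoff between attempts
    return False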

cengover avatar Sep 02 '25 17:09 cengover

I used gpt-oss-20b and the following error popped up constantly; it seems gpt-oss-20b can't do structured output. Judging by the input_value ('$defs', 'type': 'object'), the model returned the JSON schema itself rather than an instance of it.

pydantic_core._pydantic_core.ValidationError: 1 validation error for ExtractedEdges edges Field required [type=missing, input_value={'$defs': {'Edge': {'prop...dges', 'type': 'object'}, input_type=dict] For further information visit https://errors.pydantic.dev/2.11/v/missing
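
A defensive sketch for catching that failure mode before pydantic validation (ExtractedEdges here is a simplified stand-in for graphiti's real model):

import json
from pydantic import BaseModel, ValidationError

class ExtractedEdges(BaseModel):
    edges: list[dict]  # simplified stand-in for graphiti's real model

def parse_edges(raw: str) -> ExtractedEdges | None:
    data = json.loads(raw)
    # Some models echo the JSON schema back instead of an instance;
    # a '$defs' key at the top level is a telltale sign.
    if isinstance(data, dict) and "$defs" in data:
        return None
    try:
        return ExtractedEdges.model_validate(data)
    except ValidationError:
        return None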

keithwongintegrated avatar Sep 11 '25 09:09 keithwongintegrated

@cengover Is this still an issue? Please confirm within 14 days or this issue will be closed.

claude[bot] avatar Nov 07 '25 00:11 claude[bot]

@cengover Is this still an issue? Please confirm within 14 days or this issue will be closed.

claude[bot] avatar Nov 17 '25 00:11 claude[bot]