[BUG] Add Episodes function throws Invalid JSON and index out of range errors
Bug Description
We have a corpus of text that we split into sections and treat each section as an episode. We use the add_episode function to add the sections one by one to the graph. The process stalls with "Invalid JSON" and "index out of range" errors. We are confident that the episode bodies we feed in are valid JSON and that their token counts stay below the models' input limits.
Steps to Reproduce
- Set-up the client
- Prepare data
- Add episodes
```python
from datetime import datetime, timezone
import json

from graphiti_core.nodes import EpisodeType

failed_chunks = {}
episodes = {}

for i, episode in enumerate(sections_with_stages):
    chunk = {
        "condition": episode['condition'],
        "title": episode['chunk']['title'],
        "content": episode['chunk']['content'],
        "Relevant stage": f"{episode['most_relevant_stage']} stage",
        "Second relevant stage": f"{episode['secondary_relevant_stage']} stage",
    }
    episode_name = f'{episode["condition"]}-guideline-section-{i}'
    episodes[episode_name] = chunk

for i, (name, episode) in enumerate(episodes.items()):
    try:
        await client.add_episode(
            name=name,
            episode_body=json.dumps(episode),
            source_description=f"Title - {episode['title']}",
            source=EpisodeType.json,
            entity_types=entity_types,
            edge_types=edge_types,
            edge_type_map=edge_type_map,
            reference_time=datetime.now(timezone.utc),
            group_id='guidelines',  # group_id is a namespace for the data
        )
    except Exception as e:
        print(f"Error adding episode {name}: {e}")
        # Record the failed chunk keyed by its index
        failed_chunks[i] = episode
        continue
    print(f'Added episode: {name} ({EpisodeType.json.value})')
```
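A minimal pre-flight check along these lines is how we validate the episode bodies before submission (a sketch: the ~4 chars/token ratio is a crude heuristic rather than a real tokenizer, and `MAX_APPROX_TOKENS` is an arbitrary threshold, not a Graphiti setting):

```python
import json

# Rough pre-flight check for each episode body: confirm it round-trips
# as JSON and estimate its size. The ~4 chars/token ratio is a crude
# heuristic (not a real tokenizer); MAX_APPROX_TOKENS is arbitrary.
MAX_APPROX_TOKENS = 8000

def preflight(chunk: dict) -> tuple[bool, int]:
    body = json.dumps(chunk)
    json.loads(body)  # raises if the body is not valid JSON
    approx_tokens = len(body) // 4
    return approx_tokens <= MAX_APPROX_TOKENS, approx_tokens

ok, n = preflight({"title": "Hypertension", "content": "lorem ipsum " * 50})
print(ok, n)
```

Every episode passes this check, which is why the invalid-JSON errors were surprising.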
Expected Behavior
I would expect the process to run through all episodes, extracting nodes and edges, without stalling or errors.
Actual Behavior
The "index out of range" error breaks the run; the invalid-JSON error triggers retries (which stall the process) but eventually completes.
Environment
- Graphiti Version: 0.18.9
- Python Version: 3.12
- Operating System: macOS 15.6.1
- Database Backend: Neo4j Desktop 2.0.3
- LLM Provider & Model: OpenAI gpt-4.1-mini and gpt-4o-mini (small_model)
Installation Method
- [ ] pip install
Error Messages/Traceback
```
graphiti_core.llm_client.openai_base_client - ERROR - Error in generating LLM response: 1 validation error for NodeResolutions
  Invalid JSON: EOF while parsing a string at line 1 column 33267 [type=json_invalid, input_value='{"entity_resolutions":[{...}]}]}]}]}]}]}]}]}]}]}]}', input_type=str]
    For further information visit https://errors.pydantic.dev/2.11/v/json_invalid
```
OR
index out of range
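For context on the first error: "EOF while parsing a string" at a specific column is the signature of a JSON payload cut off mid-string, e.g. when the completion hits a token limit before the closing quotes and braces are emitted. A minimal stdlib reproduction (the payload shape is illustrative, not Graphiti's actual schema):

```python
import json

# A well-formed structured response of the kind the extractor expects.
full = json.dumps({"entity_resolutions": [{"id": 0, "name": "hypertension"}]})

# The same payload cut off mid-string, as when the completion hits a
# max-token limit before the closing quotes and braces are emitted.
truncated = full[:-12]

try:
    json.loads(truncated)
    error = None
except json.JSONDecodeError as exc:
    error = exc

print(error)  # Unterminated string starting at ...
```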
Configuration
```python
# Configure Graphiti
from graphiti_core import Graphiti
from graphiti_core.edges import EntityEdge
from graphiti_core.nodes import EpisodeType
from graphiti_core.utils.maintenance.graph_data_operations import clear_data
import os

neo4j_uri = os.environ['NEO4J_URI']
neo4j_user = os.environ['NEO4J_USER']
neo4j_password = os.environ['NEO4J_PASSWORD']
DEFAULT_DATABASE = os.environ['DEFAULT_DATABASE']

client = Graphiti(
    neo4j_uri,
    neo4j_user,
    neo4j_password,
)
client.driver._database = DEFAULT_DATABASE
client.llm_client.model = "gpt-4.1-mini"
client.llm_client.small_model = "gpt-4o-mini"
client.llm_client.max_tokens = 16000

print(client.llm_client.model, client.driver._database)
```
Additional Context
- This happens constantly.
Possible Solution
Not sure. The JSON invalidation might be a structured-output issue in one of the intermediate extraction steps. I have no idea about the "index out of range" problem.
I'm hitting the same problem. I think it's because the OpenAI implementation uses `json_object` rather than `json_schema` as the response format: the latter actually forces the model to conform to the schema, while the former only tells the model that JSON output is expected, with no constraint preventing it from returning malformed or mis-shaped JSON.
Edit: hmm, this seems to happen only with `GenericOpenAIClient`, which is what I'm using and where I see the problem. You don't specify which client you're using — could you add that information?
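To make the distinction concrete, these are the two `response_format` payload shapes in the OpenAI Chat Completions API (the schema body below is illustrative, not Graphiti's actual `NodeResolutions` model):

```python
# json_object mode: nudges the model toward syntactically valid JSON, but
# enforces no schema, and a response cut off at the max-token limit can
# still be invalid JSON (as in the traceback above).
json_object_format = {"type": "json_object"}

# json_schema mode (structured outputs): with strict=True the response is
# constrained to conform to the supplied schema. Schema body is illustrative.
json_schema_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "node_resolutions",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "entity_resolutions": {
                    "type": "array",
                    "items": {"type": "object"},
                },
            },
            "required": ["entity_resolutions"],
            "additionalProperties": False,
        },
    },
}
```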
We used the default, out-of-the-box OpenAI client.
Do you hit this problem consistently, or only sometimes?
I hit the problem consistently when we add a large corpus of documents with hundreds of episodes. The JSON invalidation does not break the run, but the index error does.
I used gpt-oss-20b and the following error popped up constantly; it seems gpt-oss-20b can't do the structured output:
```
pydantic_core._pydantic_core.ValidationError: 1 validation error for ExtractedEdges
edges
  Field required [type=missing, input_value={'$defs': {'Edge': {'prop...dges', 'type': 'object'}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.11/v/missing
```
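The `input_value` in that error is telling: the model returned the JSON *schema* itself (note the `$defs` key) rather than an instance of it, a common failure mode for models without structured-output support. A stdlib sketch of the same mismatch (the `edges` field comes from the traceback; the manual key check stands in for Pydantic's validation):

```python
import json

# What the model returned: it echoed the JSON schema back (note "$defs"
# and "title") instead of producing an instance with an "edges" field.
model_output = json.dumps({
    "$defs": {"Edge": {"properties": {}}},
    "title": "ExtractedEdges",
    "type": "object",
})

payload = json.loads(model_output)  # parses fine: it IS valid JSON

# Minimal stand-in for the Pydantic check that raises "Field required":
missing = [field for field in ("edges",) if field not in payload]
print(missing)  # ['edges']
```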
@cengover Is this still an issue? Please confirm within 14 days or this issue will be closed.