Bulk upload fails with NodeResolutions ValidationError: 'duplicates' field missing during entity resolution.
Bug Description
When performing a bulk upload using await graphiti.add_episode_bulk(bulk_episodes), the process fails with a ValidationError for NodeResolutions. The error indicates that the duplicates field is missing in the entity_resolutions returned by the LLM response.
Steps to Reproduce
Minimal script that reproduces the issue:
import asyncio
import json
import os
from datetime import datetime, timezone
from dotenv import load_dotenv
from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType
from graphiti_core.utils.maintenance.graph_data_operations import clear_data
from graphiti_core.utils.bulk_utils import RawEpisode
from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.llm_client.openai_generic_client import OpenAIGenericClient
load_dotenv()
neo4j_uri = os.environ.get('NEO4J_URI', 'bolt://localhost:7687')
neo4j_user = os.environ.get('NEO4J_USER', 'neo4j')
neo4j_password = os.environ.get('NEO4J_PASSWORD', 'password')
user_data = [
    {
        "name": "Rebecca Brown",
        "email": "[email protected]",
        "signup_date": "2025-08-20",
        "subscription_plan": "Standard",
        "activity_score": 32,
        "last_login": "2025-08-21"
    },
    {
        "name": "Kimberly Vazquez",
        "email": "[email protected]",
        "signup_date": "2025-08-22",
        "subscription_plan": "Enterprise",
        "activity_score": 19,
        "last_login": "2025-08-25"
    }
]


def stringify_ints(data):
    if isinstance(data, dict):
        return {k: stringify_ints(v) for k, v in data.items()}
    elif isinstance(data, list):
        return [stringify_ints(v) for v in data]
    elif isinstance(data, int):
        return str(data)
    return data


async def bulk_upload():
    llm_config = LLMConfig(
        api_key=os.getenv("OPENAI_API_KEY", "password"),
        model="gpt-4o-mini",
        base_url="https://api.openai.com/v1",
        small_model="gpt-4o-mini",
    )
    graphiti = Graphiti(
        neo4j_uri, neo4j_user, neo4j_password,
        llm_client=OpenAIGenericClient(config=llm_config),
    )
    try:
        await graphiti.build_indices_and_constraints()
        await clear_data(graphiti.driver)
        bulk_episodes = [
            RawEpisode(
                name=f"User Data - {user['name']}",
                content=json.dumps(stringify_ints(user)),
                source=EpisodeType.json,
                source_description="User metadata bulk upload",
                reference_time=datetime.now(timezone.utc)
            )
            for user in user_data
        ]
        await graphiti.add_episode_bulk(bulk_episodes)
        print(f"✅ Successfully uploaded {len(bulk_episodes)} episodes.")
    finally:
        await graphiti.close()


if __name__ == "__main__":
    asyncio.run(bulk_upload())
Expected Behavior
The bulk upload should complete successfully, storing all user records without validation errors.
Actual Behavior
The process fails during node resolution with the following error:
pydantic_core._pydantic_core.ValidationError: 4 validation errors for NodeResolutions
entity_resolutions.0.duplicates
Field required [type=missing, input_value={'id': 0, 'name': 'Kimber...z', 'duplicate_idx': -1}, input_type=dict]
entity_resolutions.1.duplicates
Field required [type=missing, input_value={'id': 1, 'name': 'solisa...z', 'duplicate_idx': -1}, input_type=dict]
entity_resolutions.2.duplicates
Field required [type=missing, input_value={'id': 2, 'name': 'Enterp...se', 'duplicate_idx': 1}, input_type=dict]
entity_resolutions.3.duplicates
Field required [type=missing, input_value={'id': 3, 'name': '19', 'duplicate_idx': 33}, input_type=dict]
RuntimeWarning: coroutine 'resolve_extracted_nodes' was never awaited
RuntimeWarning: coroutine 'node_search' was never awaited
RuntimeWarning: coroutine 'episode_search' was never awaited
RuntimeWarning: coroutine 'community_search' was never awaited
RuntimeWarning: coroutine 'edge_search' was never awaited
Environment
- Graphiti Version: 0.18.9
- Python Version: 3.12.6
- Operating System: Windows
- Database Backend: Neo4j
- LLM Provider & Model: OpenAI gpt-4o-mini
Installation Method
- [x] pip install
Error Messages/Traceback
Traceback (most recent call last):
File "C:\Users\Farman\Downloads\ottomator-agents-main\ottomator-agents-main\graphiti-agent\bulk_episode.py", line 984, in <module>
asyncio.run(bulk_upload())
File "C:\Users\Farman\AppData\Local\Programs\Python\Python312\Lib\asyncio\runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "C:\Users\Farman\AppData\Local\Programs\Python\Python312\Lib\asyncio\runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Farman\AppData\Local\Programs\Python\Python312\Lib\asyncio\base_events.py", line 687, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "C:\Users\Farman\Downloads\ottomator-agents-main\ottomator-agents-main\graphiti-agent\bulk_episode.py", line 976, in bulk_upload
await graphiti.add_episode_bulk(bulk_episodes)
File "C:\Users\Farman\Downloads\ottomator-agents-main\ottomator-agents-main\graphiti-agent\env\Lib\site-packages\graphiti_core\graphiti.py", line 853, in add_episode_bulk
raise e
File "C:\Users\Farman\Downloads\ottomator-agents-main\ottomator-agents-main\graphiti-agent\env\Lib\site-packages\graphiti_core\graphiti.py", line 680, in add_episode_bulk
nodes_by_episode, uuid_map = await dedupe_nodes_bulk(
^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Farman\Downloads\ottomator-agents-main\ottomator-agents-main\graphiti-agent\env\Lib\site-packages\graphiti_core\utils\bulk_utils.py", line 249, in dedupe_nodes_bulk
] = await semaphore_gather(
^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Farman\Downloads\ottomator-agents-main\ottomator-agents-main\graphiti-agent\env\Lib\site-packages\graphiti_core\helpers.py", line 121, in semaphore_gather
return await asyncio.gather(*(_wrap_coroutine(coroutine) for coroutine in coroutines))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Farman\Downloads\ottomator-agents-main\ottomator-agents-main\graphiti-agent\env\Lib\site-packages\graphiti_core\helpers.py", line 119, in _wrap_coroutine
return await coroutine
^^^^^^^^^^^^^^^
File "C:\Users\Farman\Downloads\ottomator-agents-main\ottomator-agents-main\graphiti-agent\env\Lib\site-packages\graphiti_core\utils\maintenance\node_operations.py", line 263, in resolve_extracted_nodes
node_resolutions: list[NodeDuplicate] = NodeResolutions(**llm_response).entity_resolutions
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Farman\Downloads\ottomator-agents-main\ottomator-agents-main\graphiti-agent\env\Lib\site-packages\pydantic\main.py", line 253, in __init__
validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pydantic_core._pydantic_core.ValidationError: 4 validation errors for NodeResolutions
entity_resolutions.0.duplicates
Field required [type=missing, input_value={'id': 0, 'name': 'Kimber...z', 'duplicate_idx': -1}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.11/v/missing
entity_resolutions.1.duplicates
Field required [type=missing, input_value={'id': 1, 'name': 'solisa...z', 'duplicate_idx': -1}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.11/v/missing
entity_resolutions.2.duplicates
Field required [type=missing, input_value={'id': 2, 'name': 'Enterp...se', 'duplicate_idx': 1}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.11/v/missing
entity_resolutions.3.duplicates
Field required [type=missing, input_value={'id': 3, 'name': '19', 'duplicate_idx': 33}, input_type=dict]
For further information visit https://errors.pydantic.dev/2.11/v/missing
sys:1: RuntimeWarning: coroutine 'resolve_extracted_nodes' was never awaited
sys:1: RuntimeWarning: coroutine 'node_search' was never awaited
sys:1: RuntimeWarning: coroutine 'episode_search' was never awaited
sys:1: RuntimeWarning: coroutine 'community_search' was never awaited
sys:1: RuntimeWarning: coroutine 'edge_search' was never awaited
Configuration
llm_config = LLMConfig(
    api_key=os.getenv("OPENAI_API_KEY", "password"),
    model="gpt-4o-mini",
    base_url="https://api.openai.com/v1",
    small_model="gpt-4o-mini",
)
Additional Context
The issue happens consistently with different datasets. No recent changes were made to the environment before this issue appeared. Component used: core library.
Possible Solution
The error suggests that the duplicates field is required in the entity_resolutions but is not being provided or populated. Could this be an issue with the response parsing from the LLM or a bug in the schema validation logic?
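As a stopgap while the root cause is unclear, the response could be patched before it reaches validation. The sketch below is only an illustration: `fill_missing_duplicates` is a hypothetical helper (not part of graphiti_core), the sample payload is modeled on the truncated values in the error above, and treating a missing `duplicates` key as "no duplicates" is an assumption that may not match what the library intends.

```python
from copy import deepcopy
from typing import Any, Dict


def fill_missing_duplicates(llm_response: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the response in which every resolution has a 'duplicates' key.

    Hypothetical helper, not a graphiti_core API.
    """
    patched = deepcopy(llm_response)  # leave the caller's dict untouched
    for resolution in patched.get("entity_resolutions", []):
        # Assumption: an absent 'duplicates' key means no duplicates were found.
        resolution.setdefault("duplicates", [])
    return patched


# Illustrative payload shaped like the input_value entries in the error message.
raw_response = {
    "entity_resolutions": [
        {"id": 0, "name": "Kimberly Vazquez", "duplicate_idx": -1},
        {"id": 2, "name": "Enterprise", "duplicate_idx": 1, "duplicates": [1]},
    ]
}

patched_response = fill_missing_duplicates(raw_response)
print(patched_response["entity_resolutions"][0]["duplicates"])
print(patched_response["entity_resolutions"][1]["duplicates"])
```

This only hides the symptom. It does not address out-of-range values such as the `'duplicate_idx': 33` seen in the error, so a fix in the prompt or schema would still be preferable.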
I also hit validation errors in the LLM's deduplication response during node deduplication. I was using gpt-4.1-mini at the time, and my input content was CSV embedded in JSON, so it contained many escape characters. In my case, the response was missing a closing quotation mark on JSON string fields.
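For anyone trying to tell this failure mode apart from the missing-field one, a rough check is to parse the raw response text with the standard `json` module before any schema validation. This is a minimal sketch: `try_parse_json` is a hypothetical helper, and the sample strings are made up for illustration rather than taken from a real LLM response.

```python
import json
from typing import Any, Optional, Tuple


def try_parse_json(text: str) -> Tuple[Any, Optional[str]]:
    """Return (parsed_value, None) on success or (None, error_description) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        # msg, lineno and colno are attributes of json.JSONDecodeError.
        return None, f"{exc.msg} at line {exc.lineno}, column {exc.colno}"


well_formed = '{"entity_resolutions": []}'
missing_quote = '{"name": "Rebecca Brown}'  # closing quotation mark is missing

for candidate in (well_formed, missing_quote):
    parsed, error = try_parse_json(candidate)
    print("ok" if error is None else f"malformed JSON: {error}")
```

If the text fails at this stage, the problem is malformed output rather than a schema mismatch, and it would surface before Pydantic validation is reached.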
I'm not sure exactly where the problem lies: either the dedupe prompt is too confusing for the LLM, or the gpt-4 series handles it poorly. Either way, there is retry logic built in, but I found that (1) the LLM wasn't able to fix its own errors during retry, and (2) something causes each retry attempt to take 7 minutes, which is the bigger problem.
I found that switching to gpt-5 fixed the LLM response, but the prompt and retry logic should be looked at.
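To make the retry concern concrete, here is a minimal sketch of a retry loop with a hard per-attempt timeout, so a stuck attempt cannot run for minutes. Everything in it is an assumption for illustration: `call_with_bounded_retries` and `flaky_call` are hypothetical names, `ValueError` stands in for a validation failure, and none of this reflects how graphiti_core implements its retries.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def call_with_bounded_retries(
    make_call: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    timeout_seconds: float = 60.0,
) -> T:
    """Retry an async call, capping both the attempt count and each attempt's duration."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            # wait_for cancels the call once timeout_seconds has elapsed.
            return await asyncio.wait_for(make_call(), timeout=timeout_seconds)
        except (asyncio.TimeoutError, ValueError) as exc:
            last_error = exc  # remember the failure and try again
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error


async def demo() -> str:
    calls = {"count": 0}

    async def flaky_call() -> str:
        # Hypothetical stand-in for an LLM request: the first call hangs.
        calls["count"] += 1
        if calls["count"] == 1:
            await asyncio.sleep(1.0)
        return "ok"

    return await call_with_bounded_retries(flaky_call, max_attempts=3, timeout_seconds=0.05)


result = asyncio.run(demo())
print(result)
```

A cap like this does not help the LLM correct its own output, but it would bound the cost of each failed attempt.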
@farman-mk Is this still an issue? Please confirm within 14 days or this issue will be closed.