Fix: CypherTypeError: Property values can only be of primitive types or arrays thereof
What’s Fixed
neo4j.exceptions.CypherTypeError: {code: Neo.ClientError.Statement.TypeError} {message: Property values can only be of primitive types or arrays thereof. Encountered: Map{summary -> Map{description -> String("Summary containing the important information about the entity. Under 500 words"), title -> String("Summary"), type -> String("string"), value -> String("Fordham University is a university with Tania Tetlow as its president.")}}.}
This error occurs because Neo4j only accepts primitive types (or arrays of them) as property values. The code was inadvertently passing a nested map. Similar issues have been reported in #282 and #418.
Solution
- Added validation to ensure only primitive types are stored in node properties.
- Skipped setting properties if the value is not a primitive (to avoid runtime errors).
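A minimal sketch of this kind of check, for illustration only. It is not the actual diff in this PR: the names `is_storable` and `filter_properties` are hypothetical, and the set of scalar types is an assumption that leaves out Neo4j's temporal and spatial values.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)

# Scalar types treated as primitives in this sketch (an assumption; temporal
# and spatial values are deliberately left out to keep the example short).
PRIMITIVES = (str, int, float, bool)


def is_storable(value: Any) -> bool:
    """Return True for a primitive, or a list of primitives of one type."""
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, list):
        # Conservative choice: require every element to share a single type.
        same_type = len({type(item) for item in value}) <= 1
        return same_type and all(isinstance(item, PRIMITIVES) for item in value)
    return False


def filter_properties(attributes: dict[str, Any]) -> dict[str, Any]:
    """Keep storable values; skip everything else and log what was skipped."""
    kept: dict[str, Any] = {}
    for key, value in attributes.items():
        if is_storable(value):
            kept[key] = value
        else:
            logger.debug(
                'Skipping non-primitive attribute %r of type %s',
                key,
                type(value).__name__,
            )
    return kept
```

With the attribute from the traceback above, a nested `summary` map would be skipped while plain string attributes pass through unchanged.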
Background
While experimenting with examples/podcast/podcast_runner.py, I configured it to use:
- Ollama Embedding model (`nomic-embed-text:latest`)
- Grok API from x.ai (`grok-3-mini-latest`)
- Neo4j version 5.26.5
I also made a few tweaks to the graphiti library itself:
- Migrating to `uv` (in progress; see #439)
- Exposing additional packages for easier use (#438)
The error above consistently appeared after the process ran for a bit. Interestingly, the LLM and embedding models were working fine and had processed several nodes before hitting this issue. After applying the fix in this PR, the error disappeared completely.
Example (for Repro/Testing)
Here’s the code I was running that consistently triggered the issue before the fix. It's a modified version of podcast_runner.py. Note that it relies on the changes from #438, which expose OpenAIGenericClient so it is easier to import. It was also run with uv, but that migration PR is still in progress because of #439 and #441.
import asyncio
import logging
import os
import sys
from uuid import uuid4
from dotenv import load_dotenv
from loguru import logger
from openai import AsyncOpenAI
from pydantic import BaseModel, Field
from transcript_parser import parse_podcast_messages
from graphiti_core import Graphiti
from graphiti_core.cross_encoder.openai_reranker_client import OpenAIRerankerClient
from graphiti_core.embedder.openai import OpenAIEmbedder, OpenAIEmbedderConfig
from graphiti_core.llm_client import LLMConfig, OpenAIGenericClient
from graphiti_core.utils.maintenance.graph_data_operations import clear_data
load_dotenv()
neo4j_uri = os.environ.get('NEO4J_URI') or 'bolt://localhost:7687'
neo4j_user = os.environ.get('NEO4J_USER') or 'neo4j'
neo4j_password = os.environ.get('NEO4J_PASSWORD') or 'password'
def init_graphiti():
    logger.info('Initializing Graphiti...')
    # OpenAI-compatible client pointed at the x.ai API (used for the LLM)
    custom_openai_client = AsyncOpenAI(
        api_key='<xai-api-key>',
        base_url='https://api.x.ai/v1',
    )
    llm_config = LLMConfig(model='grok-3-mini-latest')
    # OpenAI-compatible client pointed at the local Ollama server (used for embeddings);
    # the API key is a dummy value
    embedding_client = AsyncOpenAI(
        api_key='who-cares',
        base_url='http://localhost:11434/v1',
    )
    # Initialize Graphiti with the custom LLM and embedding clients
    graphiti = Graphiti(
        uri='bolt://localhost:7687',
        user='neo4j',
        password='password',
        llm_client=OpenAIGenericClient(config=llm_config, client=custom_openai_client),
        embedder=OpenAIEmbedder(
            config=OpenAIEmbedderConfig(
                embedding_dim=768, embedding_model='nomic-embed-text:latest'
            ),
            client=embedding_client,
        ),
        # Optional: configure the OpenAI cross encoder with the same local client
        cross_encoder=OpenAIRerankerClient(client=embedding_client),
    )
    logger.info('Graphiti initialized successfully.')
    # Indices and constraints are built later, in main()
    return graphiti
def setup_logging():
    # Create a logger
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)  # Set the logging level to INFO
    # Create console handler and set level to INFO
    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setLevel(logging.INFO)
    # Create formatter
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    # Add formatter to console handler
    console_handler.setFormatter(formatter)
    # Add console handler to logger
    logger.addHandler(console_handler)
    return logger


class Person(BaseModel):
    """A human person, fictional or nonfictional."""

    first_name: str | None = Field(..., description='First name')
    last_name: str | None = Field(..., description='Last name')
    occupation: str | None = Field(..., description="The person's work occupation")


async def main():
    setup_logging()
    # client = Graphiti(neo4j_uri, neo4j_user, neo4j_password)
    client = init_graphiti()
    await clear_data(client.driver)
    await client.build_indices_and_constraints()
    messages = parse_podcast_messages()
    group_id = str(uuid4())
    for i, message in enumerate(messages[3:14]):
        episodes = await client.retrieve_episodes(
            message.actual_timestamp, 3, group_ids=['podcast']
        )
        episode_uuids = [episode.uuid for episode in episodes]
        await client.add_episode(
            name=f'Message {i}',
            episode_body=f'{message.speaker_name} ({message.role}): {message.content}',
            reference_time=message.actual_timestamp,
            source_description='Podcast Transcript',
            group_id=group_id,
            entity_types={'Person': Person},
            previous_episode_uuids=episode_uuids,
        )


asyncio.run(main())
Notes
Sorry for not opening a new issue for discussion first; the two related issues were still unanswered, so I put the fix up directly.
If you have concerns about the fix or if there’s a preferred alternative, feel free to suggest changes or close this PR. I’m happy to iterate!
[!IMPORTANT] Fixes `CypherTypeError` by ensuring only primitive types are stored as Neo4j node properties in `bulk_utils.py`.
- Behavior:
  - Fixes `CypherTypeError` by ensuring only primitive types or arrays are stored as Neo4j node properties in `add_nodes_and_edges_bulk_tx()`.
  - Skips setting non-primitive properties and logs debug messages for skipped attributes.
- Validation:
  - Adds validation for primitive types in `bulk_utils.py`.
  - Handles lists of primitives and logs non-primitive lists, dictionaries, and other types.
- Misc:
  - Related issues: #282, #418.
  - Background context provided for `examples/podcast/podcast_runner.py`.

This description was generated automatically for 10a9658915a912a84ae06167e2c2599df71654f5.
Fixed. It now tries to do type conversion so we can preserve more information. Let me know if there are any other problems.
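The comment above does not say which conversion is applied. As one plausible illustration (an assumption, not necessarily what the PR does), a non-primitive value could be serialized to a JSON string so that its content survives as a single string property. The function names below are hypothetical.

```python
import json
from typing import Any

# Scalar types treated as primitives in this sketch (an assumption).
PRIMITIVES = (str, int, float, bool)


def _is_primitive_list(value: Any) -> bool:
    """True for a list whose items are primitives of one single type."""
    return (
        isinstance(value, list)
        and all(isinstance(item, PRIMITIVES) for item in value)
        and len({type(item) for item in value}) <= 1
    )


def to_property_value(value: Any) -> Any:
    """Pass primitives through; serialize anything else to a JSON string."""
    if isinstance(value, PRIMITIVES) or _is_primitive_list(value):
        return value
    # default=str is a fallback for objects json cannot encode natively.
    return json.dumps(value, sort_keys=True, default=str)
```

The trade-off in this sketch is that a serialized map is no longer queryable field by field in Cypher; a reader would need `json.loads` to get the structure back.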
Hi @maintainers! 👋
This PR looks like it might address a critical issue I'm experiencing in #683. I'm getting Neo4j TypeErrors when the LLM generates nested attribute structures:
Current Error
{code: Neo.ClientError.Statement.TypeError}
{message: Property values can only be of primitive types or arrays thereof.
Encountered: Map{utente -> String("789841665"), firstName -> String("John"),
lastName -> String("Doe"), emailAddress -> String("[email protected]"),
phoneNumber -> String("+351 912 123 456"), niss -> String("12154610487"),
nif -> String("312514328")}.}
Root Cause Analysis
The issue appears to be architectural rather than LLM-related. When processing structured content (personal data, technical specs, etc.), the LLM correctly generates logical nested objects:
# What LLM generates (correctly structured)
{
    "person": {
        "utente": "789841665",
        "firstName": "John",
        "contact": {
            "email": "[email protected]",
            "phone": "+351 912 123 456"
        }
    }
}
# What Neo4j expects (flattened primitives)
{
    "person_utente": "789841665",
    "person_firstName": "John",
    "person_contact_email": "[email protected]",
    "person_contact_phone": "+351 912 123 456"
}
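The flattening shown above could be sketched roughly as follows. This is a hypothetical helper, not something taken from graphiti or from this PR; the `flatten` name and the underscore separator are assumptions chosen to match the key names in the example.

```python
from typing import Any


def flatten(attributes: dict[str, Any], prefix: str = '', sep: str = '_') -> dict[str, Any]:
    """Recursively merge nested dicts into one level, joining keys with sep."""
    flat: dict[str, Any] = {}
    for key, value in attributes.items():
        name = f'{prefix}{sep}{key}' if prefix else key
        if isinstance(value, dict):
            # Descend into the nested map, carrying the joined key as prefix.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


nested = {
    'person': {
        'utente': '789841665',
        'firstName': 'John',
        # Placeholder address; the real value is redacted in the report above.
        'contact': {'email': 'john@example.com', 'phone': '+351 912 123 456'},
    }
}
flat = flatten(nested)
```

One caveat with this approach: joined keys can collide (for example `a_b` at the top level versus `b` nested under `a`), and lists that contain maps would still need separate handling.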
Questions about this PR
- Does this PR handle automatic flattening of nested attribute structures before Neo4j storage?
- Are both EntityNode and EntityEdge attributes covered by the fix?
- Will this work with custom entity types that have complex attribute schemas?
Reproduction Context
- Environment: Ollama (Qwen3 models) + Neo4j 5.26+
- Frequency: ~10-30% of episodes with structured content
- Triggers: Personal data, JSON episodes, technical specifications
Related Issues
This seems related to #282 (marked resolved but still occurring) and potentially contradicts #418 (which suggests the issue is with "smaller models" - but the nested structures I'm seeing are actually very well-formed and logical).
Thanks for working on this! 🙏