
Fix: CypherTypeError: Property values can only be of primitive types or arrays thereof

Open t41372 opened this issue 8 months ago • 1 comment

What’s Fixed

neo4j.exceptions.CypherTypeError: {code: Neo.ClientError.Statement.TypeError} {message: Property values can only be of primitive types or arrays thereof. Encountered: Map{summary -> Map{description -> String("Summary containing the important information about the entity. Under 500 words"), title -> String("Summary"), type -> String("string"), value -> String("Fordham University is a university with Tania Tetlow as its president.")}}.}

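This error occurs because Neo4j only accepts primitive types (or arrays of them) as property values. In this case the code passed a nested map: the `summary` attribute arrived as a schema-like object with `description`, `title`, `type`, and `value` keys rather than as a plain string. Similar issues have been reported in #282 and #418.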


Solution

  • Added validation to ensure only primitive types are stored in node properties.
  • Skipped setting properties if the value is not a primitive (to avoid runtime errors).
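
As a rough illustration of the two points above, here is a minimal sketch of that kind of check. The helper names `is_storable` and `keep_storable` are hypothetical and are not taken from the PR diff. The sketch also deliberately ignores other value types Neo4j accepts (such as temporal and spatial values) and does not check that lists are homogeneous.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical helpers for illustration only; not the code from this PR.
PRIMITIVES = (str, int, float, bool)


def is_storable(value):
    """Return True for a primitive or a flat list/tuple of primitives."""
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, (list, tuple)):
        return all(isinstance(item, PRIMITIVES) for item in value)
    return False


def keep_storable(attributes):
    """Drop attributes that Neo4j would reject, logging each skipped key."""
    kept = {}
    for key, value in attributes.items():
        if is_storable(value):
            kept[key] = value
        else:
            logger.debug(
                'Skipping non-primitive attribute %r of type %s',
                key,
                type(value).__name__,
            )
    return kept
```

With the payload from the error above, the nested `summary` map would be skipped while plain strings and lists of strings pass through unchanged.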

Background

While experimenting with examples/podcast/podcast_runner.py, I configured it to use:

  • Ollama Embedding model (`nomic-embed-text:latest`)
  • Grok API from x.ai (`grok-3-mini-latest`)
  • Neo4j version 5.26.5

I also made a few tweaks to the graphiti library itself:

  • Migrating to uv (in progress; see #439)
  • Exposing additional packages for easier use (#438)

The error above appeared consistently once the process had been running for a while. The LLM and embedding models were working fine and had already processed several nodes before hitting it. After applying the fix in this PR, the error disappeared completely.


Example (for Repro/Testing)

Here’s the code I was running, a modified version of podcast_runner.py, which consistently triggered the issue before the fix. Note that it relies on the changes from #438, which expose OpenAIGenericClient to make it easier to import. I also ran it with uv, but I'm still working on that PR because of #439 and #441.

import asyncio
import logging
import os
import sys
from uuid import uuid4

from dotenv import load_dotenv
from loguru import logger
from openai import AsyncOpenAI
from pydantic import BaseModel, Field
from transcript_parser import parse_podcast_messages

from graphiti_core import Graphiti
from graphiti_core.cross_encoder.openai_reranker_client import OpenAIRerankerClient
from graphiti_core.embedder.openai import OpenAIEmbedder, OpenAIEmbedderConfig
from graphiti_core.llm_client import LLMConfig, OpenAIGenericClient
from graphiti_core.utils.maintenance.graph_data_operations import clear_data

load_dotenv()

neo4j_uri = os.environ.get('NEO4J_URI') or 'bolt://localhost:7687'
neo4j_user = os.environ.get('NEO4J_USER') or 'neo4j'
neo4j_password = os.environ.get('NEO4J_PASSWORD') or 'password'


def init_graphiti():
    logger.info('Initializing Graphiti...')

    # OpenAI-compatible client for the x.ai API, used as Graphiti's LLM client
    custom_openai_client = AsyncOpenAI(
        api_key='<xai-api-key>',
        base_url='https://api.x.ai/v1',
    )
    llm_config = LLMConfig(model='grok-3-mini-latest')

    # OpenAI-compatible client for the local Ollama server; the API key is a placeholder
    embedding_client = AsyncOpenAI(
        api_key='who-cares',
        base_url='http://localhost:11434/v1',
    )

    # Initialize Graphiti with the custom LLM and embedding clients
    graphiti = Graphiti(
        uri=neo4j_uri,
        user=neo4j_user,
        password=neo4j_password,
        llm_client=OpenAIGenericClient(config=llm_config, client=custom_openai_client),
        embedder=OpenAIEmbedder(
            config=OpenAIEmbedderConfig(
                embedding_dim=768, embedding_model='nomic-embed-text:latest'
            ),
            client=embedding_client,
        ),
        # Optional: cross encoder, here reusing the local Ollama client
        cross_encoder=OpenAIRerankerClient(client=embedding_client),
    )

    logger.info('Graphiti initialized successfully.')
    # Indices and constraints are built in main() via build_indices_and_constraints().

    return graphiti


def setup_logging():
    # Configure the standard-library root logger (named root_logger so it does
    # not shadow the loguru `logger` imported above)
    root_logger = logging.getLogger()
    root_logger.setLevel(logging.INFO)  # Set the logging level to INFO

    # Create console handler and set level to INFO
    console_handler = logging.StreamHandler(sys.stdout)
    console_handler.setLevel(logging.INFO)

    # Create formatter
    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')

    # Add formatter to console handler
    console_handler.setFormatter(formatter)

    # Add console handler to logger
    root_logger.addHandler(console_handler)

    return root_logger


class Person(BaseModel):
    """A human person, fictional or nonfictional."""

    first_name: str | None = Field(..., description='First name')
    last_name: str | None = Field(..., description='Last name')
    occupation: str | None = Field(..., description="The person's work occupation")


async def main():
    setup_logging()
    # client = Graphiti(neo4j_uri, neo4j_user, neo4j_password)
    client = init_graphiti()
    await clear_data(client.driver)
    await client.build_indices_and_constraints()
    messages = parse_podcast_messages()
    group_id = str(uuid4())

    for i, message in enumerate(messages[3:14]):
        episodes = await client.retrieve_episodes(
            message.actual_timestamp, 3, group_ids=['podcast']
        )
        episode_uuids = [episode.uuid for episode in episodes]

        await client.add_episode(
            name=f'Message {i}',
            episode_body=f'{message.speaker_name} ({message.role}): {message.content}',
            reference_time=message.actual_timestamp,
            source_description='Podcast Transcript',
            group_id=group_id,
            entity_types={'Person': Person},
            previous_episode_uuids=episode_uuids,
        )


asyncio.run(main())

Notes

Sorry for not opening a new issue for discussion first. I skipped that step because two related issues had already been left unanswered.

If you have concerns about the fix or if there’s a preferred alternative, feel free to suggest changes or close this PR. I’m happy to iterate!


[!IMPORTANT] Fixes CypherTypeError by ensuring only primitive types are stored as Neo4j node properties in bulk_utils.py.

  • Behavior:
    • Fixes CypherTypeError by ensuring only primitive types or arrays are stored as Neo4j node properties in add_nodes_and_edges_bulk_tx().
    • Skips setting non-primitive properties and logs debug messages for skipped attributes.
  • Validation:
    • Adds validation for primitive types in bulk_utils.py.
    • Handles lists of primitives and logs non-primitive lists, dictionaries, and other types.
  • Misc:
    • Related issues: #282, #418.
    • Background context provided for examples/podcast/podcast_runner.py.

This description was created by Ellipsis for 10a9658915a912a84ae06167e2c2599df71654f5. You can customize this summary. It will automatically update as commits are pushed.

t41372, May 04 '25 08:05

Fixed. It now tries to do type conversion so we can preserve more information. Let me know if there are any other problems.
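
The comment above does not say how the conversion works. Purely as an illustration, one possible approach is to serialize anything Neo4j cannot store into a JSON string so the information is kept rather than dropped. The function name `to_property` is hypothetical, and this sketch should not be read as the PR's actual implementation.

```python
import json

# Hypothetical helper for illustration only; the PR may convert values differently.
PRIMITIVES = (str, int, float, bool)


def to_property(value):
    """Convert a value into something Neo4j can store as a property."""
    if value is None or isinstance(value, PRIMITIVES):
        return value
    if isinstance(value, (list, tuple)) and all(
        isinstance(item, PRIMITIVES) for item in value
    ):
        return list(value)
    # Nested maps, lists of maps, and other objects become a JSON string.
    # default=str is a fallback for values json cannot encode natively.
    return json.dumps(value, default=str, sort_keys=True)
```

The trade-off of this design is that the nested data survives but is opaque to Cypher queries until it is parsed back by the caller.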

t41372, May 07 '25 22:05

Hi @maintainers! 👋

This PR looks like it might address a critical issue I'm experiencing in #683. I'm getting Neo4j TypeErrors when the LLM generates nested attribute structures:

Current Error

{code: Neo.ClientError.Statement.TypeError} 
{message: Property values can only be of primitive types or arrays thereof. 
Encountered: Map{utente -> String("789841665"), firstName -> String("John"), 
lastName -> String("Doe"), emailAddress -> String("[email protected]"), 
phoneNumber -> String("+351 912 123 456"), niss -> String("12154610487"), 
nif -> String("312514328")}.}

Root Cause Analysis

The issue appears to be architectural rather than LLM-related. When processing structured content (personal data, technical specs, etc.), the LLM correctly generates logical nested objects:

# What LLM generates (correctly structured)
{
    "person": {
        "utente": "789841665",
        "firstName": "John",
        "contact": {
            "email": "[email protected]",
            "phone": "+351 912 123 456"
        }
    }
}

# What Neo4j expects (flattened primitives)
{
    "person_utente": "789841665",
    "person_firstName": "John", 
    "person_contact_email": "[email protected]",
    "person_contact_phone": "+351 912 123 456"
}
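
For reference, the flattening shown above could be sketched roughly as follows. The function name `flatten_attributes` and the underscore separator are assumptions made for illustration; nothing here is taken from graphiti or from this PR.

```python
# Hypothetical helper for illustration only; not part of graphiti.
def flatten_attributes(attributes, prefix='', sep='_'):
    """Flatten nested dicts into a single level with joined key names."""
    flat = {}
    for key, value in attributes.items():
        name = f'{prefix}{sep}{key}' if prefix else key
        if isinstance(value, dict):
            # Recurse into nested maps, carrying the joined key as the prefix.
            flat.update(flatten_attributes(value, name, sep))
        else:
            flat[name] = value
    return flat
```

Applied to the nested `person` object, this yields keys such as `person_utente` and `person_contact_phone`. Note that joined keys can collide, for example if a top-level `person_utente` key already exists, so a real implementation would need a collision policy.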

Questions about this PR

  1. Does this PR handle automatic flattening of nested attribute structures before Neo4j storage?

  2. Are both EntityNode and EntityEdge attributes covered by the fix?

  3. Will this work with custom entity types that have complex attribute schemas?

Reproduction Context

  • Environment: Ollama (Qwen3 models) + Neo4j 5.26+
  • Frequency: ~10-30% of episodes with structured content
  • Triggers: Personal data, JSON episodes, technical specifications

Related Issues

This seems related to #282 (marked resolved but still occurring) and potentially contradicts #418 (which suggests the issue is with "smaller models" - but the nested structures I'm seeing are actually very well-formed and logical).

Thanks for working on this! 🙏

Sing303, Jul 06 '25 14:07