[BUG] Neo4j quickstart: search returns no results with OpenAI-compatible LLM + Ollama embeddings (no fact_embedding on RELATES_TO)

Open bailizii opened this issue 1 month ago • 2 comments

Bug Description

I set up the Neo4j quickstart example with OpenAI-compatible services instead of OpenAI: Qwen3 on DashScope for the LLM and Ollama for embeddings.

Episodes are ingested without errors and I can see RELATES_TO relationships and fact strings in Neo4j. However, graphiti.search("Who was the California Attorney General?") returns an empty list.

At the same time, Neo4j logs emit UnknownPropertyKeyWarning for fact_embedding and episodes in the queries that Graphiti generates, even after ingestion has completed.

Steps to Reproduce

Minimal code example (slightly simplified from the Quick Start demo):

import os
import json
import logging
from datetime import datetime, timezone
import asyncio

import nest_asyncio
from dotenv import load_dotenv

from graphiti_core.driver.neo4j_driver import Neo4jDriver
from graphiti_core import Graphiti
from graphiti_core.nodes import EpisodeType

from graphiti_core.llm_client import LLMConfig
from graphiti_core.llm_client.openai_generic_client import OpenAIGenericClient
from graphiti_core.embedder.openai import OpenAIEmbedder, OpenAIEmbedderConfig
from graphiti_core.cross_encoder.openai_reranker_client import OpenAIRerankerClient

nest_asyncio.apply()
load_dotenv()

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
)
logger = logging.getLogger(__name__)

# Neo4j connection parameters
neo4j_uri = os.environ.get("NEO4J_URI", "bolt://localhost:7687")
neo4j_user = os.environ.get("NEO4J_USER", "neo4j")
neo4j_password = os.environ.get("NEO4J_PASSWORD", "password")

if not neo4j_uri or not neo4j_user or not neo4j_password:
    raise ValueError("NEO4J_URI, NEO4J_USER, and NEO4J_PASSWORD must be set")

# Explicit Neo4j driver (Neo4j Desktop DB "test5").
# Note: the URI is hardcoded below, so NEO4J_URI / neo4j_uri above is not used.
driver = Neo4jDriver(
    uri="neo4j://localhost:7687",
    user=neo4j_user,
    password=neo4j_password,
    database="test5",
)

# LLM config – DashScope OpenAI-compatible endpoint with Qwen3
llm_config = LLMConfig(
    api_key=os.environ.get("DASHCODE_KEY"),
    model="qwen3-max-2025-09-23",
    small_model="qwen3-max-2025-09-23",
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    max_tokens=3000,
)

llm_client = OpenAIGenericClient(config=llm_config)

# Embedder – Ollama OpenAI-compatible embeddings endpoint
embedder = OpenAIEmbedder(
    config=OpenAIEmbedderConfig(
        api_key="ollama",  # placeholder, ignored by Ollama
        embedding_model="qwen3-embedding:8b",
        base_url="http://localhost:11434/v1",
    )
)

# Optional cross encoder
cross_encoder = OpenAIRerankerClient(client=llm_client, config=llm_config)

graphiti = Graphiti(
    graph_driver=driver,
    llm_client=llm_client,
    embedder=embedder,
    cross_encoder=cross_encoder,
)

# Episodes – same structure as the Quick Start
episodes = [
    {
        "content": (
            "Kamala Harris is the Attorney General of California. "
            "She was previously the district attorney for San Francisco."
        ),
        "type": EpisodeType.text,
        "description": "podcast transcript",
    },
    {
        "content": "As AG, Harris was in office from January 3, 2011 – January 3, 2017",
        "type": EpisodeType.text,
        "description": "podcast transcript",
    },
    {
        "content": {
            "name": "Gavin Newsom",
            "position": "Governor",
            "state": "California",
            "previous_role": "Lieutenant Governor",
            "previous_location": "San Francisco",
        },
        "type": EpisodeType.json,
        "description": "podcast metadata",
    },
    {
        "content": {
            "name": "Gavin Newsom",
            "position": "Governor",
            "term_start": "January 7, 2019",
            "term_end": "Present",
        },
        "type": EpisodeType.json,
        "description": "podcast metadata",
    },
]


async def search(query: str):
    print(f"\nSearching for: {query} ")
    results = await graphiti.search(query)

    print("\nSearch Results:")
    for result in results:
        print(f"UUID: {result.uuid}")
        print(f"Fact: {result.fact}")
        if hasattr(result, "valid_at") and result.valid_at:
            print(f"Valid from: {result.valid_at}")
        if hasattr(result, "invalid_at") and result.invalid_at:
            print(f"Valid until: {result.invalid_at}")
        print("---")


async def main():
    # 1) Initialize indices & constraints
    await graphiti.build_indices_and_constraints()

    # 2) Add episodes
    for i, episode in enumerate(episodes):
        await graphiti.add_episode(
            name=f"Freakonomics Radio {i}",
            episode_body=episode["content"]
            if isinstance(episode["content"], str)
            else json.dumps(episode["content"]),
            source=episode["type"],
            source_description=episode["description"],
            reference_time=datetime.now(timezone.utc),
        )
        print(f" Added episode: Freakonomics Radio {i} ({episode['type'].value})")

    # 3) Run the basic search from the docs
    await search("Who was the California Attorney General?")


if __name__ == "__main__":
    asyncio.run(main())

What I observe:

  1. build_indices_and_constraints() and add_episode() run without raising exceptions.
  2. In Neo4j, I can see RELATES_TO relationships and their fact strings.
  3. However, graphiti.search("Who was the California Attorney General?") prints an empty result list.
  4. Neo4j logs show repeated UnknownPropertyKeyWarning for fact_embedding and episodes in the generated queries.

Expected Behavior

I expected the basic hybrid search example to return at least one result with a fact about Kamala Harris being the Attorney General of California, similar to the Quick Start documentation.

In particular:

  • graphiti.search("Who was the California Attorney General?") should return one or more edges with a non-empty fact, and
  • the Neo4j queries should not be complaining that fact_embedding and episodes are unknown properties after ingestion has completed.

Actual Behavior

  • add_episode finishes successfully and logs something like:

    2025-11-19 10:58:19 - graphiti_core.graphiti - INFO - Completed add_episode in 9663.837432861328 ms
     Added episode: Freakonomics Radio 3 (json)
    
  • Neo4j contains RELATES_TO edges with a fact value, but fact_embedding and episodes are not present, and graphiti.search(...) returns an empty list.

Neo4j warnings during ingestion / search

Example warnings:

2025-11-19 10:58:16 - neo4j.notifications - WARNING - Received notification from DBMS server: {severity: WARNING} {code: Neo.ClientNotification.Statement.UnknownPropertyKeyWarning} {category: UNRECOGNIZED} {title: The provided property key is not in the database} {description: One of the property names in your query is not available in the database, make sure you didn't misspell it or that the label is available when you run this statement in your application (the missing property name is: fact_embedding)} {position: line: 5, column: 58, offset: 258} for query: '
                UNWIND $edges AS edge
                MATCH (n:Entity {uuid: edge.source_node_uuid})-[e:RELATES_TO {group_id: edge.group_id}]-(m:Entity {uuid: edge.target_node_uuid})
                
                WITH e, edge, vector.similarity.cosine(e.fact_embedding, edge.fact_embedding) AS score
                WHERE score > $min_score
                WITH edge, e, score
                ORDER BY score DESC
                RETURN
                    edge.uuid AS search_edge_uuid,
                    collect({
                        uuid: e.uuid,
                        source_node_uuid: startNode(e).uuid,
                        target_node_uuid: endNode(e).uuid,
                        created_at: e.created_at,
                        name: e.name,
                        group_id: e.group_id,
                        fact: e.fact,
                        fact_embedding: e.fact_embedding,
                        episodes: e.episodes,
                        expired_at: e.expired_at,
                        valid_at: e.valid_at,
                        invalid_at: e.invalid_at,
                        attributes: properties(e)
                    })[..$limit] AS matches
                '

and when calling graphiti.search(...):

2025-11-19 10:58:28 - httpx - INFO - HTTP Request: POST http://localhost:11434/v1/embeddings "HTTP/1.1 200 OK"
2025-11-19 10:58:28 - neo4j.notifications - WARNING - ... (missing property name is: episodes) ... for query: 'CALL db.index.fulltext.queryRelationships("edge_name_and_fact", $query, {limit: $limit}) ...
2025-11-19 10:58:28 - neo4j.notifications - WARNING - ... (missing property name is: fact_embedding) ... for query: '
        MATCH (n:Entity)-[e:RELATES_TO]->(m:Entity)
    
            WITH DISTINCT e, n, m, vector.similarity.cosine(e.fact_embedding, $search_vector) AS score
            WHERE score > $min_score
            RETURN
            ...

After this, the script prints:

Searching for: Who was the California Attorney General? 

Search Results:

with no entries.
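
One plausible reading of why the vector branch of the search comes back empty: if e.fact_embedding is missing, the similarity score is null, and a `WHERE score > $min_score` filter drops every row. The sketch below only models that null propagation in plain Python. The helper names are hypothetical, the score is a raw cosine (Neo4j's own scaling may differ), and it says nothing about why the full-text branch also returned nothing.

```python
import math
from typing import Optional, Sequence


def cosine(a: Optional[Sequence[float]], b: Optional[Sequence[float]]) -> Optional[float]:
    """Raw cosine similarity; returns None if either vector is missing (stand-in for Cypher null)."""
    if a is None or b is None:
        return None
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else None


def vector_matches(edges, search_vector, min_score):
    """Keep edges whose score is a number above min_score, like `WHERE score > $min_score`."""
    kept = []
    for edge in edges:
        score = cosine(edge.get("fact_embedding"), search_vector)
        # A missing embedding gives score None, so the edge is filtered out.
        if score is not None and score > min_score:
            kept.append((edge["fact"], score))
    return kept


# Hypothetical edges: one as observed in this report (no embedding), one with an embedding.
edge_without_embedding = {"fact": "Kamala Harris is the Attorney General of California."}
edge_with_embedding = {"fact": "Kamala Harris is the Attorney General of California.",
                       "fact_embedding": [1.0, 0.0]}
```

With these definitions, an edge that has no fact_embedding can never appear in the result, regardless of the query vector or threshold.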

Cypher inspection of the stored relationships

Running this Cypher:

MATCH ()-[e:RELATES_TO]->()
RETURN e.group_id AS gid, keys(e) AS keys, e.fact AS fact
LIMIT 5;

returns (via the Neo4j Python driver):

[
  {
    "keys": ["gid", "keys", "fact"],
    "length": 3,
    "_fields": [
      "",
      [],
      "Kamala Harris is the Attorney General of California."
    ],
    "_fieldLookup": {
      "gid": 0,
      "keys": 1,
      "fact": 2
    }
  },
  {
    "keys": ["gid", "keys", "fact"],
    "length": 3,
    "_fields": [
      "",
      [],
      "She was previously the district attorney for San Francisco."
    ],
    "_fieldLookup": {
      "gid": 0,
      "keys": 1,
      "fact": 2
    }
  },
  {
    "keys": ["gid", "keys", "fact"],
    "length": 3,
    "_fields": [
      "",
      [],
      "As AG, Harris was in office from January 3, 2011 – January 3, 2017"
    ],
    "_fieldLookup": {
      "gid": 0,
      "keys": 1,
      "fact": 2
    }
  },
  {
    "keys": ["gid", "keys", "fact"],
    "length": 3,
    "_fields": [
      "",
      [],
      "Gavin Newsom holds the position of Governor."
    ],
    "_fieldLookup": {
      "gid": 0,
      "keys": 1,
      "fact": 2
    }
  },
  {
    "keys": ["gid", "keys", "fact"],
    "length": 3,
    "_fields": [
      "",
      [],
      "Gavin Newsom is the Governor of California."
    ],
    "_fieldLookup": {
      "gid": 0,
      "keys": 1,
      "fact": 2
    }
  }
]

So for these RELATES_TO edges:

  • e.fact is present and contains the expected text.
  • e.group_id appears to be an empty string ("") for all of them.
  • keys(e) is an empty list [] (so properties like fact_embedding and episodes are definitely not there, and even fact / group_id don't show up via keys(e)).
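
The same check can be scripted rather than read off raw records. This is a minimal sketch, not part of Graphiti: `missing_properties` and `count_missing_embeddings` are hypothetical helpers, the list of expected properties is taken from the queries in the warnings above, and the database call assumes a 5.x Neo4j Python driver recent enough to provide `execute_query`.

```python
# Properties that the generated queries above read from RELATES_TO edges.
EXPECTED_EDGE_PROPERTIES = ("fact", "group_id", "fact_embedding", "episodes")

# Counts all RELATES_TO edges and how many of them lack fact_embedding.
MISSING_EMBEDDING_QUERY = """
MATCH ()-[e:RELATES_TO]->()
RETURN count(e) AS total,
       sum(CASE WHEN e.fact_embedding IS NULL THEN 1 ELSE 0 END) AS missing_embedding
"""


def missing_properties(edge_keys, expected=EXPECTED_EDGE_PROPERTIES):
    """Return the expected property names that are absent from a keys(e) result."""
    present = set(edge_keys)
    return [name for name in expected if name not in present]


def count_missing_embeddings(uri, user, password, database):
    """Run the count query against a live database (requires the neo4j package)."""
    from neo4j import GraphDatabase  # imported lazily so the helper above works without it

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        records, _summary, _keys = driver.execute_query(
            MISSING_EMBEDDING_QUERY, database_=database
        )
        return records[0]["total"], records[0]["missing_embedding"]
```

For the output above, `missing_properties([])` reports all four properties as missing, which matches the empty keys(e) lists.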

Environment

  • Graphiti Version: graphiti-core==0.20.4

  • Python Version: 3.12.x (conda env)

  • Operating System: Linux x86_64 (running Neo4j Desktop)

  • Database Backend: Neo4j 5.x via Neo4j Desktop (local DB test5)

  • LLM Provider & Model:

    • LLM: Alibaba DashScope OpenAI-compatible endpoint

      • model="qwen3-max-2025-09-23" for both model and small_model
      • base_url="https://dashscope.aliyuncs.com/compatible-mode/v1"
    • Embedder: Ollama OpenAI-compatible endpoint

      • embedding_model="qwen3-embedding:8b" at base_url="http://localhost:11434/v1"
    • Cross encoder: OpenAIRerankerClient using the same Qwen3 DashScope config

Installation Method

  • [x] pip install
  • [ ] uv add
  • [ ] Development installation (git clone)

(pip install graphiti-core==0.20.4 into a fresh conda env)

Error Messages/Traceback

No Python exception is raised. The only diagnostics are the Neo4j UnknownPropertyKeyWarning messages shown above for fact_embedding and episodes on the queries that Graphiti generates.
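
To surface more than the warnings, the loggers that already appear in the output above (graphiti_core.graphiti, neo4j.notifications, httpx) can be lowered to DEBUG. A minimal sketch using only the standard library follows. The function name is hypothetical, and whether DEBUG output actually reveals the failing write is an assumption.

```python
import logging


def enable_debug_logging(logger_names=("graphiti_core", "neo4j", "httpx")):
    """Set the named loggers (and, by inheritance, their children) to DEBUG."""
    loggers = []
    for name in logger_names:
        logger = logging.getLogger(name)
        logger.setLevel(logging.DEBUG)
        loggers.append(logger)
    return loggers


# Call once after logging.basicConfig(...) in the script above,
# before running build_indices_and_constraints() and add_episode().
debug_loggers = enable_debug_logging()
```

Because child loggers inherit the level, this also covers neo4j.notifications and graphiti_core.graphiti without naming them individually.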

Configuration

Relevant configuration / initialization code (Neo4j driver, LLMConfig with OpenAIGenericClient, OpenAIEmbedder pointing at Ollama, OpenAIRerankerClient, and the sample episodes) is included in the Steps to Reproduce section.

Additional Context

  • The issue is consistent: every time I run the script, ingestion completes, RELATES_TO edges are created, but graphiti.search(...) returns zero results.

  • From the Cypher inspection, it looks like:

    • fact text is present on the edges,
    • group_id is an empty string, and
    • no fact_embedding (or episodes) properties exist on the relationships.
  • I’m not sure whether this is:

    • a limitation/bug when using OpenAIGenericClient with Qwen3 via DashScope’s OpenAI-compatible API,
    • or a problem with how the embeddings are being created/stored when using Ollama’s /v1/embeddings endpoint,
    • or something related to group_id being empty when searching.

Possible Solution

From a user perspective, it looks like the ingestion pipeline is not setting fact_embedding and episodes on the RELATES_TO relationships (and possibly leaving group_id empty) when using this combination of:

  • OpenAIGenericClient pointing to DashScope Qwen3, and
  • OpenAIEmbedder pointing to Ollama’s /v1/embeddings.

If this configuration is supposed to be supported, any pointers on:

  • validating the structured outputs from the LLM,
  • verifying that embeddings are being computed and written to e.fact_embedding, and
  • confirming the expected group_id behavior

would be very helpful. I’m happy to run additional Cypher queries, enable more debug logs, or try an alternative configuration if that helps narrow this down.
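
As a first step toward verifying the embeddings, the Ollama endpoint can be called directly, outside Graphiti, to confirm that it returns a non-empty vector. The sketch below uses only the standard library and assumes the usual OpenAI-compatible request and response shape (`{"model", "input"}` in, `data[0].embedding` out). The helper names are hypothetical, and this does not show what Graphiti itself writes to e.fact_embedding.

```python
import json
import urllib.request


def build_embedding_request(base_url, model, text, api_key="ollama"):
    """Build a POST request for an OpenAI-compatible /embeddings endpoint."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # placeholder key, as in the script above
        },
        method="POST",
    )


def extract_embedding(response_body):
    """Return the first embedding vector, or raise if the response has none."""
    data = response_body.get("data") or []
    if not data or not data[0].get("embedding"):
        raise ValueError("response contains no embedding")
    return data[0]["embedding"]


def fetch_embedding(base_url, model, text):
    """Send the request to a running server and return the embedding vector."""
    request = build_embedding_request(base_url, model, text)
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_embedding(json.load(response))
```

With Ollama running, `len(fetch_embedding("http://localhost:11434/v1", "qwen3-embedding:8b", "test"))` gives the vector dimension. The HTTP 200 on /v1/embeddings in the logs above suggests this call already succeeds, which would point away from the embedder as the cause.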

bailizii, Nov 19 '25 08:11

Update: This turned out to be a Neo4j Desktop issue. After switching to a Docker-hosted Neo4j 5.26.16 (Community Edition) and pointing Graphiti at that instance, the fact_embedding / episodes properties were correctly written and graphiti.search() started returning results.

After seeing #109 (thanks to the person who shared that! 🙏), I switched to a Docker-hosted Neo4j 5.x Community Edition instance. With the same code and data:

  • RELATES_TO edges now have the expected properties
  • graphiti.search(...) returns results
  • the UnknownPropertyKeyWarning messages are gone

Suggestion: It might be helpful to add a short note in the README recommending a regular Neo4j 5.x server (e.g. Docker Community/Enterprise) instead of Neo4j Desktop for the quickstart.

bailizii, Nov 20 '25 02:11

@bailizii Graphiti should work with Neo4j Desktop. We use Desktop quite often at Zep. Do you see any errors in your application logs?

danielchalef, Nov 21 '25 00:11