
[BUG] OOM Crash: `build_communities` Executes N Separate Queries Causing Process Death on Large Graphs

Open bek91 opened this issue 3 months ago

Bug Description

The get_community_clusters() function in community_operations.py executes N separate database queries (where N = number of entities in a group). On graphs with 1000+ entities, this causes the application process to be killed by the OS OOM killer with no Python exception or error logs.

The function iterates over every entity node and queries its neighbors individually, rather than fetching all neighbor relationships in a single batched query.

Steps to Reproduce

  1. Create a graph with 1000+ entities in a single group, all with group_id set
  2. Ensure entities have RELATES_TO relationships between them
  3. Call build_communities():
import asyncio

from graphiti_core import Graphiti

async def main():
    graphiti = Graphiti(
        uri="bolt://localhost:7687",
        user="neo4j",
        password="password",
        llm_client=llm_client,  # any configured LLM client
    )

    # This will crash the process on large graphs
    await graphiti.build_communities(group_ids=["your-group-id"])

asyncio.run(main())

Expected Behavior

  • build_communities() should successfully complete and create community nodes/edges
  • The process should remain stable regardless of graph size
  • Memory usage should scale reasonably with graph size

Actual Behavior

  • On graphs with 1000+ entities, the process is killed immediately by the OS (OOM killer)
  • No Python exception is raised - the entire process dies
  • No error logs appear - the crash happens before Python can log anything
  • FastAPI/web servers crash completely with no traceback
  • System logs show OOM killer terminating the process

Example diagnostic query showing the scale:

MATCH (e:Entity)
WHERE e.group_id IS NOT NULL
RETURN count(e) as entity_count
// Result: 1,210 entities → 1,210 separate database queries
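
The batched query proposed below (see Possible Solution) returns one row per neighbor pair instead, so the relevant scale metric there is the relationship count. A companion diagnostic, assuming the same labels and relationship type:

MATCH (n:Entity)-[e:RELATES_TO]-(m:Entity)
WHERE n.group_id IS NOT NULL AND n.group_id = m.group_id
RETURN count(e) AS relationship_count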

Environment

  • Graphiti Version: [Latest from main branch - issue exists in community_operations.py]
  • Python Version: 3.11+
  • Operating System: macOS
  • Database Backend: Neo4j 5.x
  • LLM Provider & Model: OpenAI GPT-5

Installation Method

  • [x] pip install

Error Messages/Traceback

No Python traceback - process is killed by OS before exception handling.

System logs (Linux):

kernel: Out of memory: Killed process [PID] (python) total-vm:XXXXMB, anon-rss:XXXXMB

Configuration

graphiti = Graphiti(
    uri="bolt://localhost:7687",
    user="neo4j", 
    password="password",
    llm_client=OpenAIClient(...)
)

# Graph has 1,210 entities with group_id set
# Crash occurs during build_communities call

Additional Context

  • Happens consistently on graphs with 1000+ entities
  • Using core library directly
  • The issue is in graphiti_core/utils/maintenance/community_operations.py at line ~32-60

Root Cause

In get_community_clusters(), this loop executes one query per entity:

for node in nodes:  # If 1,210 entities, this runs 1,210 times
    records, _, _ = await driver.execute_query(
        match_query + """
        WITH count(e) AS count, m.uuid AS uuid
        RETURN uuid, count
        """,
        uuid=node.uuid,  # Separate query for each node
        group_id=group_id,
    )

This creates a "death by 1000 queries" scenario (an interim mitigation is sketched after this list) where:

  1. 1,210 async queries are launched simultaneously
  2. Each query allocates memory for results
  3. OS runs out of memory and kills the process
  4. No Python exception occurs - immediate process termination
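
If the per-node queries really are dispatched concurrently as described in step 1, one interim client-side mitigation is to cap that concurrency. This is a minimal sketch, not the library's API: it assumes the same driver, match_query, nodes, and group_id as the loop above, and the limit of 20 is an arbitrary example. It does not remove the N round-trips, only the unbounded memory growth:

import asyncio

semaphore = asyncio.Semaphore(20)  # example cap; tune for your hardware

async def count_neighbors(node):
    # A query runs only once a semaphore slot is free, so at most
    # 20 result sets are held in memory at any one time.
    async with semaphore:
        records, _, _ = await driver.execute_query(
            match_query + """
            WITH count(e) AS count, m.uuid AS uuid
            RETURN uuid, count
            """,
            uuid=node.uuid,
            group_id=group_id,
        )
        return node.uuid, records

results = await asyncio.gather(*(count_neighbors(n) for n in nodes))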

Possible Solution

Replace the N-query loop with a single batched query that fetches all neighbor relationships at once:

# Instead of querying each node individually, get all relationships in one query
match_query = """
    MATCH (n:Entity {group_id: $group_id})-[e:RELATES_TO]-(m:Entity {group_id: $group_id})
    WITH n.uuid AS source_uuid, m.uuid AS target_uuid, count(e) AS edge_count
    RETURN source_uuid, target_uuid, edge_count
"""

records, _, _ = await driver.execute_query(match_query, group_id=group_id)

# Build projection dictionary from batched results
projection: dict[str, list[Neighbor]] = {node.uuid: [] for node in nodes}

for record in records:
    source_uuid = record['source_uuid']
    target_uuid = record['target_uuid']
    edge_count = record['edge_count']
    
    if source_uuid in projection:
        projection[source_uuid].append(
            Neighbor(node_uuid=target_uuid, edge_count=edge_count)
        )
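
The snippet reuses the existing Neighbor model from community_operations.py. For experimenting with the projection logic outside the library, a minimal stand-in (field names inferred from the constructor call above) could be:

from pydantic import BaseModel

class Neighbor(BaseModel):
    # Mirrors the two fields the projection code above relies on.
    node_uuid: str
    edge_count: int

The populated projection dict can then be handed to the downstream clustering step unchanged.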

Benefits:

  • Single query to retrieve all relationships in scope.
  • Uses index on group_id for both sides of the relationship.
  • Performs only one roundtrip and one execution plan.
  • Linear data processing on the client side.

The fix is straightforward and maintains full backward compatibility of results.
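
To back the backward-compatibility claim, one could spot-check a few nodes by comparing the old per-node query against the batched projection. A rough sketch, not part of the library: verify_projection is a hypothetical helper, and the inline Cypher reconstructs what match_query presumably expands to.

async def verify_projection(driver, group_id, sample_nodes, projection):
    for node in sample_nodes:
        # Old-style per-node query for this one entity.
        records, _, _ = await driver.execute_query(
            """
            MATCH (n:Entity {uuid: $uuid, group_id: $group_id})
                  -[e:RELATES_TO]-(m:Entity {group_id: $group_id})
            WITH count(e) AS count, m.uuid AS uuid
            RETURN uuid, count
            """,
            uuid=node.uuid,
            group_id=group_id,
        )
        expected = {r["uuid"]: r["count"] for r in records}
        actual = {nb.node_uuid: nb.edge_count for nb in projection[node.uuid]}
        assert actual == expected, f"neighbor mismatch for {node.uuid}"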

bek91 avatar Oct 09 '25 02:10 bek91

@bek91 Is this still an issue? Please confirm within 14 days or this issue will be closed.

claude[bot] avatar Nov 17 '25 00:11 claude[bot]

yes

bek91 avatar Dec 01 '25 06:12 bek91

I also ran into the same issue with large graphs. For now I have to disable community building entirely, which means losing the benefit of the high-level information those nodes gather.

empyriumz avatar Dec 15 '25 21:12 empyriumz