
[BUG] OOM Crash: `build_communities` Executes N Separate Queries Causing Process Death on Large Graphs

Open bek91 opened this issue 3 months ago

Bug Description

The get_community_clusters() function in community_operations.py executes N separate database queries (where N = number of entities in a group). On graphs with 1000+ entities, this causes the application process to be killed by the OS OOM killer with no Python exception or error logs.

The function iterates over every entity node and queries its neighbors individually, rather than fetching all neighbor relationships in a single batched query.

Steps to Reproduce

  1. Create a graph with 1000+ entities in a single group, all with group_id set
  2. Ensure entities have RELATES_TO relationships between them
  3. Call build_communities():
import asyncio

from graphiti_core import Graphiti

async def main():
    graphiti = Graphiti(
        uri="bolt://localhost:7687",
        user="neo4j",
        password="password",
        llm_client=llm_client,  # any configured LLM client
    )

    # This will crash the process on large graphs
    await graphiti.build_communities(group_ids=["your-group-id"])

asyncio.run(main())

Expected Behavior

  • build_communities() should successfully complete and create community nodes/edges
  • The process should remain stable regardless of graph size
  • Memory usage should scale reasonably with graph size

Actual Behavior

  • On graphs with 1000+ entities, the process is killed immediately by the OS (OOM killer)
  • No Python exception is raised - the entire process dies
  • No error logs appear - the crash happens before Python can log anything
  • FastAPI/web servers crash completely with no traceback
  • System logs show OOM killer terminating the process

Example diagnostic query showing the scale:

MATCH (e:Entity)
WHERE e.group_id IS NOT NULL
RETURN count(e) as entity_count
// Result: 1,210 entities → 1,210 separate database queries
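
The batched query proposed below (see Possible Solution) returns one row per neighbor pair instead, so the relevant scale metric there is the relationship count. A companion diagnostic, assuming the same labels and relationship type:

MATCH (n:Entity)-[e:RELATES_TO]-(m:Entity)
WHERE n.group_id IS NOT NULL AND n.group_id = m.group_id
RETURN count(e) AS relationship_count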

Environment

  • Graphiti Version: [Latest from main branch - issue exists in community_operations.py]
  • Python Version: 3.11+
  • Operating System: macOS
  • Database Backend: Neo4j 5.x
  • LLM Provider & Model: OpenAI GPT-5

Installation Method

  • [x] pip install

Error Messages/Traceback

No Python traceback - process is killed by OS before exception handling.

System logs (Linux):

kernel: Out of memory: Killed process [PID] (python) total-vm:XXXXMB, anon-rss:XXXXMB

Configuration

graphiti = Graphiti(
    uri="bolt://localhost:7687",
    user="neo4j", 
    password="password",
    llm_client=OpenAIClient(...)
)

# Graph has 1,210 entities with group_id set
# Crash occurs during build_communities call

Additional Context

  • Happens consistently on graphs with 1000+ entities
  • Using core library directly
  • The issue is in graphiti_core/utils/maintenance/community_operations.py at line ~32-60

Root Cause

In get_community_clusters(), this loop executes one query per entity:

for node in nodes:  # If 1,210 entities, this runs 1,210 times
    records, _, _ = await driver.execute_query(
        match_query + """
        WITH count(e) AS count, m.uuid AS uuid
        RETURN uuid, count
        """,
        uuid=node.uuid,  # Separate query for each node
        group_id=group_id,
    )

This creates a "death by 1000 queries" scenario (an interim mitigation is sketched after this list) where:

  1. 1,210 async queries are launched simultaneously
  2. Each query allocates memory for results
  3. OS runs out of memory and kills the process
  4. No Python exception occurs - immediate process termination
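
If the per-node queries really are dispatched concurrently as described in step 1, one interim client-side mitigation is to cap that concurrency. This is a minimal sketch, not the library's API: it assumes the same driver, match_query, nodes, and group_id as the loop above, and the limit of 20 is an arbitrary example. It does not remove the N round-trips, only the unbounded memory growth:

import asyncio

semaphore = asyncio.Semaphore(20)  # example cap; tune for your hardware

async def count_neighbors(node):
    # A query runs only once a semaphore slot is free, so at most
    # 20 result sets are held in memory at any one time.
    async with semaphore:
        records, _, _ = await driver.execute_query(
            match_query + """
            WITH count(e) AS count, m.uuid AS uuid
            RETURN uuid, count
            """,
            uuid=node.uuid,
            group_id=group_id,
        )
        return node.uuid, records

results = await asyncio.gather(*(count_neighbors(n) for n in nodes))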

Possible Solution

Replace the N-query loop with a single batched query that fetches all neighbor relationships at once:

# Instead of querying each node individually, get all relationships in one query
match_query = """
    MATCH (n:Entity {group_id: $group_id})-[e:RELATES_TO]-(m:Entity {group_id: $group_id})
    WITH n.uuid AS source_uuid, m.uuid AS target_uuid, count(e) AS edge_count
    RETURN source_uuid, target_uuid, edge_count
"""

records, _, _ = await driver.execute_query(match_query, group_id=group_id)

# Build projection dictionary from batched results
projection: dict[str, list[Neighbor]] = {node.uuid: [] for node in nodes}

for record in records:
    source_uuid = record['source_uuid']
    target_uuid = record['target_uuid']
    edge_count = record['edge_count']
    
    if source_uuid in projection:
        projection[source_uuid].append(
            Neighbor(node_uuid=target_uuid, edge_count=edge_count)
        )
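
The snippet reuses the existing Neighbor model from community_operations.py. For experimenting with the projection logic outside the library, a minimal stand-in (field names inferred from the constructor call above) could be:

from pydantic import BaseModel

class Neighbor(BaseModel):
    # Mirrors the two fields the projection code above relies on.
    node_uuid: str
    edge_count: int

The populated projection dict can then be handed to the downstream clustering step unchanged.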

Benefits:

  • Single query to retrieve all relationships in scope.
  • Uses index on group_id for both sides of the relationship.
  • Performs only one roundtrip and one execution plan.
  • Linear data processing on the client side.

The fix is straightforward and maintains full backward compatibility of results.
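
To back the backward-compatibility claim, one could spot-check a few nodes by comparing the old per-node query against the batched projection. A rough sketch, not part of the library: verify_projection is a hypothetical helper, and the inline Cypher reconstructs what match_query presumably expands to.

async def verify_projection(driver, group_id, sample_nodes, projection):
    for node in sample_nodes:
        # Old-style per-node query for this one entity.
        records, _, _ = await driver.execute_query(
            """
            MATCH (n:Entity {uuid: $uuid, group_id: $group_id})
                  -[e:RELATES_TO]-(m:Entity {group_id: $group_id})
            WITH count(e) AS count, m.uuid AS uuid
            RETURN uuid, count
            """,
            uuid=node.uuid,
            group_id=group_id,
        )
        expected = {r["uuid"]: r["count"] for r in records}
        actual = {nb.node_uuid: nb.edge_count for nb in projection[node.uuid]}
        assert actual == expected, f"neighbor mismatch for {node.uuid}"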

bek91 avatar Oct 09 '25 02:10 bek91

@bek91 Is this still an issue? Please confirm within 14 days or this issue will be closed.

claude[bot] avatar Nov 17 '25 00:11 claude[bot]

yes

bek91 avatar Dec 01 '25 06:12 bek91

I also ran into the same issue with large graphs. For now I have to disable community building entirely, which means losing the benefit of the high-level information those nodes gather.

empyriumz avatar Dec 15 '25 21:12 empyriumz