[BUG] RuntimeWarning errors in MMR reranker, search does not return results
Bug Description
The MMR (Maximal Marginal Relevance) reranker in graphiti_core/search/search_utils.py produces numpy RuntimeWarning errors during dot product calculations when processing embedding vectors, causing search operations to return empty results. The warnings indicate mathematical issues including divide by zero, overflow, and invalid value encounters during similarity calculations and MMR score computation, which break the MMR algorithm's functionality.
Steps to Reproduce
Provide a minimal code example that reproduces the issue:
# Example search operation that triggers the MMR reranker warnings
from graphiti_core.search.search_config import SearchConfig, NodeSearchConfig, NodeSearchMethod, NodeReranker
from graphiti_core.search.search_filters import SearchFilters

# Configure search with MMR reranker
search_config = SearchConfig(
    node_config=NodeSearchConfig(
        search_methods=[NodeSearchMethod.cosine_similarity, NodeSearchMethod.bm25],
        reranker=NodeReranker.mmr,
        sim_min_score=0.6,
        mmr_lambda=0.7,
        bfs_max_depth=3
    ),
    limit=20
)

# Search filters targeting specific node types
search_filters = SearchFilters(
    node_labels=['Product', 'SparePart']
)

# Perform search that triggers MMR reranker
# (graphiti is an already initialized Graphiti client; the call runs inside an async function)
await graphiti.search_with_config(
    query="product features and available options",
    config=search_config,
    search_filters=search_filters
)
Expected Behavior
The search operation should complete successfully without any runtime warnings and return relevant search results. The MMR reranker should properly calculate similarity scores and MMR values for all candidate embeddings, returning ranked results based on both relevance to the query and diversity.
Actual Behavior
The search operation produces multiple numpy RuntimeWarning messages and fails to return any results:
RuntimeWarning: divide by zero encountered in dot
RuntimeWarning: overflow encountered in dot
RuntimeWarning: invalid value encountered in dot
These warnings occur during the MMR calculation process, specifically in the similarity matrix computation and MMR score calculation steps. When these warnings are triggered, the search operation completes but returns an empty result set instead of the expected relevant nodes.
Environment
- Graphiti Version: 0.18.0
- Python Version: 3.13
- Operating System: macOS (darwin 24.5.0)
- Database Backend: neo4j
- LLM Provider & Model: OpenAI GPT 4.1-mini
Installation Method
- [ ] pip install
- [ ] uv add
- [x] poetry
- [ ] Development installation (git clone)
Error Messages/Traceback
/Users/username/Library/Caches/pypoetry/virtualenvs/project-env-py3.13/lib/python3.13/site-packages/graphiti_core/search/search_utils.py:1032: RuntimeWarning: divide by zero encountered in dot
similarity = np.dot(u, v)
/Users/username/Library/Caches/pypoetry/virtualenvs/project-env-py3.13/lib/python3.13/site-packages/graphiti_core/search/search_utils.py:1032: RuntimeWarning: overflow encountered in dot
similarity = np.dot(u, v)
/Users/username/Library/Caches/pypoetry/virtualenvs/project-env-py3.13/lib/python3.13/site-packages/graphiti_core/search/search_utils.py:1032: RuntimeWarning: invalid value encountered in dot
similarity = np.dot(u, v)
/Users/username/Library/Caches/pypoetry/virtualenvs/project-env-py3.13/lib/python3.13/site-packages/graphiti_core/search/search_utils.py:1040: RuntimeWarning: divide by zero encountered in dot
mmr = mmr_lambda * np.dot(query_array, candidate_arrays[uuid]) + (mmr_lambda - 1) * max_sim
/Users/username/Library/Caches/pypoetry/virtualenvs/project-env-py3.13/lib/python3.13/site-packages/graphiti_core/search/search_utils.py:1040: RuntimeWarning: overflow encountered in dot
mmr = mmr_lambda * np.dot(query_array, candidate_arrays[uuid]) + (mmr_lambda - 1) * max_sim
/Users/username/Library/Caches/pypoetry/virtualenvs/project-env-py3.13/lib/python3.13/site-packages/graphiti_core/search/search_utils.py:1040: RuntimeWarning: invalid value encountered in dot
mmr = mmr_lambda * np.dot(query_array, candidate_arrays[uuid]) + (mmr_lambda - 1) * max_sim
Configuration
# Search configuration triggering the issue
SearchConfig(
    edge_config=None,
    node_config=NodeSearchConfig(
        search_methods=[NodeSearchMethod.cosine_similarity, NodeSearchMethod.bm25],
        reranker=NodeReranker.mmr,
        sim_min_score=0.6,
        mmr_lambda=0.7,
        bfs_max_depth=3
    ),
    episode_config=None,
    community_config=None,
    limit=20
)

# Search filters applied
SearchFilters(
    node_labels=['Product', 'SparePart'],
    edge_types=None,
    valid_at=None,
    invalid_at=None,
    created_at=None,
    expired_at=None
)
Additional Context
- This happens consistently when using the MMR reranker with the specific search configuration
- Using the core library component
- All data was ingested into the graph using the add_episode feature through the normal workflow
- No manual edits or modifications were made to the graph or embeddings
- The embeddings were generated automatically during the ingestion process using standard embedding providers
- The issue suggests that some embedding vectors in the database may contain zero vectors, extremely large values, or NaN/infinity values
- The search operation fails to return any results when these warnings occur, making the MMR reranker completely non-functional in affected cases
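One way to test the hypothesis above is to scan the stored embeddings for the suspected problem cases. The following is a minimal sketch, not part of graphiti_core: the helper names classify_embedding and find_bad_embeddings, the uuid-to-vector dict shape, and the 1e6 magnitude threshold are all assumptions made for illustration.

```python
import numpy as np


def classify_embedding(vector) -> str:
    """Label one embedding as 'ok', 'empty', 'non_finite', 'zero', or 'huge'."""
    arr = np.asarray(vector, dtype=np.float64)
    if arr.size == 0:
        return "empty"
    if not np.all(np.isfinite(arr)):
        return "non_finite"  # contains NaN or +/-infinity
    if not np.any(arr):
        return "zero"  # all components are exactly 0
    # Illustrative threshold (an assumption): components of typical
    # embeddings are far smaller than this.
    if np.max(np.abs(arr)) > 1e6:
        return "huge"
    return "ok"


def find_bad_embeddings(embeddings: dict) -> dict:
    """Map each uuid with a suspicious embedding to its label."""
    report = {}
    for uuid, vector in embeddings.items():
        status = classify_embedding(vector)
        if status != "ok":
            report[uuid] = status
    return report
```

Running find_bad_embeddings over the candidate embeddings fetched for a failing query would show whether any vector falls into one of these categories. An empty report would suggest the warnings have a different cause.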
Possible Solution
The issue appears to stem from invalid or malformed embedding vectors in the database. Potential solutions include:
- Input validation: Add validation in the maximal_marginal_relevance function to check for and handle invalid embedding values (NaN, infinity, zero vectors) before performing dot product calculations
- Defensive programming: Use numpy functions like np.nan_to_num() or np.clip() to sanitize embedding vectors before mathematical operations
- Embedding quality checks: Add validation during the embedding ingestion process to ensure all generated embeddings are valid numerical vectors
- Error handling: Implement graceful fallback behavior when invalid embeddings are encountered, such as excluding problematic vectors from MMR calculations
The fix should likely be implemented in the maximal_marginal_relevance function around lines 1032 and 1040 where the dot product operations occur.
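The validation and fallback ideas could be combined into a pre-processing step that runs before any dot product. The sketch below is hypothetical and is not the library's implementation: sanitize_candidates is an invented helper, and it assumes the candidates arrive as a dict mapping uuid to embedding vector. It drops unusable vectors and L2-normalizes the rest, so later dot products stay within [-1, 1].

```python
import numpy as np


def sanitize_candidates(candidates: dict):
    """Drop unusable vectors and L2-normalize the remaining ones.

    Returns (clean, skipped): clean maps uuid -> unit-length array,
    skipped lists the uuids that were excluded.
    """
    clean = {}
    skipped = []
    for uuid, vector in candidates.items():
        arr = np.asarray(vector, dtype=np.float64)
        # Exclude empty vectors and anything containing NaN or infinity.
        if arr.size == 0 or not np.all(np.isfinite(arr)):
            skipped.append(uuid)
            continue
        scale = np.max(np.abs(arr))
        if scale == 0.0:
            skipped.append(uuid)  # zero vector: direction is undefined
            continue
        # Rescale by the largest component first so that squaring inside
        # the norm cannot overflow, even for extremely large inputs.
        arr = arr / scale
        clean[uuid] = arr / np.linalg.norm(arr)
    return clean, skipped
```

Excluding bad vectors, rather than patching them with np.nan_to_num(), keeps a corrupted embedding from being scored as if it were valid. The skipped list can be logged so the affected nodes can be re-embedded.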
@paveljakov Is this still an issue? Please confirm within 14 days or this issue will be closed.