graphiti Restricted metadata capabilities beyond temporal data limits use cases

Overview

This is a proposal to extend Graphiti by adding flexible metadata. The extension would enable developers to define custom metadata schemas for nodes and edges, supporting diverse use cases while maintaining all of Graphiti's existing temporally-aware knowledge graph capabilities.

Rationale

Graphiti effectively handles temporally relevant data. However, many use cases require filtering on metadata other than time. Flexible metadata management would extend Graphiti's applicability to those scenarios.

Note on Implementation Approach: This proposal recommends implementing metadata as direct properties rather than nested in dictionaries to enable database-level filtering. Using the current attributes dictionary would require filtering after retrieving data from the database, which would be less performant for large datasets.
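
To make the difference concrete, here is a minimal sketch contrasting the two approaches. The `region` field, the query text, and the `filter_after_retrieval` helper are illustrative assumptions, not existing Graphiti code.

```python
from typing import Any

# Hypothetical database-level filter: `region` is an illustrative metadata
# field stored as a direct property, so Neo4j can filter (and index) it.
DIRECT_PROPERTY_QUERY = """
MATCH (n:Entity)
WHERE n.region = $region
RETURN n.uuid AS uuid
"""


def filter_after_retrieval(nodes: list[dict[str, Any]], region: str) -> list[str]:
    """Client-side filtering: every node is fetched first, then checked in Python."""
    return [
        node["uuid"]
        for node in nodes
        if node.get("attributes", {}).get("region") == region
    ]


# Stand-in for rows that were already pulled out of the database.
fetched = [
    {"uuid": "a", "attributes": {"region": "EU"}},
    {"uuid": "b", "attributes": {"region": "US"}},
    {"uuid": "c", "attributes": {}},
]
eu_uuids = filter_after_retrieval(fetched, "EU")
```

With the first form only matching rows leave the database; with the second, the cost grows with the total number of nodes regardless of how selective the filter is.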

Use Cases

Documentation Versioning: Ingest, maintain, and perform RAG over multiple versions of documentation so that queries return information relevant to a specific version. Information could be updated and retrieved by documentation version without deleting or rewriting existing nodes or edges, and without the overhead of re-ingesting the full documentation (including unchanged parts) at each release. It would also keep the graph structure more consistent across versions.
Geographically-Relevant Information: Associate nodes and edges with specific regions or locations, enabling region-specific knowledge retrieval.
Audience-Targeted Content: Tag content for different user segments (beginners, experts, etc.).
Hardware/Platform Compatibility: Track which information applies to specific hardware models or software platforms.
Regulatory Compliance: Associate information with specific regulatory frameworks or jurisdictions.
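
For the documentation versioning case, a minimal sketch of the check I have in mind is below. It assumes versions are packed into a single integer so they stay a flat, comparable value (comparing the strings directly would order "1.10" before "1.9"), and that an open upper bound is stored as null. The names `encode_version` and `applies_to`, the three-part version layout, and the inclusive upper bound are all assumptions made for illustration.

```python
from typing import Optional


def encode_version(version: str) -> int:
    """Pack 'major.minor.patch' into one integer so numeric order matches release order."""
    parts = [int(p) for p in version.split(".")]
    if len(parts) > 3 or any(p < 0 or p > 999 for p in parts):
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch = parts + [0] * (3 - len(parts))
    return major * 1_000_000 + minor * 1_000 + patch


def applies_to(query_version: str, version_min: str, version_max: Optional[str]) -> bool:
    """True if content valid from `version_min` through `version_max` covers `query_version`.

    `version_max` is None while the content still applies to the latest release.
    """
    v = encode_version(query_version)
    if v < encode_version(version_min):
        return False
    return version_max is None or v <= encode_version(version_max)
```

The same comparison would be pushed into the database query in a real implementation; the Python version only pins down the intended semantics.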

Implementation Details

Core Data Structure Changes

Files to Modify:

graphiti_core/graphiti_types.py
- Add new types for metadata handling including range types, geographical types, etc.

These types should probably be defined at the user level rather than inside the Graphiti library itself.

graphiti_core/nodes.py
- Extend EntityNode class to include metadata fields as direct properties, not nested in the attributes dictionary
- Implement methods for metadata validation and querying
- Implement serialization/deserialization for Neo4j storage
- Update CommunityNode class for metadata inheritance/aggregation from member nodes
graphiti_core/edges.py
- Add metadata fields to EntityEdge as direct properties, similar to the temporal fields
- Implement serialization/deserialization for Neo4j storage
graphiti_core/models/nodes/node_db_queries.py and graphiti_core/models/edges/edge_db_queries.py
- Update query templates to include direct references to metadata fields
- Optimize queries to filter metadata at the database level

Comparison Operators and Filtering

The existing ComparisonOperator enum in search_filters.py may need to be expanded to handle additional metadata-specific operations. Proposed additions include:

String operations:
- contains - Check if a string contains a substring
- starts_with - Check if a string starts with a prefix
- ends_with - Check if a string ends with a suffix
- regex_match - Match a string against a regular expression
Collection operations:
- in - Check if a value is in a collection
- contains_any - Check if a collection contains any of the specified values
- contains_all - Check if a collection contains all of the specified values
Geographical operations:
- within_radius - Check if a point is within a radius of another point
- within_polygon - Check if a point is within a polygon
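
As a rough sketch of how most of these operators could map onto Cypher predicates: `MetadataOperator`, `to_cypher`, and the template table below are hypothetical names, kept separate from the existing ComparisonOperator enum. `within_polygon` is left out because I am not assuming a single built-in Cypher predicate for it.

```python
import re
from enum import Enum


class MetadataOperator(Enum):
    """Hypothetical operators; not part of Graphiti's existing ComparisonOperator."""

    contains = "contains"
    starts_with = "starts_with"
    ends_with = "ends_with"
    regex_match = "regex_match"
    in_ = "in"
    contains_any = "contains_any"
    contains_all = "contains_all"
    within_radius = "within_radius"


# Cypher fragment per operator. {prop} is the property reference (e.g. n.region)
# and {param} is the name of the query parameter.
_CYPHER_TEMPLATES = {
    MetadataOperator.contains: "{prop} CONTAINS ${param}",
    MetadataOperator.starts_with: "{prop} STARTS WITH ${param}",
    MetadataOperator.ends_with: "{prop} ENDS WITH ${param}",
    MetadataOperator.regex_match: "{prop} =~ ${param}",
    MetadataOperator.in_: "{prop} IN ${param}",
    MetadataOperator.contains_any: "any(x IN ${param} WHERE x IN {prop})",
    MetadataOperator.contains_all: "all(x IN ${param} WHERE x IN {prop})",
    # Assumes {prop} holds a Neo4j point and the parameter is a map
    # with `center` (a point) and `radius` (meters) keys.
    MetadataOperator.within_radius: "point.distance({prop}, ${param}.center) <= ${param}.radius",
}

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def to_cypher(alias: str, field: str, op: MetadataOperator, param: str) -> str:
    """Render one predicate; names are validated because they are interpolated into the query."""
    for name in (alias, field, param):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return _CYPHER_TEMPLATES[op].format(prop=f"{alias}.{field}", param=param)
```

Values are always passed as query parameters; only validated identifiers are ever interpolated into the query text.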

Question for Maintainers: What additional comparison operators would you like to support for metadata filtering? Should these be implemented as an extension of the current ComparisonOperator enum or as a separate system?

Metadata Schema Definition

Create new file: graphiti_core/utils/metadata_schema.py
- Implement MetadataSchema class based on Pydantic with functions including:
  - Model validator forcing simple types, no nesting (ensure database-level filtering instead of client-side filtering)
  - Schema registration for centralized management (unless you think this should be handled by the SearchFilters class)
  - Abstract methods for specialized Neo4j query construction to be used by the SearchFilters class (how do you think this should be handled?)
  - Methods for generating optimized indices for metadata fields (what do you think should be here?)
- User can define the following by inheriting from the MetadataSchema class:
  - Metadata field types (e.g., version numbers, geographic coordinates, audience segments)
  - Metadata field constraints (e.g., range, equality, containment)
  - Query expression translation to Neo4j Cypher
  - Range and boundary operations
- Example for version metadata implementation:
  - User would define a class, e.g. CustomMetadataSchema, inheriting from MetadataSchema
  - Implement methods for translating custom queries (like the version_min/version_max pattern) to Cypher
graphiti_core/graphiti.py
- Update __init__ method to accept optional metadata schema configuration
- Add methods for registering and managing metadata schemas
- Create integration with the existing entity types system
- Add methods for schema validation during episode processing (should be handled by the custom MetadataSchema class created by the user)
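
A minimal sketch of the schema idea follows. To stay dependency-free it uses a plain dataclass as a stand-in for the Pydantic model proposed above, so the validation hook is `__post_init__` rather than a Pydantic model validator. `MetadataSchema`, `VersionMetadata`, `to_properties`, and `version_clause` are hypothetical names, and versions are assumed to be stored as integers.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional

# Value types assumed to be storable as direct, filterable properties.
_SIMPLE_TYPES = (str, int, float, bool)


def _is_flat(value: Any) -> bool:
    """Allow None, simple scalars, and lists of simple scalars; reject any nesting."""
    if value is None or isinstance(value, _SIMPLE_TYPES):
        return True
    if isinstance(value, list):
        return all(isinstance(item, _SIMPLE_TYPES) for item in value)
    return False


@dataclass
class MetadataSchema:
    """Hypothetical base class; a plain dataclass stands in for the Pydantic model."""

    def __post_init__(self) -> None:
        for f in fields(self):
            if not _is_flat(getattr(self, f.name)):
                raise TypeError(f"metadata field {f.name!r} must be a flat, simple value")

    def to_properties(self) -> dict[str, Any]:
        """Flat mapping intended to be written as direct node or edge properties."""
        return {f.name: getattr(self, f.name) for f in fields(self)}


@dataclass
class VersionMetadata(MetadataSchema):
    """User-defined schema following the version_min/version_max pattern."""

    version_min: int = 0
    version_max: Optional[int] = None  # None means "still current"

    @staticmethod
    def version_clause(alias: str = "n") -> str:
        """Cypher predicate selecting items valid for the $version parameter."""
        return (
            f"{alias}.version_min <= $version "
            f"AND ({alias}.version_max IS NULL OR {alias}.version_max >= $version)"
        )
```

The NULL branch in `version_clause` is the compound condition that makes version ranges awkward to express with simple comparison operators alone.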

Search and Filtering

graphiti_core/search/search_filters.py
- Extend SearchFilters class to utilize query construction methods from MetadataSchema (unless users should define a custom SearchFilters class instead, though that would likely be less backward compatible)
- Implement composition of custom metadata filters, for instance support for logical operations (AND, OR, NOT) between filters defined by the user in the MetadataSchema class (or in the SearchFilters class; see the Implementation Approach question in the Questions for Maintainers section)
- Add simple generic wrapper functions for filter construction following ORM design patterns:
  - range_filter(field, min_value, max_value, inclusive=True) - Filter values between min and max
  - equality_filter(field, value) - Filter for exact matches
  - inequality_filter(field, value) - Filter for non-matches
  - collection_filter(field, values, match_type='any') - Filter for values in a collection
  - string_filter(field, pattern, operation='contains') - String operations such as contains or starts_with
  - proximity_filter(field, point, distance) - Distance-based filtering
  - logical_and(*filters) - Combine filters with AND logic
  - logical_or(*filters) - Combine filters with OR logic
  - logical_not(filter) - Negate a filter
graphiti_core/search/search.py
- Update search functions to incorporate metadata filtering at the query level
- Implement query execution with Cypher
- Modify community_search function to utilize metadata filtering
graphiti_core/search/search_utils.py
- Add metadata-specific search utilities that leverage Neo4j's query capabilities
- Implement query building for complex metadata filters (see the Implementation Approach question in the Questions for Maintainers section)
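
A minimal sketch of how a few of the filter helpers listed for search_filters.py could compose is below. The `Filter` container, the `n` alias, and the parameter-naming scheme are assumptions for illustration; only the helper names and signatures come from the list above.

```python
import re
from dataclasses import dataclass
from itertools import count
from typing import Any

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
_param_ids = count()  # keeps parameter names unique across combined filters


@dataclass
class Filter:
    """A Cypher WHERE fragment plus the parameters it refers to."""

    clause: str
    params: dict[str, Any]


def _new_param(field: str) -> str:
    if not _IDENTIFIER.match(field):
        raise ValueError(f"unsafe field name: {field!r}")
    return f"{field}_{next(_param_ids)}"


def equality_filter(field: str, value: Any) -> Filter:
    p = _new_param(field)
    return Filter(f"n.{field} = ${p}", {p: value})


def range_filter(field: str, min_value: Any, max_value: Any, inclusive: bool = True) -> Filter:
    lo, hi = _new_param(field), _new_param(field)
    op_lo, op_hi = (">=", "<=") if inclusive else (">", "<")
    clause = f"n.{field} {op_lo} ${lo} AND n.{field} {op_hi} ${hi}"
    return Filter(clause, {lo: min_value, hi: max_value})


def _combine(keyword: str, filters: tuple[Filter, ...]) -> Filter:
    if not filters:
        raise ValueError("at least one filter is required")
    params: dict[str, Any] = {}
    for f in filters:
        params.update(f.params)
    return Filter(f" {keyword} ".join(f"({f.clause})" for f in filters), params)


def logical_and(*filters: Filter) -> Filter:
    return _combine("AND", filters)


def logical_or(*filters: Filter) -> Filter:
    return _combine("OR", filters)


def logical_not(filter: Filter) -> Filter:
    return Filter(f"NOT ({filter.clause})", dict(filter.params))


# Usage: EU content that is NOT limited to versions 1 through 3.
combined = logical_and(
    equality_filter("region", "EU"),
    logical_not(range_filter("version", 1, 3)),
)
```

Because every helper returns the same `Filter` shape, the combined clause and its parameter dictionary can be handed to the query layer as a single WHERE condition.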

Database Operations

For updating the database operations to support metadata, we have several implementation options:

Option 1: Use Existing Templates with Attributes Dictionary (Not Recommended)

Continue using the existing query templates with no changes
Store metadata within the existing attributes dictionary
This approach would:
- Require retrieving all data from the database first
- Perform filtering in application code after retrieval
- Create performance bottlenecks with large datasets
- Effectively nullify the core benefits of this proposal
- Force users to build their own custom solution for metadata filtering

Obviously, this is not my preferred approach, but I wanted to include it for completeness.

Option 2: Create Separate Metadata-Aware Templates

Keep existing templates unchanged

Add new templates for metadata-aware operations:

ENTITY_NODE_SAVE_WITH_METADATA = """
    MERGE (n:Entity {uuid: $entity_data.uuid})
    SET n:$($labels)
    SET n = $entity_data
    // Merge the flat metadata map into the node as direct properties;
    // Neo4j cannot store a map as a single property value.
    SET n += $metadata_data
    WITH n CALL db.create.setNodeVectorProperty(n, "name_embedding", $entity_data.name_embedding)
    RETURN n.uuid AS uuid"""

Option 3: Metadata as Neo4j Labels

Use metadata categories as Neo4j labels for faster filtering

Example:

ENTITY_NODE_SAVE_WITH_METADATA_LABELS = """
    MERGE (n:Entity {uuid: $entity_data.uuid})
    SET n:$($labels):$($metadata_labels)
    SET n = $entity_data
    // Merge the flat metadata map into the node as direct properties;
    // Neo4j cannot store a map as a single property value.
    SET n += $metadata_data
    WITH n CALL db.create.setNodeVectorProperty(n, "name_embedding", $entity_data.name_embedding)
    RETURN n.uuid AS uuid"""

Question for Maintainers: Which database operation approach would you prefer? Option 2 provides cleaner separation but requires maintaining parallel implementations. Option 3 may offer better indexing but could lead to label proliferation.
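
To illustrate the label-proliferation concern with Option 3, here is a minimal sketch of how the `$metadata_labels` parameter could be derived. The `metadata_labels` helper and the `key_value` naming convention are assumptions; every distinct key/value pair becomes its own label.

```python
import re
from typing import Any


def metadata_labels(metadata: dict[str, Any], label_fields: set[str]) -> list[str]:
    """Turn selected metadata fields into label names such as 'audience_beginner'."""
    labels = []
    for key in sorted(label_fields):
        value = metadata.get(key)
        if value is None:
            continue
        # Replace anything that is not a letter, digit, or underscore.
        labels.append(re.sub(r"[^A-Za-z0-9_]", "_", f"{key}_{value}"))
    return labels


example = metadata_labels(
    {"audience": "beginner", "region": "eu-west", "version_min": 2},
    label_fields={"audience", "region"},
)
```

This works reasonably for low-cardinality categories such as audience, but a field like a version number would create one label per value, which is why I would keep ranges as properties.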

graphiti_core/utils/maintenance/graph_data_operations.py
- Update build_indices_and_constraints to create optimized indices for metadata fields
- Implement compound indices for frequently combined query patterns (use ORM design patterns)
graphiti_core/utils/maintenance/edge_operations.py and graphiti_core/utils/maintenance/node_operations.py
- Extend extraction and resolution logic to handle metadata fields as direct properties
- Optimize bulk operations for metadata updates (only after bulk is no longer a WIP)
graphiti_core/utils/maintenance/temporal_operations.py (should this be a separate module?)
- Implement interaction between temporal operations and metadata fields
- Add support for temporal-metadata compound queries
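
For the index generation mentioned under build_indices_and_constraints, a minimal sketch of the statements that could be produced is below. The helper names and index-naming scheme are hypothetical, and the `Entity` label and `RELATES_TO` relationship type are assumed to match what Graphiti uses for entity nodes and edges.

```python
import re
from typing import Sequence

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(names: Sequence[str]) -> None:
    for name in names:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")


def node_index(fields: Sequence[str], label: str = "Entity") -> str:
    """Single-property index, or a composite index when several fields are given."""
    _check([label, *fields])
    name = "_".join([label.lower(), *fields, "idx"])
    props = ", ".join(f"n.{f}" for f in fields)
    return f"CREATE INDEX {name} IF NOT EXISTS FOR (n:{label}) ON ({props})"


def edge_index(fields: Sequence[str], rel_type: str = "RELATES_TO") -> str:
    """Same idea for relationship properties."""
    _check([rel_type, *fields])
    name = "_".join([rel_type.lower(), *fields, "idx"])
    props = ", ".join(f"e.{f}" for f in fields)
    return f"CREATE INDEX {name} IF NOT EXISTS FOR ()-[e:{rel_type}]-() ON ({props})"
```

A composite index on `version_min` and `version_max` is the kind of compound index I would expect to help with the version-range queries.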

Community Operations

graphiti_core/utils/maintenance/community_operations.py
- Update community clustering algorithms to utilize metadata fields directly
- Implement metadata aggregation at community level
- Add specialized handling for version ranges and geographic boundaries
graphiti_core/search/search.py and graphiti_core/search/search_utils.py
- Update community-related search functions to utilize metadata fields in queries
- Implement community filtering based on metadata criteria
graphiti_core/graphiti.py
- Update build_communities method to incorporate metadata in clustering decisions
- Implement metadata inheritance rules for communities
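
One possible aggregation rule for communities, as a minimal sketch: the community covers the widest version range of its members and the union of their tags. The function name, the dictionary layout, and the rule itself are assumptions for discussion, not a settled design.

```python
from typing import Any


def aggregate_community_metadata(members: list[dict[str, Any]]) -> dict[str, Any]:
    """Derive community-level metadata from member node metadata."""
    if not members:
        raise ValueError("a community needs at least one member")
    maxes = [m.get("version_max") for m in members]
    return {
        # Earliest version in which any member applies.
        "version_min": min(m["version_min"] for m in members),
        # Open-ended (None) as soon as one member is still current.
        "version_max": None if any(v is None for v in maxes) else max(maxes),
        # Union of member tags, sorted for stable output.
        "tags": sorted({t for m in members for t in m.get("tags", [])}),
    }
```

A stricter alternative would be the intersection of member ranges, which is one of the inheritance rules that would need to be decided.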

Episode Processing

graphiti_core/graphiti.py
- Update add_episode method to process metadata fields directly
- Implement metadata extraction and assignment
- Add metadata-based contradiction detection
graphiti_core/utils/bulk_utils.py (only after bulk is no longer a WIP)
- Implement metadata support for bulk operations
- Optimize for minimal database operations when processing metadata
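
For metadata-based contradiction detection, the core check I have in mind is whether two facts can apply to the same version at all. A minimal sketch, assuming inclusive integer bounds and None for an open upper bound (the function names and dictionary layout are hypothetical):

```python
from typing import Any, Optional


def ranges_overlap(
    a_min: int, a_max: Optional[int], b_min: int, b_max: Optional[int]
) -> bool:
    """True if two inclusive version ranges share at least one version."""
    a_hi = float("inf") if a_max is None else a_max
    b_hi = float("inf") if b_max is None else b_max
    return a_min <= b_hi and b_min <= a_hi


def may_contradict(old_fact: dict[str, Any], new_fact: dict[str, Any]) -> bool:
    """Only facts whose version ranges overlap are candidates for contradiction."""
    return ranges_overlap(
        old_fact["version_min"], old_fact.get("version_max"),
        new_fact["version_min"], new_fact.get("version_max"),
    )
```

Facts with disjoint ranges (for example, an API signature that changed between releases) would both be kept rather than one invalidating the other.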

Design Considerations

Performance Optimization
- Force custom metadata to be stored as direct properties, not nested in dictionaries, to enable database-level queries
- Filter at query level, not after retrieval, to minimize data transfer
Backward Compatibility
- Ensure existing code functions without modification
- Create migration utilities for existing knowledge graphs (likely a later addition as most current use cases don't need the granular metadata functionality)
- Maintain compatibility with current entity types system
Contradiction Handling
- Implement metadata-based contradiction detection
- Support customizable contradiction resolution strategies
- Integrate with existing temporal contradiction logic
Schema Management
- Support different metadata schemas per node/edge type
- Implement schema versioning to handle evolving metadata requirements
- Provide schema migration utilities
- Add support for schema validation and error reporting
Documentation
- Update API documentation with metadata usage examples
- Document performance considerations and best practices
- Provide schema definition examples for common use cases
- Include examples of combined temporal and metadata queries

Implementation Phases

Phase 1: Core Implementation
- Implement metadata fields as direct properties on nodes and edges
- Develop basic schema validation (allow user to define models and their own custom validation)
- Create database indices for metadata fields
- Update database query templates for direct field access (should this be in the SearchFilters class or defined in the MetadataSchema class? Which would be most testable, extensible, maintainable, and in-line with your roadmap?)
Phase 2: Query System Integration
- Extend SearchFilters for metadata-based filtering
- Implement query transformation
- Develop optimized filter combinations (likely should be a simple wrapper for AND/OR/NOT that the user can quickly use to combine filters)
Phase 3: Community Integration
- Extend CommunityNode with metadata capabilities
- Implement metadata-aware community clustering
- Add metadata inheritance and aggregation logic
- Optimize community-level queries
Phase 4: Advanced Features
- Implement metadata-based contradiction handling, in conjunction with temporal contradiction handling
- Add schema versioning support (probably a later addition)
- Develop specialized metadata types for common use cases (should likely just be done in the examples directory)
- Create utilities for expected common complex metadata operations (any thoughts on what should be included here?)
Phase 5: Optimization
- Fine-tune Neo4j indices for optimal query performance
- Optimize combined temporal and metadata queries
- Implement bulk operation support for metadata (only if bulk is no longer a WIP)
- Add advanced caching strategies for common query patterns

Conclusion

The flexible metadata logic extension could enhance Graphiti's capabilities and expand the use cases it would fit. By implementing metadata as direct properties rather than nested in dictionaries and focusing on database-level filtering, this extension could enable flexible querying for diverse use cases.

Questions for Maintainers

Temporal System Integration:
- I personally feel the temporal system should not be modified when extending the library this way. However, leaving it untouched might limit more customized use cases. What are your thoughts on this?
Implementation Approach:
- I propose a composition-based filter system that allows combining simple filters into complex expressions. This would enable query construction for complex cases like version ranges that require compound conditions with NULL handling. Would this approach align with your architectural vision?
- Alternative approaches could include:
  - Query builder pattern with fluent interface
  - Expression system similar to ORM query builders
  - Abstract base classes with inheritance
- Are there specific performance or any other considerations I missed in this proposal?
Neo4j Performance:
- Do you have specific recommendations for indexing strategies?
- Are there Neo4j features we should leverage for query optimization?
- Are there any Neo4j version compatibility constraints we need to consider?
API Design:
- What level of granularity (user-facing flexibility) do you prefer for metadata configuration and management?
  - Global configuration
  - Node/edge type specific configuration
  I think most flexibility should be given to the user to expand possible use cases. Yet, simple options should be available for the most common use cases. However, I don't want to do anything that doesn't align with your vision.
- How should this integrate with existing node creation workflows?
- What validation requirements should we implement, aside from flat schema with simple types?
Project Direction:
- Does this extension align with your roadmap for Graphiti?
- Are there additional use cases you want this extension to support?
  - Is there a use case you can think of that this proposal won't work for, or is there something wrong or missing in my proposal?
- How should the implementation phases be prioritized?

May 04 '25 01:05 evanmschultz

Hey, thanks for the long thought-out proposal. I would read about and maybe look a bit more into our custom entity type implementation. It already does most of this, and I have a PR in the works to add metadata filtering on the attributes.

Let me know if you have further questions or feel something major is missing from this implementation (edge types and attribute filtering coming in the future).

https://help.getzep.com/graphiti/graphiti/custom-entity-types

May 04 '25 04:05 prasmussen15

Thank you for your response. From what I could see, there was nothing that would allow me to use this for the specific use case I had in mind, without filtering after DB retrieval. Essentially, it seemed that I would need to store metadata in the attributes dict and then, on read, retrieve all the nodes or whatever, and then process and filter based on what was stored in the dict. Please correct me if I am wrong.

I will explain more about what I am looking to do. I wish to make a coding document retrieval system. The RAG for that would require tags based on version numbers. The best I can imagine it working would require a version number for when it became relevant and a version number for when it stopped being relevant (null if it is still useful for the latest version). The hope is that a graph could be created for the docs (going back a given number of versions) and then, with each subsequent version, the graph DB would be updated without erasing the previous docs or marking them as irrelevant. I would like to have an LLM be able to search the docs for a given version number and be able to quickly retrieve only the data that it is looking for and only for the specified version number.

The system I am imagining would need to hold many versions in a single graph. This would eliminate the need to create graphs for each version (repeating LLM api calls for data that literally didn't change). It would also maintain consistency in graph structure so an LLM could always get consistent results for a given library.

Looking at the project and the documentation you attached, it would work for this purpose, but as mentioned above, it would be slow for large graphs due to the nested nature of the attributes dict.

Do you have any thoughts or recommendations for this project? Do you think that some of the things I mentioned could benefit Graphiti now that you have this context?

I would love to discuss!

May 04 '25 05:05 evanmschultz

If anything, I would love to hear more about the PR you have in the works. Not to say I have any right, just saying I am interested!

May 04 '25 05:05 evanmschultz

@evanmschultz Is this still relevant? Please confirm within 14 days or this issue will be closed.

Oct 05 '25 00:10 claude[bot]
