Large memory.json: Memory Indexing

Open akvoloh opened this issue 5 months ago • 0 comments

Request for an automated memory indexing system that dynamically analyzes MCP memory content and generates optimized index entities for efficient token usage and faster information retrieval.

Problem Statement

MCP Memory is the primary source of notes and list items for several projects and IDEs. Current MCP memory systems are token-inefficient when retrieving information:

Large memory files (80k+ tokens) result in expensive queries
Manual index creation is time-consuming and error-prone
Generic search patterns don't leverage content structure
No automatic optimization for common access patterns

Proposed Solution

Implement an indexing system that:

1. Content Analysis Engine

Automatically scans all entities in memory.json
Identifies content patterns and categories (directives, project status, file locations, etc.)
Extracts key information using pattern matching and keyword detection
Builds content category maps based on actual memory content
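
A minimal sketch of the category detection described above, assuming each entity is a dict with `name` and `observations` fields. The pattern table, the category names, and `build_category_map` are hypothetical placeholders for illustration, not part of any existing MCP server.

```python
import re
from collections import defaultdict

# Hypothetical category patterns; a real setup would load these from configuration.
CATEGORY_PATTERNS = {
    "directives": re.compile(r"\b(must|always|never|rule)\b", re.IGNORECASE),
    "project_status": re.compile(r"\b(status|done|in progress|blocked)\b", re.IGNORECASE),
    "file_locations": re.compile(r"[\w./\\-]+\.(py|js|ts|json|md)\b", re.IGNORECASE),
}

def build_category_map(entities):
    """Map each category to the (entity name, observation) pairs that match it."""
    category_map = defaultdict(list)
    for entity in entities:
        for observation in entity.get("observations", []):
            for category, pattern in CATEGORY_PATTERNS.items():
                if pattern.search(observation):
                    category_map[category].append((entity["name"], observation))
    # Only categories with at least one match end up in the result.
    return dict(category_map)
```

Because the map is built from matches only, categories that never occur in the memory file are simply absent, which is what the index generation step needs.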

2. Dynamic Index Generation

Creates summary entities only for content categories that exist
Generates optimized observations limited to essential information
Produces consistent naming conventions for quick access commands
Maintains backward compatibility with existing memory structure
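
One possible shape for the generation step, sketched under assumptions: the input is a mapping from category name to a list of (entity name, observation) pairs, and index entities use the same record layout as ordinary entities. The `index_` name prefix and the `index` entity type are hypothetical conventions chosen for this example.

```python
def generate_index_entities(category_map, max_observations=10):
    """Create one summary entity per category that actually has content."""
    index_entities = []
    for category in sorted(category_map):
        matches = category_map[category]
        if not matches:
            continue  # no content for this category, so no index entity
        index_entities.append({
            "type": "entity",
            "name": f"index_{category}",   # hypothetical naming convention
            "entityType": "index",         # hypothetical entity type
            "observations": [
                f"{name}: {text}" for name, text in matches[:max_observations]
            ],
        })
    return index_entities
```

Since the function only produces new records and never modifies the originals, existing entities and relations stay readable by tools that know nothing about the indexes.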

3. Token Optimization

Limits index entity size to <500 tokens each
Creates hierarchical access patterns (summary → detailed → comprehensive)
Implements content prioritization (most relevant information first)
Provides progressive detail strategies for complex queries
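
A rough sketch of the size limit and prioritization, assuming observations arrive already sorted from most to least relevant. The four-characters-per-token estimate is only a heuristic assumed for this example; a real implementation would use the tokenizer of the target model.

```python
def estimate_tokens(text):
    """Rough estimate of about 4 characters per token (a heuristic, not a tokenizer)."""
    return max(1, len(text) // 4)

def trim_to_budget(observations, max_tokens=500):
    """Keep observations in priority order while the total stays under max_tokens."""
    kept, used = [], 0
    for observation in observations:
        cost = estimate_tokens(observation)
        if used + cost >= max_tokens:
            break  # adding this one would reach or exceed the budget
        kept.append(observation)
        used += cost
    return kept
```

The same function could serve each tier of the summary → detailed → comprehensive hierarchy by calling it with a larger `max_tokens` per tier.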

4. Command Integration

Generates standardized quick access commands (memrules, memstatus, etc.)
Creates mapping between commands and specific index entities
Implements fallback strategies for missing content
Maintains consistent search patterns across projects
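
The command mapping and fallback could look roughly like this. The command names `memrules` and `memstatus` come from the list above; the index entity names, the fallback keywords, and `resolve_command` are hypothetical.

```python
# command -> (index entity name, keyword used when the index entity is missing)
COMMANDS = {
    "memrules": ("index_directives", "rule"),
    "memstatus": ("index_project_status", "status"),
}

def resolve_command(command, entities_by_name):
    """Return observations for a command, falling back to a keyword scan."""
    if command not in COMMANDS:
        raise KeyError(f"unknown command: {command}")
    index_name, keyword = COMMANDS[command]
    index_entity = entities_by_name.get(index_name)
    if index_entity is not None:
        return index_entity["observations"]
    # Fallback: no index entity exists, so scan every observation for the keyword.
    return [
        obs
        for entity in entities_by_name.values()
        for obs in entity.get("observations", [])
        if keyword in obs.lower()
    ]
```

The fallback is slower and less precise than an index lookup, but it keeps a command usable on a memory file that has not been indexed yet.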

Technical Requirements

Core Components Needed:

Memory Parser: Read and parse JSON Lines format memory files
Pattern Analyzer: Detect content categories using configurable patterns
Index Generator: Create optimized summary entities dynamically
Command Mapper: Generate quick access command definitions
Output Writer: Write optimized memory files with indexes
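
As an illustration of the last component, here is a sketch of an output writer that assumes the memory file is JSON Lines (one record per line). `write_with_indexes` is a hypothetical helper; it copies the original records through untouched and appends the index entities after them.

```python
import json
from pathlib import Path

def write_with_indexes(source_path, output_path, index_entities):
    """Copy the original JSON Lines records unchanged, then append index entities."""
    original_lines = Path(source_path).read_text(encoding="utf-8").splitlines()
    with open(output_path, "w", encoding="utf-8") as out:
        for line in original_lines:
            if line.strip():              # drop blank lines, keep records verbatim
                out.write(line + "\n")
        for entity in index_entities:
            out.write(json.dumps(entity, ensure_ascii=False) + "\n")
```

Writing to a separate output path leaves the source file intact, so a failed run cannot corrupt the existing memory.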

Configuration Support:

Configurable content patterns for different project types
Adjustable token limits per index category
Customizable command naming conventions
Optional index categories based on project needs
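
The four options above could be grouped into a single configuration object. This is a sketch only: `IndexConfig`, its field names, and its default patterns are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class IndexConfig:
    # Regex source per category; hypothetical defaults for a generic project.
    patterns: dict = field(default_factory=lambda: {
        "directives": r"\b(must|always|never)\b",
        "project_status": r"\b(status|blocked|done)\b",
    })
    token_limits: dict = field(default_factory=dict)   # per-category overrides
    default_token_limit: int = 500
    command_prefix: str = "mem"
    enabled_categories: Optional[set] = None           # None means all categories

    def token_limit_for(self, category):
        return self.token_limits.get(category, self.default_token_limit)

    def command_name(self, short_name):
        return f"{self.command_prefix}{short_name}"

    def is_enabled(self, category):
        return self.enabled_categories is None or category in self.enabled_categories
```

For example, `IndexConfig(token_limits={"directives": 300}, enabled_categories={"directives"})` would tighten the budget for directives and switch every other category off.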

Performance Targets:

Generate indexes with <500 tokens each
Achieve 95%+ token reduction for common queries
Support memory files with 1000+ entities
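
To make the targets concrete, assume a query that previously loaded a full 80,000-token memory file now loads a single 500-token index entity. The reduction is then 1 - 500/80,000 = 99.375%, and the 95% target corresponds to at most 4,000 tokens per query for a file of that size. The helper below is a hypothetical way to check measured numbers against the target.

```python
def token_reduction(full_tokens, query_tokens):
    """Fraction of tokens saved relative to loading the full memory file."""
    if full_tokens <= 0:
        raise ValueError("full_tokens must be positive")
    return 1 - query_tokens / full_tokens

def meets_target(full_tokens, query_tokens, target=0.95):
    """True when the measured reduction reaches the target fraction."""
    return token_reduction(full_tokens, query_tokens) >= target
```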

Expected Benefits

For System Performance:

Lower API costs due to reduced token usage
Faster response times with targeted queries
Improved scalability as memory files grow
Better user experience with consistent performance

Compatibility Requirements

Maintain full backward compatibility with existing memory.json format
Support both JSON Lines and standard JSON input formats
Work with existing MCP server implementations
Preserve all existing entity relationships and metadata
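
Supporting both input formats could be handled by a tolerant loader along these lines. This is a sketch under assumptions: the standard JSON form is taken to be either a list of records or an object with `entities` and `relations` arrays, and `load_memory` is a hypothetical helper.

```python
import json

def load_memory(text):
    """Parse memory content given as one JSON document or as JSON Lines."""
    text = text.strip()
    if not text:
        return []
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: treat each non-empty line as one record.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if isinstance(data, list):
        return data
    if isinstance(data, dict) and ("entities" in data or "relations" in data):
        # Assumed standard JSON layout with separate entity and relation arrays.
        return data.get("entities", []) + data.get("relations", [])
    return [data]  # a JSON Lines file that happens to contain a single record
```

Relation records pass through unchanged alongside entities, so relationships and any extra metadata fields are preserved for the later stages.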

Jul 24 '25 02:07 akvoloh