RFC: Comprehensive Improvements for Production-Ready Vectra

Open • anvanster opened this issue 5 months ago • 1 comment

Summary

I'd like to propose and contribute a series of improvements to make Vectra suitable for production use cases. These changes would address current limitations while maintaining backward compatibility and the library's ease of use.

Motivation

Vectra has an excellent foundation as a local vector database, but several limitations prevent it from being used in production applications:

  1. No update operations - Must delete and recreate items to modify them
  2. Only soft delete - No way to permanently remove items or reclaim storage
  3. No upsert functionality - Common pattern requires manual existence checking
  4. Limited TypeScript support - Often requires @ts-ignore in consumer code
  5. No batch operations - Performance suffers with multiple operations
  6. Missing enterprise features - No transactions, versioning, or incremental indexing

Proposed Solutions

1. Enhanced Type System

  • Add dedicated id field to core data structure (maintaining backward compatibility)
  • Improve TypeScript generics for better type inference
  • Add proper types for all operations and results
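
To make the type-system idea concrete, here is a minimal sketch of a generic item type with a dedicated `id` field that still honors the legacy convention of keeping the id in metadata. The names `TypedItem`, `ItemMetadata`, and `resolveId` are hypothetical and are not Vectra's actual definitions.

```typescript
// Hypothetical types for illustration only; not Vectra's real definitions.
type MetadataValue = string | number | boolean;
type ItemMetadata = Record<string, MetadataValue>;

interface TypedItem<TMeta extends ItemMetadata = ItemMetadata> {
  id?: string;       // proposed dedicated id (optional so legacy items still type-check)
  vector: number[];
  metadata: TMeta;   // caller-supplied shape, inferred at the call site
}

// Prefer the dedicated id; fall back to the legacy metadata.id convention.
function resolveId<TMeta extends ItemMetadata>(
  item: TypedItem<TMeta>
): string | undefined {
  if (item.id !== undefined) return item.id;
  const meta: ItemMetadata = item.metadata;
  const legacyId = meta["id"];
  return typeof legacyId === "string" ? legacyId : undefined;
}

// Usage: metadata.role is typed as string, so no @ts-ignore is needed.
const typedExample: TypedItem<{ role: string }> = {
  id: "item-1",
  vector: [0.1, 0.2],
  metadata: { role: "admin" },
};
console.log(resolveId(typedExample), typedExample.metadata.role);
```

Keeping `id` optional is one way to let existing items load unchanged; a stricter variant could require it once a migration has run.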

2. Full CRUD Operations

Update Operations

// Update metadata without recomputing embeddings
await index.updateItem('item-1', {
  metadata: { status: 'updated' }
});

// Update text with automatic embedding recomputation
await index.updateItem('item-1', {
  text: 'New content'
});

Proper Delete

// Soft delete (default)
await index.deleteItem('item-1');

// Permanent delete
await index.deleteItem('item-1', true);

// Garbage collection
const purgedCount = await index.purgeDeleted();
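
The difference between the three delete calls can be sketched with a small in-memory model. This is an assumption about the intended semantics, not Vectra's storage code: `SoftDeleteStore` and its members are hypothetical, and the sketch is synchronous for brevity where the proposed API is async.

```typescript
// Hypothetical in-memory model of soft delete, permanent delete, and purge.
interface StoredEntry {
  id: string;
  vector: number[];
  deleted: boolean; // tombstone flag set by a soft delete
}

class SoftDeleteStore {
  private entries = new Map<string, StoredEntry>();

  insert(id: string, vector: number[]): void {
    this.entries.set(id, { id, vector, deleted: false });
  }

  // Soft delete sets a tombstone; permanent delete removes the entry outright.
  // Returns false when the id is unknown.
  deleteItem(id: string, permanent: boolean = false): boolean {
    const entry = this.entries.get(id);
    if (!entry) return false;
    if (permanent) this.entries.delete(id);
    else entry.deleted = true;
    return true;
  }

  // Drop every tombstoned entry and report how many were reclaimed.
  purgeDeleted(): number {
    let purged = 0;
    this.entries.forEach((entry, id) => {
      if (entry.deleted) {
        this.entries.delete(id);
        purged++;
      }
    });
    return purged;
  }

  // Only live (non-tombstoned) ids are visible to callers.
  listLive(): string[] {
    return Array.from(this.entries.values())
      .filter((entry) => !entry.deleted)
      .map((entry) => entry.id);
  }

  // Number of entries still occupying storage, tombstones included.
  storedCount(): number {
    return this.entries.size;
  }
}
```

Soft-deleted entries keep occupying storage until `purgeDeleted` runs, which is why the proposal pairs the two.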

Upsert

// Insert if new, update if exists
await index.upsertItem('item-1', {
  metadata: { category: 'example' },
  text: 'Content for embeddings'
});
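
One possible reading of the upsert semantics, as a sketch: insert when the id is new, otherwise merge metadata and recompute the vector only when new text is supplied. Everything here is hypothetical (`UpsertSketch`, `toyEmbed`, the merge policy, and the choice to reject a new item that has no text); a real implementation would call an embeddings model and be async.

```typescript
// Hypothetical shapes; toyEmbed is a stand-in, not a real embedding model.
type UpsertMeta = Record<string, string | number | boolean>;
interface UpsertInput { text?: string; metadata?: UpsertMeta; }
interface UpsertRecord { id: string; vector: number[]; metadata: UpsertMeta; }

// Toy embedder: [character count, space-separated word count].
function toyEmbed(text: string): number[] {
  return [text.length, text.split(" ").length];
}

class UpsertSketch {
  private records = new Map<string, UpsertRecord>();

  upsertItem(id: string, input: UpsertInput): "inserted" | "updated" {
    const existing = this.records.get(id);
    if (!existing) {
      // Assumption: a brand-new item needs text so a vector can be computed.
      if (input.text === undefined) {
        throw new Error(`Cannot insert "${id}" without text to embed`);
      }
      this.records.set(id, {
        id,
        vector: toyEmbed(input.text),
        metadata: { ...input.metadata },
      });
      return "inserted";
    }
    // Recompute the embedding only when the text changed.
    if (input.text !== undefined) existing.vector = toyEmbed(input.text);
    // Assumption: metadata is shallow-merged, with new keys winning.
    existing.metadata = { ...existing.metadata, ...input.metadata };
    return "updated";
  }

  get(id: string): UpsertRecord | undefined {
    return this.records.get(id);
  }
}
```

Whether an upsert should merge or replace metadata is a design choice the RFC leaves open; the merge above is only one option.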

3. Batch Operations

const results = await index.executeBatch([
  { operation: 'upsert', id: 'item-1', data: { metadata: { tag: 'batch' } } },
  { operation: 'update', id: 'item-2', data: { text: 'Updated text' } },
  { operation: 'delete', id: 'item-3' }
]);
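
The shape of the batch input and its results could be typed as a discriminated union, which lets the compiler check that `delete` carries no `data`. The sketch below is hypothetical (`BatchOperation`, `BatchOutcome`, and `executeBatchSketch` are illustrative names) and assumes each operation reports its own success or failure rather than aborting the whole batch.

```typescript
// Hypothetical batch types; the union is discriminated on "operation".
type BatchData = { text?: string; metadata?: Record<string, string> };
type BatchOperation =
  | { operation: "upsert"; id: string; data: BatchData }
  | { operation: "update"; id: string; data: BatchData }
  | { operation: "delete"; id: string };

interface BatchOutcome {
  id: string;
  operation: BatchOperation["operation"];
  ok: boolean;
  error?: string;
}

// Apply each operation in order against an in-memory store and collect
// one outcome per operation. Fields in data are shallow-merged.
function executeBatchSketch(
  store: Map<string, BatchData>,
  ops: BatchOperation[]
): BatchOutcome[] {
  return ops.map((op): BatchOutcome => {
    switch (op.operation) {
      case "upsert":
        store.set(op.id, { ...store.get(op.id), ...op.data });
        return { id: op.id, operation: op.operation, ok: true };
      case "update": {
        const current = store.get(op.id);
        if (!current) {
          return { id: op.id, operation: op.operation, ok: false, error: "not found" };
        }
        store.set(op.id, { ...current, ...op.data });
        return { id: op.id, operation: op.operation, ok: true };
      }
      case "delete":
        return store.delete(op.id)
          ? { id: op.id, operation: op.operation, ok: true }
          : { id: op.id, operation: op.operation, ok: false, error: "not found" };
    }
  });
}
```

Returning per-operation outcomes keeps partial failures visible; an all-or-nothing variant would need the transaction support listed under performance improvements.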

4. Performance Improvements

  • Incremental index updates (no full rebuilds)
  • Optimized batch processing
  • Optional transaction support for consistency
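
As a rough illustration of what optional transaction support might mean, the sketch below applies changes to a draft copy and commits only if the callback finishes without throwing. `runInTransaction` is a hypothetical helper over a plain `Map`, not an existing Vectra API, and the copy is shallow, so it suits small in-memory examples rather than a large on-disk index.

```typescript
// Hypothetical all-or-nothing helper: mutate a draft, commit on success.
function runInTransaction<V>(
  store: Map<string, V>,
  work: (draft: Map<string, V>) => void
): boolean {
  const draft = new Map(store); // shallow copy: keys are copied, values are shared
  try {
    work(draft);
  } catch (err) {
    return false; // the draft is discarded and the store is left untouched
  }
  // Commit: replace the store's contents with the draft's.
  store.clear();
  draft.forEach((value, key) => store.set(key, value));
  return true;
}

// Usage: the second transaction throws, so "c" never reaches the store.
const vectors = new Map<string, number[]>([["a", [1, 0]]]);
runInTransaction(vectors, (draft) => { draft.set("b", [0, 1]); });
runInTransaction(vectors, (draft) => {
  draft.set("c", [1, 1]);
  throw new Error("simulated failure");
});
console.log(Array.from(vectors.keys()));
```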

Implementation Approach

I propose implementing these changes in phases:

  1. Phase 1: Type system enhancements (non-breaking)
  2. Phase 2: Update operations
  3. Phase 3: Delete operations with garbage collection
  4. Phase 4: Upsert and batch operations
  5. Phase 5: Performance optimizations

Each phase would include:

  • Full test coverage
  • Documentation updates
  • Migration examples
  • Performance benchmarks

Backward Compatibility

All changes would maintain backward compatibility:

  • Existing code continues to work without modifications
  • ID can still be stored in metadata (deprecated but supported)
  • New features are opt-in
  • Clear migration path provided

Questions for the Community

  1. Priority: Which improvements would be most valuable for your use cases?
  2. API Design: Any preferences on the API design for these operations?
  3. Breaking Changes: Would you accept minor breaking changes for cleaner APIs?
  4. Additional Features: What other limitations have you encountered?
  5. Performance: What are your performance requirements for batch operations?

Contribution Plan

I'm prepared to implement these improvements and would like to contribute them back to the main project. I can:

  • Submit separate PRs for each phase for easier review
  • Maintain a fork with early access to features
  • Provide extensive documentation and examples
  • Help with ongoing maintenance

Alternative Approaches Considered

  1. Plugin System: Add features as plugins, but this would complicate the API
  2. Complete Rewrite: Start fresh, but this loses the simplicity of current design
  3. Wrapper Library: Build on top, but this adds unnecessary abstraction

Next Steps

Based on community feedback, I'll:

  1. Create detailed design docs for approved features
  2. Set up a development branch for testing
  3. Begin implementation of highest-priority features
  4. Provide regular updates on progress

Example Use Case

Here's how these improvements would simplify a common pattern:

Current Approach

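// Updating an item currently: find it by the id stored in metadata,
// delete it, then insert a replacement with a freshly computed vector
const items = await index.listItems();
const existing = items.find(item => item.metadata.id === 'user-123');
if (existing) {
  // deleteItem expects the item's id, not the item object
  await index.deleteItem(existing.id);
}
await index.insertItem({
  vector: await embeddings.generateEmbedding(newText),
  metadata: { id: 'user-123', ...newMetadata }
});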

With Improvements

// Much simpler and atomic
await index.upsertItem('user-123', {
  text: newText,
  metadata: newMetadata
});

I'm excited about the possibility of making Vectra production-ready while maintaining its excellent developer experience. Looking forward to your thoughts and feedback!


Would you be interested in these improvements? Please 👍 or 👎 this issue and share your thoughts below.

anvanster, Jul 10 '25 07:07

Sorry for the delay @anvanster, I’m totally open to adding additional contributors to this project. The irony is that I no longer use vector databases in my own projects because I’ve worked out how to create an infinite context window. So I just feed everything to the model.

With that said, I realize that a lot of people are deriving value from this project, so I’m happy to help move that forward any way I can.

If anyone would like to become a contributor, just reply to this thread and I will send invites out.

Stevenic, Aug 01 '25 01:08