RFC: Comprehensive Improvements for Production-Ready Vectra

Open anvanster opened this issue 5 months ago • 1 comment

Summary

I'd like to propose and contribute a series of improvements to make Vectra suitable for production use cases. These changes would address current limitations while maintaining backward compatibility and the library's ease of use.

Motivation

Vectra has an excellent foundation as a local vector database, but several limitations prevent it from being used in production applications:

No update operations - Must delete and recreate items to modify them
Only soft delete - No way to permanently remove items or reclaim storage
No upsert functionality - Common pattern requires manual existence checking
Limited TypeScript support - Often requires @ts-ignore in consumer code
No batch operations - Each change is a separate call, so performance suffers when many items are modified at once
Missing enterprise features - No transactions, versioning, or incremental indexing

Proposed Solutions

1. Enhanced Type System

Add dedicated id field to core data structure (maintaining backward compatibility)
Improve TypeScript generics for better type inference
Add proper types for all operations and results
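
As a rough sketch of what the type-system changes could look like: the names below (`MetadataValue`, `TypedItem`, `makeItem`) are hypothetical and not taken from Vectra's source. The idea is only to show a dedicated `id` field alongside a generic metadata parameter that is inferred from the call site.

```typescript
// Hypothetical shapes for illustration; not taken from Vectra's source.
type MetadataValue = string | number | boolean;

interface TypedItem<TMeta extends Record<string, MetadataValue>> {
  id: string;        // dedicated identifier, separate from metadata
  vector: number[];
  metadata: TMeta;   // caller-defined shape, checked at compile time
}

// Generic helper: the metadata type is inferred from the argument.
function makeItem<TMeta extends Record<string, MetadataValue>>(
  id: string,
  vector: number[],
  metadata: TMeta
): TypedItem<TMeta> {
  return { id, vector, metadata };
}

const user = makeItem('user-123', [0.1, 0.2], { name: 'Ada', age: 36 });
const age: number = user.metadata.age; // no cast or @ts-ignore needed
```

With inference like this, consumer code that reads `user.metadata.age` type-checks without extra annotations.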

2. Full CRUD Operations

Update Operations

// Update metadata without recomputing embeddings
await index.updateItem('item-1', {
  metadata: { status: 'updated' }
});

// Update text with automatic embedding recomputation
await index.updateItem('item-1', {
  text: 'New content'
});
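
To make the intended semantics concrete, here is a minimal in-memory sketch of the update rule: the vector is recomputed only when `text` is supplied. Everything here is an assumption for illustration, including `applyUpdate`, the stand-in `fakeEmbed`, and the choice to shallow-merge metadata rather than replace it. It is synchronous for brevity, whereas the proposed API is async.

```typescript
interface StoredItem {
  id: string;
  vector: number[];
  metadata: Record<string, unknown>;
}

interface ItemPatch {
  text?: string;
  metadata?: Record<string, unknown>;
}

// Stand-in embedder (assumption): a real one would call an embedding model.
function fakeEmbed(text: string): number[] {
  return [text.length, text.split(' ').length];
}

// Returns a new item; the original is left untouched.
function applyUpdate(
  item: StoredItem,
  patch: ItemPatch,
  embed: (text: string) => number[]
): StoredItem {
  return {
    id: item.id,
    // Recompute the embedding only when new text is provided.
    vector: patch.text !== undefined ? embed(patch.text) : item.vector,
    // Shallow merge (assumed): patch keys overwrite, other keys survive.
    metadata: { ...item.metadata, ...(patch.metadata ?? {}) },
  };
}
```

Whether metadata should merge or replace is one of the API design questions worth settling early.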

Proper Delete

// Soft delete (default)
await index.deleteItem('item-1');

// Permanent delete
await index.deleteItem('item-1', true);

// Garbage collection
const purgedCount = await index.purgeDeleted();
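
The relationship between soft delete, permanent delete, and purging can be sketched with a small in-memory model. `TombstoneStore` is a hypothetical class written for this illustration and says nothing about how Vectra stores items on disk; it is also synchronous, unlike the proposed API.

```typescript
class TombstoneStore {
  private items = new Map<string, { value: string; deleted: boolean }>();

  insert(id: string, value: string): void {
    this.items.set(id, { value, deleted: false });
  }

  // Soft delete by default; pass true to remove the entry immediately.
  deleteItem(id: string, permanent = false): boolean {
    const entry = this.items.get(id);
    if (!entry) return false;
    if (permanent) {
      this.items.delete(id);
    } else {
      entry.deleted = true; // tombstone: hidden from reads, still stored
    }
    return true;
  }

  // Remove every soft-deleted entry and report how many were reclaimed.
  purgeDeleted(): number {
    let purged = 0;
    this.items.forEach((entry, id) => {
      if (entry.deleted) {
        this.items.delete(id);
        purged++;
      }
    });
    return purged;
  }

  // An item is visible only if it exists and is not tombstoned.
  has(id: string): boolean {
    const entry = this.items.get(id);
    return entry !== undefined && !entry.deleted;
  }

  // Number of entries still occupying storage, tombstones included.
  storedCount(): number {
    return this.items.size;
  }
}
```

In this model a soft-deleted item disappears from reads right away, but storage is only reclaimed by a permanent delete or a later `purgeDeleted()` call.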

Upsert

// Insert if new, update if exists
await index.upsertItem('item-1', {
  metadata: { category: 'example' },
  text: 'Content for embeddings'
});
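
Reduced to its core, upsert is a single call that reports whether it inserted or updated. The helper below is a hypothetical sketch over a plain `Map`, not the proposed implementation. Returning the outcome is an assumption that may or may not belong in the final API.

```typescript
type UpsertOutcome = 'inserted' | 'updated';

// Insert if the id is new, overwrite if it already exists.
function upsert<T>(store: Map<string, T>, id: string, value: T): UpsertOutcome {
  const outcome: UpsertOutcome = store.has(id) ? 'updated' : 'inserted';
  store.set(id, value);
  return outcome;
}
```

The caller no longer needs a separate existence check before deciding between insert and update.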

3. Batch Operations

const results = await index.executeBatch([
  { operation: 'upsert', id: 'item-1', data: { metadata: { tag: 'batch' } } },
  { operation: 'update', id: 'item-2', data: { text: 'Updated text' } },
  { operation: 'delete', id: 'item-3' }
]);
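
One open question is what `results` should contain. A possible shape, sketched below, is one result per operation, with failures reported rather than thrown so that a single bad operation does not abort the batch. The `BatchOp` and `BatchResult` types and this synchronous `executeBatch` over a `Map` are hypothetical and for discussion only.

```typescript
type BatchOp =
  | { operation: 'upsert'; id: string; data: Record<string, unknown> }
  | { operation: 'update'; id: string; data: Record<string, unknown> }
  | { operation: 'delete'; id: string };

interface BatchResult {
  id: string;
  operation: BatchOp['operation'];
  ok: boolean;
  error?: string;
}

function executeBatch(
  store: Map<string, Record<string, unknown>>,
  ops: BatchOp[]
): BatchResult[] {
  return ops.map((op): BatchResult => {
    switch (op.operation) {
      case 'upsert':
        // Create the item if missing, otherwise merge into the existing one.
        store.set(op.id, { ...(store.get(op.id) ?? {}), ...op.data });
        return { id: op.id, operation: op.operation, ok: true };
      case 'update': {
        const existing = store.get(op.id);
        if (!existing) {
          return { id: op.id, operation: op.operation, ok: false, error: 'not found' };
        }
        store.set(op.id, { ...existing, ...op.data });
        return { id: op.id, operation: op.operation, ok: true };
      }
      case 'delete':
        return store.delete(op.id)
          ? { id: op.id, operation: op.operation, ok: true }
          : { id: op.id, operation: op.operation, ok: false, error: 'not found' };
    }
  });
}
```

An alternative design would be all-or-nothing semantics, which ties into the optional transaction support listed under performance.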

4. Performance Improvements

Incremental index updates (no full rebuilds)
Optimized batch processing
Optional transaction support for consistency
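
For the transaction item, one simple approach (an assumption, not a committed design) is to apply changes to a copy and swap it in only if every step succeeds. `withTransaction` below is a hypothetical helper over a `Map`. It copies shallowly, so it only protects against added, removed, or replaced entries, and it says nothing about durability on disk.

```typescript
// Run `work` against a draft copy; commit only if it does not throw.
function withTransaction<T>(
  store: Map<string, T>,
  work: (draft: Map<string, T>) => void
): boolean {
  const draft = new Map<string, T>();
  store.forEach((value, key) => draft.set(key, value));
  try {
    work(draft);
  } catch {
    return false; // rollback: the original store was never touched
  }
  // Commit: replace the store's contents with the draft's.
  store.clear();
  draft.forEach((value, key) => store.set(key, value));
  return true;
}
```

Copying the whole map is at odds with the goal of avoiding full rebuilds, so a real implementation would more likely record a change log. The sketch is only meant to pin down the all-or-nothing behavior.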

Implementation Approach

I propose implementing these changes in phases:

Phase 1: Type system enhancements (non-breaking)
Phase 2: Update operations
Phase 3: Delete operations with garbage collection
Phase 4: Upsert and batch operations
Phase 5: Performance optimizations

Each phase would include:

Full test coverage
Documentation updates
Migration examples
Performance benchmarks

Backward Compatibility

All changes would maintain backward compatibility:

Existing code continues to work without modifications
ID can still be stored in metadata (deprecated but supported)
New features are opt-in
Clear migration path provided
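
The "ID in metadata" fallback could be handled by a small resolver that prefers the new dedicated field and falls back to the legacy location. `LegacyOrNewItem` and `resolveId` are hypothetical names used only for this sketch.

```typescript
interface LegacyOrNewItem {
  id?: string;
  metadata?: Record<string, unknown>;
}

// Prefer the dedicated id field; fall back to metadata.id for older data.
function resolveId(item: LegacyOrNewItem): string | undefined {
  if (typeof item.id === 'string') return item.id;
  const legacy = item.metadata?.['id'];
  return typeof legacy === 'string' ? legacy : undefined;
}
```

Existing indexes that keep the identifier in metadata would keep resolving, while new code can rely on the dedicated field.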

Questions for the Community

Priority: Which improvements would be most valuable for your use cases?
API Design: Any preferences on the API design for these operations?
Breaking Changes: Would you accept minor breaking changes for cleaner APIs?
Additional Features: What other limitations have you encountered?
Performance: What are your performance requirements for batch operations?

Contribution Plan

I'm prepared to implement these improvements and would like to contribute them back to the main project. I can:

Submit separate PRs for each phase for easier review
Maintain a fork with early access to features
Provide extensive documentation and examples
Help with ongoing maintenance

Alternative Approaches Considered

Plugin System: Add features as plugins, but this would complicate the API
Complete Rewrite: Start fresh, but this loses the simplicity of current design
Wrapper Library: Build on top, but this adds unnecessary abstraction

Next Steps

Based on community feedback, I'll:

Create detailed design docs for approved features
Set up a development branch for testing
Begin implementation of highest-priority features
Provide regular updates on progress

Example Use Case

Here's how these improvements would simplify a common pattern:

Current Approach

// Updating an item currently: scan every item to find the one to replace
const items = await index.listItems();
const existing = items.find(item => item.metadata.id === 'user-123');
if (existing) {
  // deleteItem takes the item's id, not the item object
  await index.deleteItem(existing.id);
}
await index.insertItem({
  vector: await embeddings.generateEmbedding(newText),
  metadata: { id: 'user-123', ...newMetadata }
});

With Improvements

// Much simpler and atomic
await index.upsertItem('user-123', {
  text: newText,
  metadata: newMetadata
});

I'm excited about the possibility of making Vectra production-ready while maintaining its excellent developer experience. Looking forward to your thoughts and feedback!

References

[Current Vectra Limitations (Issue #X)](#)
[Similar features in ChromaDB](https://docs.trychroma.com/api)
[Vector DB Comparison Chart](#)

Would you be interested in these improvements? Please 👍 or 👎 this issue and share your thoughts below.

Jul 10 '25 07:07 anvanster

Sorry for the delay @anvanster, I’m totally open to adding additional contributors to this project. The irony is that I no longer use vector databases in my own projects because I’ve worked out how to create an infinite context window. So I just feed everything to the model.

With that said, I realize that a lot of people are deriving value from this project, so I’m happy to help move that forward any way I can.

If anyone would like to become a contributor, just reply to this thread and I will send invites out.

Aug 01 '25 01:08 Stevenic