[FEATURE] support ChromaDB as a builtin target
Similar to LanceDB support. Please make ChromaDB an optional dependency.
❤️ Contributors, please refer to the 📙 Contributing Guide.
Unless the PR can be sent immediately (e.g. just a few lines of code), we recommend leaving a comment on the issue, such as "I'm working on it" or "Can I work on this issue?", to avoid duplicated work. Our Discord server is always open and friendly.
Hi @georgeh0! I'd like to work on this ChromaDB integration.
I've reviewed the LanceDB implementation (~450 lines) and understand the pattern. However, before I start, I wanted to check if you'd be open to a phased approach:
Phase 1 (MVP - this PR):
- Basic ChromaDB connector with core CRUD operations
- Schema mapping (PyArrow types → ChromaDB metadata)
- Optional dependency setup
- Basic tests
- Estimated scope: ~150-200 lines
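To make the Phase 1 items concrete, here is a minimal sketch of the optional-dependency guard and the metadata mapping. It is not the LanceDB connector's actual code: `require_chromadb` and `to_chroma_metadata` are hypothetical helper names, and the mapping assumes ChromaDB metadata values are limited to `str`, `int`, `float`, and `bool`, so anything else is serialized to a JSON string.

```python
import importlib
import json
from typing import Any


def require_chromadb():
    """Import chromadb lazily so it stays an optional dependency (hypothetical helper)."""
    try:
        return importlib.import_module("chromadb")
    except ImportError as exc:
        raise ImportError(
            "ChromaDB support requires the 'chromadb' package; "
            "install it with `pip install chromadb`."
        ) from exc


# Assumption: ChromaDB metadata accepts only these scalar types.
_SCALARS = (str, int, float, bool)


def to_chroma_metadata(row: dict[str, Any]) -> dict[str, Any]:
    """Map one row's non-vector fields to a flat metadata dict (hypothetical helper)."""
    meta: dict[str, Any] = {}
    for key, value in row.items():
        if value is None:
            # Skip missing values rather than storing a null.
            continue
        if isinstance(value, _SCALARS):
            meta[key] = value
        else:
            # Lists, dicts, dates, etc. are stored as JSON text.
            meta[key] = json.dumps(value, default=str)
    return meta
```

The resulting dicts could then be passed as `metadatas=[...]` to a collection's `upsert` call, alongside `ids` and `embeddings`, once `require_chromadb()` has succeeded.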
Phase 2 (follow-up PR):
- Vector index support with configurable metrics
- Advanced indexing options
- Performance optimizations
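For the configurable metrics in Phase 2, one possible shape is a small lookup from a metric name to ChromaDB's distance-space setting. This is only a sketch: the metric names on the left and the helper `collection_metadata_for` are hypothetical, and it assumes the `hnsw:space` collection-metadata key with the values `cosine`, `l2`, and `ip`, which newer ChromaDB releases may express through a separate configuration argument instead.

```python
# Hypothetical metric names mapped to ChromaDB distance-space values (assumed).
_CHROMA_SPACE = {
    "cosine": "cosine",
    "l2": "l2",
    "inner_product": "ip",
}


def collection_metadata_for(metric: str) -> dict[str, str]:
    """Build collection metadata selecting the distance space (hypothetical helper)."""
    try:
        space = _CHROMA_SPACE[metric]
    except KeyError:
        supported = ", ".join(sorted(_CHROMA_SPACE))
        raise ValueError(
            f"Unsupported metric {metric!r}; expected one of: {supported}"
        ) from None
    return {"hnsw:space": space}
```

The returned dict would be passed as the `metadata` argument when the collection is created, so the metric is fixed once per collection.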
This would allow for faster initial integration and incremental improvements. The ChromaDB API is simpler than LanceDB in some ways (no PyArrow conversion needed), so the basic implementation should be more compact.
Would this approach work for you, or would you prefer a complete implementation matching LanceDB's feature set in a single PR?
Looking forward to your guidance!
@ambicuity thanks for taking this! assigned.
Open for anyone to take!
Would like to take this up 😄
Dropping a ping again to be assigned (just for the record); will be working on it anyway!
@Haleshot assigned. Thanks for taking this!