Anton Troynikov
### Describe the problem

It's up to the user to select the distance metric they use with their collection. However, embedding models are trained with a pre-determined similarity metric. By...
# Auto-Ingestion Batching

Chroma users tend to throw a bunch of data at Chroma at once to get their collections set up. On many developer-class machines, and remote notebook environments...
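The excerpt above is cut off before it describes a design, so the following is only a minimal sketch of the general batching idea, not the proposal's actual mechanism. The helper name `iter_batches` and the batch size of 2 are hypothetical choices made for illustration.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long.

    Hypothetical helper: the name and signature are assumptions, not Chroma API.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Split five documents into batches of at most two.
documents = [f"doc {i}" for i in range(5)]
batches = list(iter_batches(documents, batch_size=2))
```

Each batch could then be handed to its own `collection.add(...)` call instead of one large call, which is presumably the kind of chunking the proposal would automate on the user's behalf.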
[`test_api.py`](https://github.com/chroma-core/chroma/blob/main/chromadb/test/test_api.py) is a legacy of Chroma's launch, and provided some basic initial test coverage. We've continued to add to it as the project has evolved, and it's become a catch-all...
## Description of changes

This PR consumes the Configuration we pass through and store. Currently, only the HNSW segment is parametrized, so that's where we consume the configuration.

##...
# Default Embedding Function

Embedding functions have significant influence on the accuracy of retrieval, especially recall. Currently we use a fairly basic sentence transformer model, but lately there have been...
Currently we have a single proto representation for segments (https://github.com/chroma-core/chroma/blob/anton/consume-config/idl/chromadb/proto/chroma.proto#L49). We would prefer to have these strongly typed, per segment type, for example because we want a strongly typed configuration...
# Generate IDs

Asking users to pass in IDs with their documents is another place where they can trip up. While this is a minor friction, eliminating this step will...
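The excerpt does not say how IDs would be generated, so this is a sketch under an assumption: that a random UUID per document is acceptable. The helper `generate_ids` is hypothetical and is not part of Chroma's API.

```python
import uuid
from typing import List


def generate_ids(documents: List[str]) -> List[str]:
    """Return one random UUID string per document.

    Hypothetical helper: random UUIDs are an assumed scheme, chosen because
    they need no coordination and collisions are vanishingly unlikely.
    """
    return [str(uuid.uuid4()) for _ in documents]


docs = ["first document", "second document", "third document"]
ids = generate_ids(docs)
```

Random IDs are not stable across runs, so re-adding the same documents would create duplicates. A content hash would be one alternative if deduplication matters, though the excerpt does not indicate which trade-off the proposal prefers.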
# Persistent Embedding Functions

Chroma's embedding functions (EF) don't persist with collections. This creates awkward API surface area and gotchas like needing to pass an EF if you call `get`....
Users may not keep their Chroma client and server versions in sync when upgrading (e.g. https://github.com/chroma-core/chroma/issues/2510). This can lead to confusing errors. In general, we don't want to...