chroma
[Bug]: Batch Size Variation in Collection.add Leads to Inconsistent Query Results
What happened?
When the same data is added with collection.add using different batch sizes, subsequent queries return noticeably different results.
For example:
Adding embeddings with collection.add using a batch size of 1 yields consistent query results across initializations. With a batch size of 5,000, running the same query across initializations produces different results.
Versions
chromadb==0.4.12, chroma-hnswlib==0.7.3, Python 3.10.13
Relevant log output
Our current embeddings data set is private, so I can't share it. This issue should happen with any data set; I'll try to find one that reproduces it as soon as possible.
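In the meantime, here is a minimal sketch of how the comparison described above might be reproduced with synthetic data. It has not been run against the private data set. The helper names (`make_embeddings`, `add_in_batches`, `query_ids`, `run_once`), the random vectors, the dimension of 64, and the collection name are all assumptions made for illustration. "Across initializations" is assumed here to mean rebuilding the collection from scratch in a fresh persistent directory.

```python
import random
import tempfile


def make_embeddings(n, dim, seed=0):
    """Deterministic synthetic vectors standing in for the private data set."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def add_in_batches(collection, ids, embeddings, batch_size):
    """Call collection.add once per slice of at most batch_size items."""
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        collection.add(ids=ids[start:end], embeddings=embeddings[start:end])


def query_ids(collection, query_embedding, n_results=10):
    """Return the ids of the nearest neighbours for a single query vector."""
    result = collection.query(
        query_embeddings=[query_embedding], n_results=n_results
    )
    return result["ids"][0]


def run_once(batch_size, n=5000, dim=64):
    """Build a fresh collection with the given batch size and run one query."""
    import chromadb  # imported lazily; requires chromadb to be installed

    embeddings = make_embeddings(n, dim)
    ids = [str(i) for i in range(n)]
    path = tempfile.mkdtemp()  # hypothetical scratch directory for this run
    client = chromadb.PersistentClient(path=path)
    collection = client.create_collection(name="batch_size_repro")
    add_in_batches(collection, ids, embeddings, batch_size)
    return query_ids(collection, embeddings[0])
```

Calling `run_once(5000)` several times and comparing the returned id lists with each other, and with repeated calls to `run_once(1)`, should show whether the results differ between runs for the larger batch size.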