[Bug]: Batch Size Variation in Collection.add Leads to Inconsistent Query Results

Open AlejandroMonroyDocusign opened this issue 1 year ago • 5 comments

What happened?

When the same data is added through collection.add with different batch sizes, the results of subsequent queries differ noticeably.

For example:

Adding the embeddings with a batch size of 1 yields the same query results every time the collection is initialized. Adding the same embeddings with a batch size of 5,000 yields different results for the same query from one initialization to the next.
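A minimal reproduction sketch, assuming synthetic random embeddings stand in for the private data set. The helper names (`make_embeddings`, `add_in_batches`, `query_ids`), the collection name, the data set size, and the dimensionality are all hypothetical and not taken from the original setup. Whether random data actually triggers the discrepancy is untested.

```python
import random
import uuid


def make_embeddings(n, dim, seed=0):
    """Generate n deterministic pseudo-random embeddings (placeholder data)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def add_in_batches(collection, ids, embeddings, batch_size):
    """Call collection.add once per slice of at most batch_size items."""
    for start in range(0, len(ids), batch_size):
        end = start + batch_size
        collection.add(ids=ids[start:end], embeddings=embeddings[start:end])


def query_ids(batch_size, n=5000, dim=32, n_results=10):
    """Fill a fresh collection using batch_size and return the nearest-neighbour ids."""
    import chromadb  # imported lazily so the helpers above run without it

    client = chromadb.Client()  # in-memory client
    # Unique name so repeated runs never reuse an earlier collection.
    collection = client.create_collection(name=f"repro-{uuid.uuid4().hex}")
    embeddings = make_embeddings(n, dim)
    ids = [str(i) for i in range(n)]
    add_in_batches(collection, ids, embeddings, batch_size)
    result = collection.query(query_embeddings=[embeddings[0]], n_results=n_results)
    return result["ids"][0]
```

Calling `query_ids(1)` and `query_ids(5000)` several times each and comparing the returned id lists would show whether the larger batch size gives unstable results on a given install.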

Versions

chromadb==0.4.12
chroma-hnswlib==0.7.3
Python 3.10.13

Relevant log output

Our current embeddings data set is private, so I cannot share it. This issue should happen with any data set; I'll try to find one that reproduces it as soon as possible.

AlejandroMonroyDocusign, Feb 13 '24 19:02