[Bug]: The order in which documents are imported affects search results
What happened?
When I import 2,000 files, embed them with bge-m3, store the vectors in chromadb, and then run a query, I find that the order in which the documents were imported affects chromadb's search results. For example, when I search for "What's the weather like today?", document A should be returned because it is the better match, but document B is returned instead. After I re-import document A and run the same search again, document A is returned.
Versions
chromadb 0.5.3, Python 3.10, macOS
Relevant log output
No response
@MissJingRongLi, can you share sample documents A and B, plus the query, so we can reproduce this?
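While preparing a reproduction, it may help to first confirm that A really is the closest document to the query embedding, independently of any index. The sketch below ranks documents by brute-force cosine similarity over plain Python lists; it assumes you can export the bge-m3 embeddings as lists of floats, and the names cosine_similarity and exact_top_k are hypothetical helpers, not part of chromadb.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, docs, k=1):
    """Rank every document against the query by brute force (no index).

    docs maps a document id to its embedding. Because every document is
    scored, the ranking does not depend on insertion order, except that
    exact ties keep their insertion order (Python's sort is stable).
    """
    scored = [(cosine_similarity(query, emb), doc_id) for doc_id, emb in docs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy 3-dimensional embeddings standing in for real bge-m3 vectors.
query = [1.0, 0.0, 0.0]
print(exact_top_k(query, {"A": [0.9, 0.1, 0.0], "B": [0.1, 0.9, 0.0]}))
print(exact_top_k(query, {"B": [0.1, 0.9, 0.0], "A": [0.9, 0.1, 0.0]}))
```

Both calls return A regardless of which document was inserted first. If the exact ranking puts A first but the collection returns B, the discrepancy most likely comes from the approximate index rather than the embeddings.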
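This is also likely related to the HNSW search_ef parameter being too low by default, as per https://github.com/chroma-core/chroma/issues/1737. HNSW is an approximate index: with a small search_ef a query explores only a few candidates in the graph, and since the graph's structure depends on the order in which vectors were inserted, the true best match can be missed.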
@MissJingRongLi could you try creating a new collection and setting the hnsw:search_ef collection metadata key to 50?
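For example:
```python
import chromadb

client = chromadb.Client()  # in-memory client; use chromadb.PersistentClient for on-disk data
collection = client.create_collection(
    name="collection_name",
    metadata={"hnsw:search_ef": 50},
)
```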
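To check whether a larger search_ef actually fixes the problem, one option is to compare the ids the collection returns (for instance collection.query(query_embeddings=[q], n_results=k)["ids"][0]) against an exact top-k computed by brute force over the same embeddings, and measure recall. The helper below is a hypothetical sketch, not a chromadb API; it assumes both id lists are already available.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact top-k ids that the approximate index also returned.

    1.0 means the index found every true nearest neighbour; lower values
    mean the HNSW search missed some of them.
    """
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


# Hypothetical ids: the index returned B and C, but the exact top-2 is A and B.
print(recall_at_k(["B", "C"], ["A", "B"]))
```

If recall averaged over a set of test queries rises toward 1.0 as hnsw:search_ef increases, the ordering effect is an index-recall issue rather than a problem with the embeddings.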
We have an issue open for improving index parametrization in general, which should help here: https://github.com/chroma-core/chroma/issues/2285