
[Bug]: The order in which documents are imported affects search results

Open MissJingRongLi opened this issue 1 year ago • 1 comments

What happened?

When I import 2000 files, embed them with bge-m3, store the vectors in chromadb, and then run a query, I find that the order in which the documents were imported affects the search results. For example, when I searched for "What's the weather like today?", document A should have been returned because it was the better match, but document B was returned instead. After I re-imported document A and ran the same search again, document A was returned.

Versions

chromadb 0.5.3, Python 3.10, macOS

Relevant log output

No response

MissJingRongLi, Aug 09 '24 06:08

@MissJingRongLi, can you share sample documents A and B + query that can be used to reproduce this?

tazarov, Aug 09 '24 09:08

This is also likely related to ef_search being too low by default, as per https://github.com/chroma-core/chroma/issues/1737

@MissJingRongLi could you try creating a new collection and setting the hnsw:search_ef collection metadata key to 50?

i.e.

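import chromadb

client = chromadb.Client()  # in-memory client; any existing client works here
collection = client.create_collection(
    name="collection_name",
    metadata={"hnsw:search_ef": 50},  # larger candidate list at query time
)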
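
To check whether the approximate HNSW search is what drops document A, one option is to compare the IDs returned by the collection against an exact brute-force ranking over the same embeddings. An exact ranking does not depend on insertion order, so any disagreement points at the index rather than at the embeddings. The following is only a rough sketch: the helper names (`cosine_distance`, `exact_top_k`, `recall_at_k`) are made up for illustration and are not part of chromadb, and it assumes cosine distance, so swap in whichever distance the collection is actually configured with.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_top_k(query, embeddings, k):
    """Return the k closest document IDs by brute force.

    `embeddings` maps document ID -> vector. Ties are broken by ID, so the
    result is the same regardless of the order in which IDs were inserted.
    """
    ranked = sorted(
        embeddings,
        key=lambda doc_id: (cosine_distance(query, embeddings[doc_id]), doc_id),
    )
    return ranked[:k]


def recall_at_k(ann_ids, exact_ids):
    """Fraction of the exact top-k IDs that the approximate search also returned."""
    exact = set(exact_ids)
    return len(exact.intersection(ann_ids)) / len(exact)


# Toy data standing in for real bge-m3 embeddings (hypothetical values).
embeddings = {"A": [1.0, 0.0], "B": [0.6, 0.8], "C": [0.0, 1.0]}
query = [0.9, 0.1]
exact_ids = exact_top_k(query, embeddings, k=2)
```

In practice `ann_ids` would be the IDs returned by the collection's query for the same query embedding. If `recall_at_k` is below 1.0 with the default settings and reaches 1.0 after raising `hnsw:search_ef`, that would support the explanation above.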

We have an issue open for improving index parametrization in general, which should help here: https://github.com/chroma-core/chroma/issues/2285

atroyn, Aug 12 '24 21:08