
ChromaDocumentStore fails to search if no metadata is given

Open JohnnyRacer opened this issue 1 year ago • 5 comments

Hello, I am trying to use the ChromaDocumentStore in my pipeline. I've noticed that if I do not add any metadata and try to perform a search, it will fail with the following error:

File /usr/local/lib/python3.10/dist-packages/haystack_integrations/document_stores/chroma/document_store.py:193, in ChromaDocumentStore.search(self, queries, top_k)
    187 """
    188 Perform vector search on the stored documents
    189 """
    190 results = self._collection.query(
    191     query_texts=queries, n_results=top_k, include=["embeddings", "documents", "metadatas", "distances"]
    192 )
--> 193 return self._query_result_to_documents(results)

File /usr/local/lib/python3.10/dist-packages/haystack_integrations/document_stores/chroma/document_store.py:331, in ChromaDocumentStore._query_result_to_documents(self, result)
    329 # prepare metadata
    330 if metadatas := result.get("metadatas"):
--> 331     document_dict["meta"] = dict(metadatas[i][j])
    333 if embeddings := result.get("embeddings"):
    334     document_dict["embedding"] = np.array(embeddings[i][j])

TypeError: 'NoneType' object is not iterable

This is the snippet that reproduces this error:

from haystack import Document
from haystack_integrations.document_stores.chroma import ChromaDocumentStore

ds = ChromaDocumentStore()
ds.write_documents([Document(content=e) for e in ["Hello world", "Whats up", "How are you"]])
ds.search(["Hello world"], top_k=1)

The Document object does not seem to require the metadata argument, so I assume this is unexpected behavior.

JohnnyRacer avatar Feb 21 '24 22:02 JohnnyRacer
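
The traceback is consistent with the query result holding `None` as the metadata entry for a document written without metadata: the outer `metadatas` list is non-empty, so the `if metadatas := ...` check passes, and `dict(None)` then raises this `TypeError`. A minimal sketch that reproduces the failure without Chroma, assuming the nested-list result shape shown here (the `result` literal is hand-written for illustration, not captured output):

```python
# Hand-written stand-in for a query result (an assumption, not real Chroma output):
# one query, one hit, and no metadata stored for that hit.
result = {
    "documents": [["Hello world"]],
    "metadatas": [[None]],
}

i, j = 0, 0  # first query, first hit
error_message = None

if metadatas := result.get("metadatas"):  # [[None]] is truthy, so this branch runs
    try:
        # Same expression as line 331 in the traceback above
        meta = dict(metadatas[i][j])
    except TypeError as exc:
        error_message = str(exc)

print(error_message)
```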

That definitely looks like a bug, I'll look into it. Moving this issue to the integration repo for my convenience, thanks for reporting!

masci avatar Feb 22 '24 07:02 masci

Hey @JohnnyRacer - thanks for reporting some bugs lately both here and on the haystack repo. We'd love to hear what you're working on with Haystack 2.0 🚀 Feel free to join us on Discord or connect with me on LinkedIn

TuanaCelik avatar Feb 22 '24 18:02 TuanaCelik

@TuanaCelik Thanks, I will check out the links!

JohnnyRacer avatar Feb 23 '24 00:02 JohnnyRacer

See also #668

anakin87 avatar Apr 18 '24 14:04 anakin87

On line 404 of chroma/document_store.py, insert a line.

if metadatas := result.get("metadatas"):
    if metadatas[i][j] is not None:  # avoid issue #462
        document_dict["meta"] = dict(metadatas[i][j])

MarcSchluperAtIntel avatar Apr 18 '24 22:04 MarcSchluperAtIntel
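
The guard suggested above can be tried in isolation. The following is only a sketch under stated assumptions: `query_result_to_dicts` is a hypothetical stand-in for the store's private conversion method, the `result` literal is hand-written, and plain dicts are used in place of Haystack `Document` objects so that the snippet runs without Chroma installed.

```python
from typing import Any, Dict, List


def query_result_to_dicts(result: Dict[str, Any]) -> List[List[Dict[str, Any]]]:
    """Hypothetical helper: turn a nested query result into per-query lists of dicts."""
    converted: List[List[Dict[str, Any]]] = []
    metadatas = result.get("metadatas")
    for i, documents in enumerate(result.get("documents") or []):
        hits: List[Dict[str, Any]] = []
        for j, content in enumerate(documents):
            document_dict: Dict[str, Any] = {"content": content}
            # Guard: only build "meta" when this hit actually has a metadata entry
            if metadatas and metadatas[i][j] is not None:
                document_dict["meta"] = dict(metadatas[i][j])
            hits.append(document_dict)
        converted.append(hits)
    return converted


# Hand-written result (an assumption): one query, two hits, the first without metadata
result = {
    "documents": [["Hello world", "How are you"]],
    "metadatas": [[None, {"source": "greeting"}]],
}
docs = query_result_to_dicts(result)
print(docs)
```

With the guard in place, a hit without metadata simply has no `meta` key, while hits that do carry metadata are converted as before.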