haystack-core-integrations
ChromaDocumentStore fails to search if no metadata is given
Hello, I am trying to use the ChromaDocumentStore in my pipeline. I've noticed that if I do not add any metadata and try to perform a search, it will fail with the following error:
File /usr/local/lib/python3.10/dist-packages/haystack_integrations/document_stores/chroma/document_store.py:193, in ChromaDocumentStore.search(self, queries, top_k)
187 """
188 Perform vector search on the stored documents
189 """
190 results = self._collection.query(
191 query_texts=queries, n_results=top_k, include=["embeddings", "documents", "metadatas", "distances"]
192 )
--> 193 return self._query_result_to_documents(results)
File /usr/local/lib/python3.10/dist-packages/haystack_integrations/document_stores/chroma/document_store.py:331, in ChromaDocumentStore._query_result_to_documents(self, result)
329 # prepare metadata
330 if metadatas := result.get("metadatas"):
--> 331 document_dict["meta"] = dict(metadatas[i][j])
333 if embeddings := result.get("embeddings"):
334 document_dict["embedding"] = np.array(embeddings[i][j])
TypeError: 'NoneType' object is not iterable
This is the snippet that reproduces this error:
from haystack import Document
from haystack_integrations.document_stores.chroma import ChromaDocumentStore
ds = ChromaDocumentStore()
ds.write_documents([Document(content=e) for e in ["Hello world", "Whats up", "How are you"]] )
ds.search(["Hello world"], top_k=1)
The Document object does not seem to require the metadata argument so I assume this is unexpected behavior.
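The `TypeError` at the bottom of the traceback appears to come from calling `dict()` on a `None` metadata entry. The sketch below reproduces only that step without Chroma. The nested-list shape of `metadatas` is an assumption based on the `metadatas[i][j]` indexing shown in the traceback, not a confirmed description of what Chroma returns.

```python
# Assumed shape: one list per query, with None standing in for the
# metadata of a document that was written without any.
metadatas = [[None]]

try:
    meta = dict(metadatas[0][0])  # same conversion as on line 331 of the traceback
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```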
That definitely looks like a bug; I'll look into it. Moving this issue to the integration repo for my convenience. Thanks for reporting!
Hey @JohnnyRacer - thanks for reporting some bugs lately, both here and on the haystack repo. We'd love to hear what you're working on with Haystack 2.0 🚀 Feel free to join us on Discord or connect with me on LinkedIn.
@TuanaCelik Thanks, I will check out the links!
See also #668
On line 404 of chroma/document_store.py, insert a line that checks for None before converting the metadata:
if metadatas := result.get("metadatas"):
    if metadatas[i][j] is not None:  # avoid issue #462
        document_dict["meta"] = dict(metadatas[i][j])
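To show the effect of that guard in isolation, here is a minimal, self-contained sketch. The function `query_result_to_dicts` and the sample `result` dictionary are hypothetical stand-ins written for illustration; they are not the actual `_query_result_to_documents` implementation, and the result layout is assumed from the indexing used in the traceback.

```python
from typing import Any, Dict, List


def query_result_to_dicts(result: Dict[str, Any]) -> List[List[Dict[str, Any]]]:
    """Hypothetical, simplified stand-in for _query_result_to_documents."""
    converted = []
    metadatas = result.get("metadatas")
    for i, texts in enumerate(result.get("documents") or []):
        per_query = []
        for j, text in enumerate(texts):
            document_dict: Dict[str, Any] = {"content": text}
            # Copy metadata only when an entry exists for this document;
            # dict(None) would raise the TypeError from the report.
            if metadatas and metadatas[i][j] is not None:
                document_dict["meta"] = dict(metadatas[i][j])
            per_query.append(document_dict)
        converted.append(per_query)
    return converted


# Assumed result layout: one inner list per query.
result = {
    "documents": [["Hello world", "How are you"]],
    "metadatas": [[None, {"lang": "en"}]],
}
docs = query_result_to_dicts(result)
print(docs)
```

With the guard in place, the document without metadata simply has no `meta` key, while the other keeps its metadata unchanged.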