
TypeError when using Advanced RAG

Status: Open. liviaj29 opened this issue 1 year ago.

Describe the bug: When running the HyDE (Advanced RAG) code from the documentation and the blog, a TypeError is raised in the Hypothetical Document Embedder pipeline, at the connection between the OutputAdapter and the embedder. The OutputAdapter returns the list of documents as a string, but SentenceTransformersDocumentEmbedder() expects a list of Document objects.

The output type declared for the OutputAdapter is incorrect: it specifies List[Document], but the value the adapter actually returns is a string.
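A minimal sketch of why a declared List[Document] output type cannot hold once the adapter's output is text. It assumes (this is not confirmed in the thread) that the adapter renders its template to a string. The `Document` dataclass is a hypothetical stand-in, not `haystack.Document`, and `render_like_template` is a hypothetical helper. Once a list of objects has been turned into its string form, `ast.literal_eval` cannot rebuild it, because a constructor call such as `Document(content=...)` is not a Python literal.

```python
import ast
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical stand-in for a Haystack Document (not the real class)."""
    content: str


def render_like_template(value: object) -> str:
    """Hypothetical helper: a text template always produces a string."""
    return str(value)


docs = [Document(content="hypothetical answer 1"), Document(content="hypothetical answer 2")]
rendered = render_like_template(docs)

# The declared output type says List[Document], but the runtime value is a str.
print(type(rendered).__name__)  # str

# A literal parser cannot turn "Document(content=...)" back into objects.
try:
    ast.literal_eval(rendered)
    rebuilt = True
except ValueError as exc:
    rebuilt = False
    print("cannot rebuild Documents:", exc)
```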

Error message: TypeError: SentenceTransformersDocumentEmbedder expects a list of Documents as input.In case you want to embed a list of strings, please use the SentenceTransformersTextEmbedder.
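The message comes from an input-type check in the embedder. The sketch below is a simplified stand-in for such a check: `check_document_input` and the `Document` dataclass are hypothetical, reuse the wording of the error above, and are not Haystack's actual source. It shows that a string, even one that looks like a list of Documents, is rejected, while an actual list of Document objects passes.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Document:
    """Hypothetical stand-in for a Haystack Document (not the real class)."""
    content: str


def check_document_input(documents: Any) -> List[Document]:
    """Hypothetical guard that reuses the wording of the embedder's error."""
    if not isinstance(documents, list) or (
        documents and not isinstance(documents[0], Document)
    ):
        raise TypeError(
            "SentenceTransformersDocumentEmbedder expects a list of Documents as input. "
            "In case you want to embed a list of strings, please use the "
            "SentenceTransformersTextEmbedder."
        )
    return documents


docs = [Document(content="hypothetical answer")]
print(len(check_document_input(docs)))  # 1

try:
    # A string is what a text-rendering adapter would pass on.
    check_document_input(str(docs))
    accepted_string = True
except TypeError as exc:
    accepted_string = False
    print(exc)
```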

Expected behavior: The Hypothetical Document Embedder pipeline should run without error and generate the hypothetical embeddings.

Additional context: The error occurs when the code is copied directly from the tutorials, and also when it is adapted to use an Ollama Generator and local PDFs as the data.

The error can be worked around by calling .to_dict() on each Document inside the custom_filters, then rebuilding the Documents with the .from_dict() method in SentenceTransformersDocumentEmbedder(). I would be happy to open a pull request with this change.
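A minimal sketch of the proposed workaround, under stated assumptions: each Document is converted to a plain dict before the text-rendering step and rebuilt with `from_dict()` afterwards. The `Document` dataclass and the helpers `adapter_side` and `embedder_side` are hypothetical stand-ins; real `haystack.Document` objects do provide `to_dict()` and `from_dict()`, but their fields differ. Encoding the dicts as JSON is a swapped-in choice so the text can be parsed back reliably; the issue does not say which serialization the custom filter uses.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Hypothetical stand-in for a Haystack Document (not the real class)."""
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> Dict[str, Any]:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Document":
        return cls(**data)


def adapter_side(documents: List[Document]) -> str:
    """Hypothetical custom filter: emit text that encodes plain dicts, not objects."""
    return json.dumps([doc.to_dict() for doc in documents])


def embedder_side(rendered: str) -> List[Document]:
    """Hypothetical step before embedding: parse the text and rebuild Documents."""
    return [Document.from_dict(item) for item in json.loads(rendered)]


docs = [Document(content="hypothetical answer", meta={"source": "generator"})]
rendered = adapter_side(docs)        # a str, like the adapter's actual output
restored = embedder_side(rendered)   # a List[Document] again
print(restored == docs)  # True
```

The suggestion in the issue places the from_dict() step inside the embedder; running it as a separate step between the adapter and the embedder would be an alternative placement, not something the thread confirms.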

To Reproduce: Copy and run the code from either of these tutorials: https://docs.haystack.deepset.ai/docs/hypothetical-document-embeddings-hyde and https://haystack.deepset.ai/blog/optimizing-retrieval-with-hyde


System:

  • OS: Ubuntu
  • GPU/CPU: NVIDIA GeForce RTX 3070
  • Haystack version (commit or version number): 2.3.1
  • DocumentStore: ChromaDocumentStore/InMemory
  • Reader: N/A
  • Retriever: ChromaEmbeddingRetriever/InMemory

liviaj29 · Aug 19 '24 13:08

Related to #8176 and #8161. Should be fixed in the upcoming 2.5.0 release.

anakin87 · Aug 19 '24 14:08

The Haystack 2.5.0 release is out (https://github.com/deepset-ai/haystack/releases/tag/v2.5.0), so we can follow up here.

julian-risch · Sep 07 '24 14:09

@liviaj29 Could you clarify where to use the from_dict() function? I'm encountering this error and am not sure how to fix it :/

thatboytemi · Feb 05 '25 04:02