Unable to load pipeline using load_from_yml containing FAISSDocumentStore node

Status: Open. preethampaulose opened this issue 1 year ago (3 comments).

Describe the bug: I'm trying to load a YAML file that was created with the save_to_yaml method. The pipeline was tested and worked before it was saved, but loading the saved YAML file raises an error. The pipeline consists of a FAISSDocumentStore, a Retriever, and a Generator.

Error message: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:faiss.swigfaiss.IndexFlat' in "myPipeline.haystack-pipeline.yml", line 5, column 18

Expected behavior: The pipeline loads with all of its nodes.

To Reproduce

  1. Unzip the attached file
  2. Run the notebook "load saved faiss document.ipynb".

System:

  • OS: Windows 11
  • GPU/CPU: CPU
  • Haystack version (commit or version number): 1.12.2
  • DocumentStore: FAISS
  • Reader: none
  • Retriever: DensePassageRetriever
  • Generator: Seq2SeqGenerator

Attachment: load_issue.zip

preethampaulose, May 05 '23

Hey @preethampaulose, you're using quite an outdated Haystack version. Would you mind upgrading and letting us know whether the issue is still present in the latest version?

ZanSara, Jun 05 '23

While investigating https://github.com/deepset-ai/haystack/issues/4351, I discovered that this issue is still present.

To reproduce

from haystack.pipelines import Pipeline
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import EmbeddingRetriever

# Load a FAISS index (and its JSON config) that was previously saved to disk
ds = FAISSDocumentStore(faiss_index_path="/home/anakin87/apps/experiments/doc-search/index/my_faiss_index.faiss",
                        faiss_config_path="/home/anakin87/apps/experiments/doc-search/index/my_faiss_index.json")

retriever = EmbeddingRetriever(
    document_store=ds,
    embedding_model="sentence-transformers/msmarco-distilbert-base-tas-b",
    model_format="sentence_transformers",
)

pipe = Pipeline()
pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])

# Serializing the pipeline also dumps the in-memory faiss index object into the YAML
pipe.save_to_yaml("mypipe.yml")

Generated YAML

components:
- name: FAISSDocumentStore
  params:
    faiss_config_path: /home/anakin87/apps/experiments/doc-search/index/my_faiss_index.json
    faiss_index: !!python/object:faiss.swigfaiss.IndexFlat
      this: !!binary |
      !!! very long binary string !!!
    faiss_index_path: /home/anakin87/apps/experiments/doc-search/index/my_faiss_index.faiss
  type: FAISSDocumentStore
- name: Retriever
  params:
    document_store: FAISSDocumentStore
    embedding_model: sentence-transformers/msmarco-distilbert-base-tas-b
    model_format: sentence_transformers
  type: EmbeddingRetriever
pipelines:
- name: query
  nodes:
  - inputs:
    - Query
    name: Retriever
version: 1.19.0rc0

The generated YAML contains a very long binary string (the serialized faiss index). My first idea for solving this issue is to skip this field in the save_to_yaml method.
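
The "skip this field" idea could be sketched as a filter that keeps only component params made of plain data. In this sketch, plain data means strings, numbers, bools, None, lists, and str-keyed dicts. Any other value, such as the faiss.swigfaiss.IndexFlat object, is dropped. The sketch assumes the config is a dict with the same layout as the YAML above (components / pipelines / version). The helper names are hypothetical and are not part of Haystack's API.

```python
from typing import Any, Dict

# Hypothetical helpers operating on the dict layout of the pipeline YAML above.
PLAIN_SCALARS = (str, int, float, bool, type(None))


def is_plain(value: Any) -> bool:
    """True if value is built only from scalars, lists and str-keyed dicts."""
    if isinstance(value, PLAIN_SCALARS):
        return True
    if isinstance(value, list):
        return all(is_plain(item) for item in value)
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_plain(v) for k, v in value.items())
    # Anything else (e.g. a faiss index object) would need a python/object tag.
    return False


def drop_unserializable_params(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new config without component params that are not plain data.

    Only the dicts that change are rebuilt; the original config (and any faiss
    index object inside it) is left untouched and never copied.
    """
    cleaned = dict(config)
    components = []
    for component in config.get("components", []):
        if "params" in component:
            params = {k: v for k, v in component["params"].items() if is_plain(v)}
            component = {**component, "params": params}
        components.append(component)
    cleaned["components"] = components
    return cleaned
```

With Haystack 1.x, one would presumably apply this to the dict returned by `pipe.get_config()` and write the result with PyYAML's `yaml.safe_dump`. This has not been checked against a specific release. Dropping `faiss_index` is only reasonable because `faiss_index_path` still points to the index saved on disk.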

anakin87, Jul 07 '23

Hey, I experienced the same problem. I was able to work around it by removing all params from the FAISSDocumentStore except faiss_config_path and faiss_index_path. This makes sense because, when faiss_config_path is set, the FAISSDocumentStore constructor does not accept any params other than these two (see reference). It seems that save_to_yaml() does not respect this rule for the FAISSDocumentStore.
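
The rule described in the previous comment can be applied as a post-processing step on a pipeline config dict. For each FAISSDocumentStore component that sets faiss_config_path, the step keeps only faiss_index_path and faiss_config_path and drops every other param, including plain ones. The sketch assumes the config dict has the same layout as the YAML files in this thread. The helper name is hypothetical.

```python
from typing import Any, Dict

# Per the comment above: when faiss_config_path is set, the FAISSDocumentStore
# constructor accepts no params other than these two.
FAISS_DISK_PARAMS = {"faiss_index_path", "faiss_config_path"}


def keep_only_faiss_disk_params(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new config in which every FAISSDocumentStore that sets
    faiss_config_path keeps only faiss_index_path and faiss_config_path."""
    cleaned = dict(config)
    components = []
    for component in config.get("components", []):
        params = component.get("params", {})
        if component.get("type") == "FAISSDocumentStore" and "faiss_config_path" in params:
            kept = {k: v for k, v in params.items() if k in FAISS_DISK_PARAMS}
            component = {**component, "params": kept}
        # Other components are passed through unchanged.
        components.append(component)
    cleaned["components"] = components
    return cleaned
```

Applied to the YAML generated earlier in this thread, this removes the `faiss_index` entry and yields the same FAISSDocumentStore block as the working YAML in the next comment.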

Here is a working YAML file (it assumes that a FAISSDocumentStore was previously indexed and saved to the ./faiss directory):

components:
- name: FAISSDocumentStore
  params:
    faiss_config_path: ./faiss/config.json
    faiss_index_path: ./faiss/index.faiss
  type: FAISSDocumentStore
- name: Retriever
  params:
    document_store: FAISSDocumentStore
    embedding_model: sentence-transformers/all-MiniLM-L6-v2
  type: EmbeddingRetriever
- name: Reader
  params:
    model_name_or_path: sentence-transformers/all-MiniLM-L6-v2
  type: FARMReader
pipelines:
- name: query
  nodes:
  - inputs:
    - Query
    name: Retriever
  - inputs:
    - Retriever
    name: Reader
version: 1.22.1

I'm using version 1.22.1 rather than the latest version because of bug #5749.
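
The working YAML above only loads if the index and config files it references already exist. A small pre-flight check over the same dict layout could list any missing files before loading. The check assumes that relative paths such as ./faiss/index.faiss resolve against the directory the pipeline is loaded from. The helper name is hypothetical.

```python
from pathlib import Path
from typing import Any, Dict, List


def missing_faiss_files(config: Dict[str, Any], base_dir: str = ".") -> List[str]:
    """List FAISSDocumentStore file params whose paths do not exist.

    Relative paths are resolved against base_dir (an assumption about how the
    pipeline is loaded); absolute paths are checked as-is.
    """
    missing = []
    for component in config.get("components", []):
        if component.get("type") != "FAISSDocumentStore":
            continue
        params = component.get("params", {})
        for key in ("faiss_index_path", "faiss_config_path"):
            value = params.get(key)
            # Path(base_dir) / "/abs/path" gives "/abs/path", so absolute paths pass through.
            if value is not None and not (Path(base_dir) / value).exists():
                missing.append(f"{component.get('name')}.{key}: {value}")
    return missing
```

An empty list means both files referenced by each FAISSDocumentStore component were found.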

TomAtGithub, Jan 03 '24