haystack
Unable to load pipeline using load_from_yaml containing FAISSDocumentStore node
Describe the bug: I'm trying to load the YAML file that was created with the save_to_yaml method. I first tested the pipeline, confirmed it works, and then saved it. When I try to load the saved YAML file, I get an error. The pipeline consists of a FAISSDocumentStore, a Retriever, and a Generator.
Error message: could not determine a constructor for the tag 'tag:yaml.org,2002:python/object:faiss.swigfaiss.IndexFlat' in "myPipeline.haystack-pipeline.yml", line 5, column 18
Expected behavior: The pipeline loads with all of its nodes.
To Reproduce
- Unzip the attached file
- Run the "load saved faiss document.ipynb" notebook
FAQ Check
- [x] Have you had a look at our new FAQ page?
System:
- OS: Windows 11
- GPU/CPU: CPU
- Haystack version (commit or version number): 1.12.2
- DocumentStore: FAISS
- Reader: none
- Retriever: DensePassageRetriever
- Generator: Seq2SeqGenerator
Attachment load_issue.zip
Hey @preethampaulose, you're using a fairly outdated Haystack version. Would you mind upgrading and letting us know whether the issue is still present in the latest version?
While investigating https://github.com/deepset-ai/haystack/issues/4351, I discovered that this issue is still present.
To reproduce
from haystack.pipelines import Pipeline
from haystack.document_stores import FAISSDocumentStore
from haystack.nodes import EmbeddingRetriever

# Load a previously saved FAISS index together with its config file
ds = FAISSDocumentStore(
    faiss_index_path="/home/anakin87/apps/experiments/doc-search/index/my_faiss_index.faiss",
    faiss_config_path="/home/anakin87/apps/experiments/doc-search/index/my_faiss_index.json",
)
retriever = EmbeddingRetriever(
    document_store=ds,
    embedding_model="sentence-transformers/msmarco-distilbert-base-tas-b",
    model_format="sentence_transformers",
)
pipe = Pipeline()
pipe.add_node(component=retriever, name="Retriever", inputs=["Query"])
# The saved YAML also contains the in-memory faiss_index object (see below)
pipe.save_to_yaml("mypipe.yml")
Generated YAML
components:
- name: FAISSDocumentStore
  params:
    faiss_config_path: /home/anakin87/apps/experiments/doc-search/index/my_faiss_index.json
    faiss_index: !!python/object:faiss.swigfaiss.IndexFlat
      this: !!binary |
        !!! very long binary string !!!
    faiss_index_path: /home/anakin87/apps/experiments/doc-search/index/my_faiss_index.faiss
  type: FAISSDocumentStore
- name: Retriever
  params:
    document_store: FAISSDocumentStore
    embedding_model: sentence-transformers/msmarco-distilbert-base-tas-b
    model_format: sentence_transformers
  type: EmbeddingRetriever
pipelines:
- name: query
  nodes:
  - inputs:
    - Query
    name: Retriever
version: 1.19.0rc0
The generated YAML contains a very long binary string: the serialized FAISS index. My first idea for solving this issue is to skip this field in the save_to_yaml method.
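A minimal sketch of that idea, assuming the pipeline definition is available as a plain Python dict just before it is dumped to YAML. The helper `drop_faiss_index` is hypothetical and not part of Haystack's API; it only removes the `faiss_index` parameter from FAISSDocumentStore components, so the file keeps just the two paths that point to the saved index and config.

```python
from typing import Any, Dict


def drop_faiss_index(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a pipeline definition without the FAISS index object.

    Hypothetical helper sketching the "skip faiss_index in save_to_yaml" idea;
    it is not Haystack's own implementation.
    """
    cleaned_components = []
    for component in config.get("components", []):
        component = dict(component)  # shallow copy: the caller's dict is left untouched
        if component.get("type") == "FAISSDocumentStore":
            params = dict(component.get("params", {}))
            # Drop the in-memory index; faiss_index_path and faiss_config_path
            # still point to the files saved on disk.
            params.pop("faiss_index", None)
            component["params"] = params
        cleaned_components.append(component)
    return {**config, "components": cleaned_components}


# Usage with a hand-written definition mirroring the generated YAML above
config = {
    "components": [
        {
            "name": "FAISSDocumentStore",
            "type": "FAISSDocumentStore",
            "params": {
                "faiss_config_path": "my_faiss_index.json",
                "faiss_index": object(),  # stand-in for the faiss.IndexFlat object
                "faiss_index_path": "my_faiss_index.faiss",
            },
        },
        {
            "name": "Retriever",
            "type": "EmbeddingRetriever",
            "params": {"document_store": "FAISSDocumentStore"},
        },
    ],
    "version": "1.19.0rc0",
}
cleaned = drop_faiss_index(config)
print(sorted(cleaned["components"][0]["params"]))  # ['faiss_config_path', 'faiss_index_path']
```

Copying instead of mutating in place keeps the live pipeline object intact, since the same definition may still be used after saving.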
Hey, I experienced the same problem. I was able to work around it by removing all params except faiss_config_path and faiss_index_path from the FAISSDocumentStore. This makes sense, because the FAISSDocumentStore constructor does not allow any params other than these two when faiss_config_path is set (see reference). It seems that the FAISSDocumentStore.save_to_yaml() function does not respect this rule.
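The same workaround can be applied programmatically. This is a minimal sketch, assuming the components are plain Python dicts shaped like the YAML entries in this thread; the helper `keep_only_path_params` is hypothetical, and the rule it encodes (only the two path params when `faiss_config_path` is set) is taken from the comment above rather than verified here.

```python
from typing import Any, Dict, List

# Per the comment above: when faiss_config_path is set, the FAISSDocumentStore
# constructor reportedly accepts only these two parameters.
PATH_PARAMS = ("faiss_index_path", "faiss_config_path")


def keep_only_path_params(components: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: trim FAISSDocumentStore params to the two path params."""
    result = []
    for component in components:
        params = component.get("params", {})
        if component.get("type") == "FAISSDocumentStore" and params.get("faiss_config_path"):
            trimmed = {key: params[key] for key in PATH_PARAMS if key in params}
            component = {**component, "params": trimmed}  # new dict, input untouched
        result.append(component)
    return result


components = [
    {
        "name": "FAISSDocumentStore",
        "type": "FAISSDocumentStore",
        "params": {
            "faiss_config_path": "./faiss/config.json",
            "faiss_index_path": "./faiss/index.faiss",
            "faiss_index": object(),  # stand-in for the serialized index
            "embedding_dim": 384,  # example of another param that gets dropped
        },
    },
    {"name": "Retriever", "type": "EmbeddingRetriever", "params": {"document_store": "FAISSDocumentStore"}},
]
fixed = keep_only_path_params(components)
print(fixed[0]["params"])
# {'faiss_index_path': './faiss/index.faiss', 'faiss_config_path': './faiss/config.json'}
```

Components that are not FAISSDocumentStore, or that have no faiss_config_path, are passed through unchanged.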
Here is a working YAML (it assumes that a FAISSDocumentStore was previously indexed and saved to the ./faiss directory):
components:
- name: FAISSDocumentStore
  params:
    faiss_config_path: ./faiss/config.json
    faiss_index_path: ./faiss/index.faiss
  type: FAISSDocumentStore
- name: Retriever
  params:
    document_store: FAISSDocumentStore
    embedding_model: sentence-transformers/all-MiniLM-L6-v2
  type: EmbeddingRetriever
- name: Reader
  params:
    model_name_or_path: sentence-transformers/all-MiniLM-L6-v2
  type: FARMReader
pipelines:
- name: query
  nodes:
  - inputs:
    - Query
    name: Retriever
  - inputs:
    - Retriever
    name: Reader
version: 1.22.1
I am using version 1.22.1 rather than the latest version because of bug #5749.