haystack icon indicating copy to clipboard operation
haystack copied to clipboard

Can't specify index during TF_IDF fit

Open rdubester opened this issue 1 year ago • 3 comments

Not sure if this is a missing feature, bug, or a conceptual error on my part. I am using a FAISS document store that needs to support multiple indexes. After writing all of my documents with the specified index, I call the update_embeddings method with a DPR retriever and the same index. This seems to work fine. Next, I try and call the fit method on the TF_IDF retriever to support sparse retrieval, but get an error showing 0 documents processed. This seems to be because the fit method does not take any parameters, but calls the get_all_documents() method which does take an index parameter. The simple fix seems to be adding an optional index parameter to the TF_IDFRetreiver fit method.

rdubester avatar Sep 08 '22 16:09 rdubester

👋 Please report the code, so we can have a reproducible example and help you.

anakin87 avatar Sep 08 '22 16:09 anakin87

Of course. Here is my document ingestion pipeline. Everything seems to work fine except the .fit() call

def add(files, index):
    make_temp_dir(files)
    docs = convert_files_to_docs(temp_dir, clean_func=clean_wiki_text)
    processed = preprocessors['default'].process(docs)
    document_store.write_documents(processed, index=index)
    document_store.update_embeddings(
        retriever = retrievers['dpr'],
        update_existing_embeddings = False,
        index=index
    )
    retrievers['tf_idf'].fit() # this is broken because the documents are stored under an index
    destoy_temp_dir()

The .fit() call in turn calls the document_store.get_all_documents() method here which if not given an index will default to the document store's self.index, which is not correct in this case.

rdubester avatar Sep 08 '22 16:09 rdubester

Hi @rdubester thanks for the code snippet! Could you also provide us with the code you use to create retrievers and preprocessors?

sjrl avatar Sep 09 '22 07:09 sjrl

Hey @rdubester let us know if you solved or worked around the issue somehow. If it's solved I would close the issue.

ZanSara avatar Oct 19 '22 10:10 ZanSara

Probably related to #1634.

anakin87 avatar Oct 28 '22 10:10 anakin87

I'm closing this, @rdubester feel free to reopen if this is still an issue with the latest Haystack release

masci avatar Dec 30 '22 11:12 masci