
TranslationWrapperPipeline returns wrong query format in debug mode

Open julian-risch opened this issue 3 years ago • 0 comments

Describe the bug
When executing a TranslationWrapperPipeline via run_batch with the debug=True setting (for example as in the test case test_extractive_qa_eval_translation), the queries passed to the retriever have the wrong format and cause an AttributeError. Each query is a tuple ({'query': 'Who Lives in Berlin?'}, 'output_1') but should simply be the string 'Who Lives in Berlin?' instead.
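As a temporary workaround, the malformed queries could be normalized before they reach the retriever. This is a purely illustrative sketch; unwrap_debug_query is a hypothetical helper, not part of Haystack:

def unwrap_debug_query(query):
    # In debug mode each query arrives as ({'query': '...'}, 'output_1');
    # unwrap it back to the plain string the retriever expects.
    if isinstance(query, tuple) and isinstance(query[0], dict) and "query" in query[0]:
        return query[0]["query"]
    return query

queries = [unwrap_debug_query(q) for q in queries]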

The error occurs in self.vectorizer.transform(queries) in the retriever's _calc_scores:

def _calc_scores(self, queries: Union[str, List[str]]) -> List[Dict[int, float]]:
    if isinstance(queries, str):
        queries = [queries]
    # Fails here: the vectorizer expects a list of strings but receives tuples.
    question_vector = self.vectorizer.transform(queries)

Error message " AttributeError: 'tuple' object has no attribute 'lower'" As lower is called on doc which is expected to be a string 'Who Lives in Berlin?' but is a tuple ({'query': 'Who Lives in Berlin?'}, 'output_1') instead.

Expected behavior
The queries parameter should be ['Who Lives in Berlin?', 'Who Lives in Munich?'].
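Side by side, based on the formats quoted above (the second tuple is extrapolated from the first; the exact stream label is an assumption):

queries_expected = ["Who Lives in Berlin?", "Who Lives in Munich?"]

# What the retriever actually receives in debug mode:
queries_actual = [
    ({"query": "Who Lives in Berlin?"}, "output_1"),
    ({"query": "Who Lives in Munich?"}, "output_1"),  # stream label assumed
]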

Additional context
This issue is blocking proper use of pipeline.eval_batch as implemented in #2942.

To Reproduce

from haystack.nodes import TransformersTranslator
from haystack.pipelines import ExtractiveQAPipeline, TranslationWrapperPipeline

# reader, retriever_with_docs, and EVAL_LABELS come from the existing test fixtures
input_translator = TransformersTranslator(model_name_or_path="Helsinki-NLP/opus-mt-de-en")
output_translator = TransformersTranslator(model_name_or_path="Helsinki-NLP/opus-mt-de-en")

pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever_with_docs)
pipeline = TranslationWrapperPipeline(
    input_translator=input_translator, output_translator=output_translator, pipeline=pipeline
)

and instead of:

eval_result: EvaluationResult = pipeline.eval(labels=EVAL_LABELS, params={"Retriever": {"top_k": 5}})

try to run:

pipeline_output = pipeline.run_batch(
    queries=[label.query for label in EVAL_LABELS], params={"Retriever": {"top_k": 5}}, debug=True
)

or, if you have the eval_batch implementation from #2942, try to run:

eval_result: EvaluationResult = pipeline.eval_batch(labels=EVAL_LABELS, params={"Retriever": {"top_k": 5}})

julian-risch · Aug 03 '22 16:08