TranslationWrapperPipeline returns wrong query format in debug mode
Describe the bug
When a TranslationWrapperPipeline (for example, the one in the test case test_extractive_qa_eval_translation) is executed via run_batch with debug=True, the queries passed to the retriever have the wrong format, which causes an AttributeError.
Each query is a tuple ({'query': 'Who Lives in Berlin?'}, 'output_1'), but it should simply be the string 'Who Lives in Berlin?'.
The error occurs in self.vectorizer.transform(queries) here:
def _calc_scores(self, queries: Union[str, List[str]]) -> List[Dict[int, float]]:
    if isinstance(queries, str):
        queries = [queries]
    question_vector = self.vectorizer.transform(queries)
Error message
AttributeError: 'tuple' object has no attribute 'lower'
This happens because lower is called on doc, which is expected to be the string 'Who Lives in Berlin?' but is the tuple ({'query': 'Who Lives in Berlin?'}, 'output_1') instead.
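The failure mode can be shown in isolation, without Haystack. This is only a minimal sketch: `lowercase_all` is a hypothetical stand-in for the vectorizer's preprocessing step, assumed here to do nothing more than call `lower()` on each document. It is not the actual vectorizer code.

```python
from typing import Any, List


def lowercase_all(docs: List[Any]) -> List[str]:
    # Hypothetical stand-in for the vectorizer's preprocessing step,
    # which assumes that every document is a plain string.
    return [doc.lower() for doc in docs]


# Format the retriever expects.
good_queries = ["Who Lives in Berlin?", "Who Lives in Munich?"]
# Format observed with debug=True, as described in this report.
bad_queries = [({"query": "Who Lives in Berlin?"}, "output_1")]

lowered = lowercase_all(good_queries)
print(lowered)

try:
    lowercase_all(bad_queries)
except AttributeError as err:
    # Same kind of error as reported above: tuples have no lower() method.
    error_message = str(err)
    print(error_message)
```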
Expected behavior
The queries parameter should be ['Who Lives in Berlin?', 'Who Lives in Munich?']
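To make the expected format concrete, here is a small sketch of how the tuples could be mapped back to plain strings. `unwrap_queries` is a hypothetical helper written for illustration only. It is not part of Haystack, and the proper fix probably belongs wherever the tuples are produced rather than in the retriever.

```python
from typing import Any, List


def unwrap_queries(queries: List[Any]) -> List[str]:
    """Hypothetical helper: recover plain query strings from debug-mode tuples."""
    plain: List[str] = []
    for item in queries:
        if isinstance(item, str):
            # Already in the expected format.
            plain.append(item)
        elif (
            isinstance(item, tuple)
            and len(item) > 0
            and isinstance(item[0], dict)
            and "query" in item[0]
        ):
            # Tuple of (node output dict, output edge name), as seen in this report.
            plain.append(item[0]["query"])
        else:
            raise TypeError(f"Unexpected query format: {item!r}")
    return plain


debug_queries = [
    ({"query": "Who Lives in Berlin?"}, "output_1"),
    ({"query": "Who Lives in Munich?"}, "output_1"),
]
unwrapped = unwrap_queries(debug_queries)
print(unwrapped)
```

Plain strings pass through unchanged, so the helper is safe to apply to input that is already correct.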
Additional context
This issue is blocking proper use of pipeline.eval_batch as implemented in #2942
To Reproduce
input_translator = TransformersTranslator(model_name_or_path="Helsinki-NLP/opus-mt-de-en")
output_translator = TransformersTranslator(model_name_or_path="Helsinki-NLP/opus-mt-de-en")
pipeline = ExtractiveQAPipeline(reader=reader, retriever=retriever_with_docs)
pipeline = TranslationWrapperPipeline(
    input_translator=input_translator, output_translator=output_translator, pipeline=pipeline
)
and instead of:
eval_result: EvaluationResult = pipeline.eval(labels=EVAL_LABELS, params={"Retriever": {"top_k": 5}})
try to run:
pipeline_output = pipeline.run_batch(queries=[label.query for label in EVAL_LABELS], params={"Retriever": {"top_k": 5}}, debug=True)
or if you have the eval_batch implementation from #2942 then try to run:
eval_result: EvaluationResult = pipeline.eval_batch(labels=EVAL_LABELS, params={"Retriever": {"top_k": 5}})