haystack
RayPipeline fails with two Retrievers + Reader pipeline
Describe the bug
The RayPipeline fails on a pipeline with two Retrievers (e.g. a BM25Retriever plus an EmbeddingRetriever, or two BM25Retrievers) followed by a Reader. It seems the RayPipeline was never tested with parallel nodes: here the BM25Retriever and EmbeddingRetriever nodes need to have their results joined before they are handed over to the Reader node.
Error message
File ~/test-ray-haystack/haystack/pipelines/ray.py:271, in RayPipeline.run(self, query, file_paths, labels, documents, meta, params)
--> 266 output = self.graph.nodes[n_id]["component"].run(**input_dict)
267 inputs_for_join_node["inputs"].append(output)
268 input_dict = inputs_for_join_node
TypeError: 'RayServeSyncHandle' object is not callable
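The failure mode in the traceback above can be reproduced without Ray. The following is a minimal sketch, not the Haystack or Ray Serve code: `FakeSyncHandle` and `FakeRetriever` are hypothetical stand-ins, and the sketch assumes that attribute access on a serve handle (such as `.run`) yields another handle rather than a bound method, so the work has to be dispatched through `.remote(...)`. Unlike the real handle, which returns a reference that still has to be resolved, the fake `remote` returns the result directly.

```python
class FakeSyncHandle:
    """Hypothetical stand-in for a Ray Serve sync handle (not the real class)."""

    def __init__(self, component, method_name="__call__"):
        self._component = component
        self._method_name = method_name

    def __getattr__(self, name):
        # Only reached for attributes that do not exist on the handle, e.g. `handle.run`.
        if name.startswith("_"):
            raise AttributeError(name)
        # Assumption: attribute access selects a method and returns a new handle.
        return FakeSyncHandle(self._component, method_name=name)

    def remote(self, **kwargs):
        # Simplification: run the selected method and return its result directly.
        return getattr(self._component, self._method_name)(**kwargs)


class FakeRetriever:
    """Hypothetical node that returns one document naming itself and the query."""

    def __init__(self, name):
        self.name = name

    def run(self, query):
        return {"documents": [f"{self.name}: {query}"]}


def call_node_directly(handle, **input_dict):
    # Mirrors the failing line: `handle.run` is a handle, and handles are not callable.
    return handle.run(**input_dict)


def call_node_remotely(handle, **input_dict):
    # Dispatches through `.remote(...)` instead of calling the handle.
    return handle.run.remote(**input_dict)


def join_outputs(handles, **input_dict):
    # Collects the output of every parallel node for the join node.
    inputs_for_join_node = {"inputs": []}
    for handle in handles:
        inputs_for_join_node["inputs"].append(call_node_remotely(handle, **input_dict))
    return inputs_for_join_node
```

With two fake retrievers wrapped in handles, `call_node_directly` raises a `TypeError` saying the handle object is not callable, while `join_outputs` collects one output per retriever.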
Expected behavior
The RayPipeline should work with any Haystack Pipeline, including those with parallel nodes.
Additional context
We have discussed this with @ZanSara on Slack. We agreed that, as part of a refactoring, I will remove RayPipeline.run and make Pipeline.run usable from RayPipeline. This should fix this issue, along with any other potential issues hidden by using the custom RayPipeline.run instead of Pipeline.run.
To Reproduce
Run RayPipeline with a standard BM25 & Embedding Retriever + Reader pipeline.
FAQ Check
- [ x ] Have you had a look at our new FAQ page?
System:
- OS: Linux Mint 20.2
- GPU/CPU:
- Haystack version (commit or version number): 1.6.0
- DocumentStore: Weaviate
- Reader: deepset/roberta-base-squad2
- Retriever: BM25 and EmbeddingRetriever (sentence-transformers/multi-qa-mpnet-base-dot-v1)
As agreed with @ZanSara, I will be providing a PR for this.
Thanks for flagging this and volunteering a PR @zoltan-fedor 💪🏽