RayPipeline fails with two Retrievers + Reader pipeline

Open zoltan-fedor opened this issue 2 years ago • 2 comments

Describe the bug The RayPipeline fails on a pipeline with two Retrievers (e.g. a BM25Retriever plus an EmbeddingRetriever, or two BM25Retrievers) followed by a Reader. It seems the RayPipeline was never tested with parallel nodes (e.g. the BM25Retriever and EmbeddingRetriever nodes need to join their results before handing them over to the Reader node).

Error message

File ~/test-ray-haystack/haystack/pipelines/ray.py:271, in RayPipeline.run(self, query, file_paths, labels, documents, meta, params)
--> 266     output = self.graph.nodes[n_id]["component"].run(**input_dict)
    267     inputs_for_join_node["inputs"].append(output)
    268 input_dict = inputs_for_join_node

TypeError: 'RayServeSyncHandle' object is not callable
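One plausible reading of this error, sketched with a hypothetical stand-in class rather than the real Ray Serve handle: if attribute access on a handle returns another handle bound to that method name, then `handle.run(...)` tries to call a handle object and raises this kind of `TypeError`, while going through `.remote(...)` dispatches the call. `FakeSyncHandle` and `Retriever` below are illustrative names and are not part of Ray or Haystack.

```python
class FakeSyncHandle:
    """Hypothetical stand-in for a sync handle; NOT the real Ray Serve class."""

    def __init__(self, target, method_name="__call__"):
        self._target = target
        self._method_name = method_name

    def __getattr__(self, name):
        # Assumption: attribute access yields another handle bound to that
        # method name, not the method itself, so it cannot be called directly.
        return FakeSyncHandle(self._target, method_name=name)

    def remote(self, **kwargs):
        # Only .remote() forwards the call to the wrapped component.
        return getattr(self._target, self._method_name)(**kwargs)


class Retriever:
    """Toy component with a run() method, standing in for a pipeline node."""

    def run(self, query):
        return {"documents": [f"doc for {query}"]}


handle = FakeSyncHandle(Retriever())

try:
    handle.run(query="ray")  # calls a handle object, not a method
    error = None
except TypeError as exc:
    error = str(exc)

# Dispatching through .remote() reaches the component's run() method.
result = handle.run.remote(query="ray")
```

Under that assumption, the failing line in the traceback would need to dispatch through the handle's remote call instead of calling `.run(...)` on it directly.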

Expected behavior The RayPipeline should work with any Haystack Pipeline, including those with parallel nodes.

Additional context We discussed this with @ZanSara on Slack. We agreed that, as part of a refactoring, I will remove RayPipeline.run and make Pipeline.run usable from RayPipeline. This should fix this issue and any other potential issues hidden by using the custom RayPipeline.run instead of Pipeline.run.
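The idea behind that refactoring can be sketched roughly as follows, as a simplified illustration and not the actual Haystack code: the traversal logic, including the join of parallel branches, lives in one shared `run`, and the remote variant overrides only how a single node is executed. Every class and method name below (`SketchPipeline`, `SketchRemotePipeline`, `_run_node`, `Handle`, and the toy components) is hypothetical.

```python
class KeywordRetriever:
    def run(self, query):
        return {"documents": [f"bm25:{query}"]}


class DenseRetriever:
    def run(self, query):
        return {"documents": [f"dense:{query}"]}


class Reader:
    def run(self, query, documents):
        return {"query": query, "answers": [doc.upper() for doc in documents]}


class Handle:
    """Hypothetical handle-like wrapper that only exposes .remote()."""

    def __init__(self, component):
        self._component = component

    def remote(self, **kwargs):
        return self._component.run(**kwargs)


class SketchPipeline:
    """Simplified pipeline: two parallel retrievers feed one reader."""

    def __init__(self, retrievers, reader):
        self.retrievers = retrievers
        self.reader = reader

    def _run_node(self, node, **kwargs):
        # Local execution: call the component directly.
        return node.run(**kwargs)

    def run(self, query):
        # Shared traversal, including the join of the parallel branches.
        outputs = [self._run_node(r, query=query) for r in self.retrievers]
        documents = [doc for out in outputs for doc in out["documents"]]
        return self._run_node(self.reader, query=query, documents=documents)


class SketchRemotePipeline(SketchPipeline):
    def _run_node(self, node, **kwargs):
        # Only the dispatch differs; run() and its join logic are inherited.
        return node.remote(**kwargs)


local = SketchPipeline([KeywordRetriever(), DenseRetriever()], Reader()).run("q")
remote = SketchRemotePipeline(
    [Handle(KeywordRetriever()), Handle(DenseRetriever())], Handle(Reader())
).run("q")
```

With a single `run`, the join path is exercised by both variants, so a dispatch mistake cannot hide in a duplicated code path.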

To Reproduce Run RayPipeline with a standard BM25 & Embedding Retriever + Reader pipeline.

FAQ Check

System:

  • OS: Linux Mint 20.2
  • GPU/CPU:
  • Haystack version (commit or version number): 1.6.0
  • DocumentStore: Weaviate
  • Reader: deepset/roberta-base-squad2
  • Retriever: BM25 and EmbeddingRetriever (sentence-transformers/multi-qa-mpnet-base-dot-v1)

zoltan-fedor, Aug 04 '22 13:08

As agreed with @ZanSara, I will provide a PR for this.

zoltan-fedor, Aug 04 '22 13:08

Thanks for flagging this and volunteering a PR @zoltan-fedor 💪🏽

TuanaCelik, Aug 04 '22 14:08