Migrate Components to Pipeline v2

Open julian-risch opened this issue 2 years ago • 0 comments

We are working on Haystack 2.0, with a major refactoring of pipelines and components.

Rationale

We need to prioritize the list of components and, separately, the list of document stores to migrate to pipelines v2. The riskiest components and the components essential to most pipelines should be migrated first. Let's also ask Sol (@sjrl) which components are most relevant to them, so they can give early feedback based on real use cases. Finally, let's use telemetry data to see which components matter most to the community.

Use cases

List of the use cases to support, in priority order, with the bare minimum components each one requires. Note: each pipeline also needs the components of every pipeline above it in the priority order.

Each "component type" links to another small epic where the specific component is broken down into a set of requirements, which might eventually be covered by one or more v2 components.
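To make the cumulative rule in the note above concrete, here is a minimal sketch. The labels are copied from the lists in this issue; the `USE_CASES` structure and the `required_components` helper are hypothetical and exist only for illustration (they are not part of Haystack or Canals). Non-blocking items such as Web Retrievers are left out, and the catch-all "Other usecases" entry is omitted.

```python
from typing import List, Tuple

# Use cases in priority order, each with the component epics listed for it
# in this issue. The structure itself is an assumption made for illustration.
USE_CASES: List[Tuple[str, List[str]]] = [
    ("Document Search", ["#5311", "#5312", "#5326"]),
    ("Generative QA & Agent Pipelines", ["#5330"]),
    ("Extractive QA", ["Reader Components"]),
    ("Minimal Indexing", ["#5339", "#5363", "Document Preprocessing Components"]),
    ("General Indexing", ["#5362", "#5367", "#5366"]),
    ("Advanced querying", ["Rankers"]),
]


def required_components(use_case: str) -> List[str]:
    """Return the components needed for `use_case`, including the
    components of every higher-priority use case listed before it."""
    needed: List[str] = []
    for name, components in USE_CASES:
        needed.extend(components)
        if name == use_case:
            return needed
    raise KeyError(f"Unknown use case: {use_case}")


if __name__ == "__main__":
    # Extractive QA also pulls in everything required by the two
    # use cases that precede it in the priority order.
    print(required_components("Extractive QA"))
```

Read this way, a use case further down the list cannot be considered supported until every epic above it is done, which is why the ordering doubles as the migration order.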

1. Document Search

#5311
#5312
https://github.com/deepset-ai/haystack/issues/5326

Note: planning for Retrievers and Embedders will follow the planning for Document Stores, see < the other issue >

2. Generative QA & Agent Pipelines

https://github.com/deepset-ai/haystack/issues/5330

Non-blocking:

Web Retrievers

3. Extractive QA

Reader Components

4. Minimal Indexing

https://github.com/deepset-ai/haystack/issues/5339
https://github.com/deepset-ai/haystack/issues/5363
Document Preprocessing Components

6. General Indexing

https://github.com/deepset-ai/haystack/issues/5362
https://github.com/deepset-ai/haystack/issues/5367
https://github.com/deepset-ai/haystack/issues/5366

7. Advanced querying

Rankers

8. Other use cases

All other nodes

Agent Pipelines

Agent pipelines will need some exploration to get right. I expect their main enabler to be the LLM component; any other unforeseen component that turns out to be needed will be prioritized accordingly.

Parallel and/or related tasks

#5341
Serialization/deserialization of Haystack pipelines
Port Haystack REST API v2 to latest Canals

### Tasks
- [ ] https://github.com/deepset-ai/haystack/issues/5341
- [ ] https://github.com/deepset-ai/haystack/issues/5342
- [ ] Port Haystack REST API v2 to latest Canals
- [ ] https://github.com/deepset-ai/haystack/issues/5311
- [ ] https://github.com/deepset-ai/haystack/issues/5326
- [ ] https://github.com/deepset-ai/haystack/issues/5330
- [ ] Agent Pipelines (v2)
- [ ] Reader Components (v2)
- [ ] Web Retrievers (v2)
- [ ] https://github.com/deepset-ai/haystack/issues/5339
- [ ] Document Preprocessing Components (v2)
- [ ] https://github.com/deepset-ai/haystack/issues/5362
- [ ] https://github.com/deepset-ai/haystack/issues/5367
- [ ] Ranker components (v2)
- [ ] https://github.com/deepset-ai/haystack/issues/5366

Developer relations efforts

Publish initial demonstration and preview content on Pipelines v2: articles, demos and videos
Set up the Haystack website to house both v1 and v2 content: tutorials, integrations, articles

Context

Old roadmap item https://github.com/deepset-ai/haystack/issues/4390
Canals, a component orchestration engine by deepset
- https://deepset-ai.github.io/canals/
- https://github.com/deepset-ai/canals
- https://github.com/deepset-ai/haystack/blob/main/proposals/text/4370-documentstores-and-retrievers.md

Jul 04 '23 11:07 julian-risch