haystack feat: Add PipelineTemplate for ready made pipelines

Why:

The PipelineTemplate class and its associated template files make it simpler to create, configure, and use NLP pipelines in Haystack. They address the need for a more dynamic way to build pipelines for a range of NLP tasks, such as question answering and document indexing. Pipelines are constructed from Jinja2 templated YAML files, so users can either take the default configuration or customize individual components as needed.

fixes https://github.com/deepset-ai/haystack/issues/5992

What:

Introduction of the PipelineTemplate class: A comprehensive solution for constructing and customizing NLP pipelines based on Jinja2 templated YAML files. It simplifies the process of specifying, building, and overwriting components in pipelines.
Addition of templated YAML files: Specific YAML files have been added for tasks like indexing (indexing.yaml.jinja2), question answering (qa.yaml.jinja2), and retrieval-augmented generation (RAG) (rag.yaml.jinja2). These templates define the structure and components of the pipeline for each task and are designed to be easily customizable.
Release notes documentation: A detailed explanation and examples demonstrating the usage and benefits of the new PipelineTemplate feature.
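
To make the templating idea concrete, here is a minimal sketch of how a Jinja2 templated YAML file can switch a component on or off from a template parameter. The template text, the component type names (ExamplePdfConverter, ExampleEmbedder), and the render_indexing helper are all assumptions made up for illustration; they are not the contents of indexing.yaml.jinja2 or part of the PipelineTemplate implementation. Only the use_pdf_file_converter parameter name comes from the example in this PR.

```python
from jinja2 import Environment

# Hypothetical template text: the component names and types below are
# placeholders, not the contents of Haystack's indexing.yaml.jinja2.
INDEXING_TEMPLATE = """\
components:
{% if use_pdf_file_converter %}
  converter:
    type: ExamplePdfConverter
{% endif %}
  embedder:
    type: {{ embedder_type }}
"""

# trim_blocks removes the newline after each {% ... %} tag, so the
# conditional does not leave blank lines in the rendered YAML.
env = Environment(trim_blocks=True)


def render_indexing(use_pdf_file_converter: bool = False,
                    embedder_type: str = "ExampleEmbedder") -> str:
    """Render the YAML text for the given template parameters."""
    template = env.from_string(INDEXING_TEMPLATE)
    return template.render(
        use_pdf_file_converter=use_pdf_file_converter,
        embedder_type=embedder_type,
    )


if __name__ == "__main__":
    # Print the rendered YAML with the optional converter included.
    print(render_indexing(use_pdf_file_converter=True))
```

With the flag set to False the converter entry is omitted from the rendered text, which is the kind of switch a parameter like use_pdf_file_converter would control.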

How can it be used:

Building custom pipelines effortlessly:

For indexing documents, including the option to convert PDF files to text:

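from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.templates import PipelineTemplate

# Load the predefined "indexing" template with the PDF converter enabled
pt = PipelineTemplate("indexing", template_params={"use_pdf_file_converter": True})
# Replace the template's default embedder with a custom-configured one
pt.override("embedder", SentenceTransformersDocumentEmbedder(progress_bar=True))
# Build the pipeline from the template and the override
pipe = pt.build()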

Easy customization for various NLP tasks:

Creating a question answering pipeline and asking it a question about Germany:

from haystack.templates import PipelineTemplate

qa_pipe = PipelineTemplate("qa").build()
print(qa_pipe.run({"question": "What is the capital of Germany?"}))

How did you test it:

Unit and integration tests:
- Testing for invalid template names, ensuring exceptions are raised as expected.
- Overriding default components with custom settings and validating the applied changes.
- Building pipelines using both predefined and custom templates to verify functionality.
Examples in documentation:
- Running example code provided in the release notes and in docstrings to ensure accuracy and functionality.
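
The checks listed above can be sketched against a small stand-in class. This is an illustration only, under stated assumptions: ToyPipelineTemplate, the PREDEFINED registry, its placeholder component names, and the choice of ValueError are all hypothetical and do not reflect the actual PipelineTemplate implementation or the test files in this PR.

```python
# Hypothetical stand-in, NOT the Haystack implementation: a tiny
# registry-backed template with override() and build(), used only to
# illustrate the kinds of checks described above.
PREDEFINED = {
    "indexing": {"converter": "default-converter", "embedder": "default-embedder"},
    "qa": {"retriever": "default-retriever", "generator": "default-generator"},
}


class ToyPipelineTemplate:
    def __init__(self, name: str):
        if name not in PREDEFINED:
            # Unknown template names fail fast (exception type is an assumption).
            raise ValueError(f"Unknown template: {name!r}")
        self.name = name
        self._overrides = {}

    def override(self, component_name: str, component: object) -> None:
        # Record a replacement for one of the template's default components.
        if component_name not in PREDEFINED[self.name]:
            raise ValueError(f"Unknown component: {component_name!r}")
        self._overrides[component_name] = component

    def build(self) -> dict:
        # Defaults first, then overrides on top.
        return {**PREDEFINED[self.name], **self._overrides}


def invalid_name_raises() -> bool:
    """Return True if constructing an unknown template raises ValueError."""
    try:
        ToyPipelineTemplate("does-not-exist")
    except ValueError:
        return True
    return False


def overridden_build() -> dict:
    """Build the toy indexing template with a custom embedder."""
    template = ToyPipelineTemplate("indexing")
    template.override("embedder", "custom-embedder")
    return template.build()
```

The same three shapes (invalid name raises, override is applied, untouched defaults survive) map onto the test bullets above.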

Notes for the reviewer:

The PipelineTemplate feature changes how users interact with and customize NLP pipelines in Haystack.
Reviewers should pay special attention to the integration and flexibility this feature offers, especially how easily one can override default components or integrate new ones.
The changes include not only the core implementation in pipelines.py but also the Jinja2 templated YAML files for specific NLP tasks. Please verify that these templates meet the project's standards for flexibility and usability.
The unit and integration tests should cover a wide range of scenarios, but a check for edge cases or gaps in test coverage would be welcome.

Feb 15 '24 12:02 vblagoje

Pull Request Test Coverage Report for Build 7934015716

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

0 of 0 changed or added relevant lines in 0 files are covered.
21 unchanged lines in 4 files lost coverage.
Overall coverage increased (+0.3%) to 89.312%

Files with Coverage Reduction	New Missed Lines	%
core/pipeline/pipeline.py	3	94.52%
components/audio/whisper_local.py	6	91.3%
components/embedders/openai_text_embedder.py	6	70.59%
utils/auth.py	6	93.27%
Total:	21

Totals
Change from base Build 7904082418:	0.3%
Covered Lines:	5047
Relevant Lines:	5651

💛 - Coveralls

Feb 15 '24 12:02 coveralls

@shadeMe I think this is the solution we had in mind. LMK

Feb 16 '24 12:02 vblagoje
