haystack
feat: Add PipelineTemplate for ready-made pipelines
Why:
The new `PipelineTemplate` class and its template files make it simpler and more flexible to create, configure, and use NLP pipelines in Haystack. They address the need for a more dynamic way to build pipelines for a wide range of NLP tasks, such as question answering and document indexing. Because pipelines are built from Jinja2-templated YAML files, users can tailor them to their requirements, either by keeping the default configuration or by customizing individual components.
- fixes https://github.com/deepset-ai/haystack/issues/5992
What:
- Introduction of the `PipelineTemplate` class: a comprehensive solution for constructing and customizing NLP pipelines from Jinja2-templated YAML files. It simplifies specifying, building, and overriding pipeline components.
- Addition of templated YAML files: templates for indexing (`indexing.yaml.jinja2`), question answering (`qa.yaml.jinja2`), and retrieval-augmented generation (RAG) (`rag.yaml.jinja2`). Each template defines the structure and components of the pipeline for its task and is designed to be easily customizable.
- Release notes documentation: a detailed explanation of the new `PipelineTemplate` feature, with examples demonstrating its usage and benefits.
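To illustrate how a template parameter can switch part of a pipeline on or off, here is a minimal sketch in plain Python. It does not use Jinja2 or Haystack: the function `render_indexing_spec` is hypothetical, and the component type strings and connection names are illustrative assumptions about what a rendered `indexing.yaml.jinja2` might parse into, not the actual template contents.

```python
from typing import Any, Dict, List


def render_indexing_spec(use_pdf_file_converter: bool = False) -> Dict[str, Any]:
    """Hypothetical stand-in for rendering an indexing template.

    Returns the kind of dict a rendered YAML pipeline definition could
    parse into: named components plus the connections between them.
    Type names below are illustrative, not taken from the real template.
    """
    components: Dict[str, Dict[str, Any]] = {}
    connections: List[Dict[str, str]] = []

    # In a Jinja2 template, a block such as `{% if use_pdf_file_converter %}`
    # would decide whether this section is emitted at all.
    if use_pdf_file_converter:
        components["pdf_file_converter"] = {"type": "PyPDFToDocument", "init_parameters": {}}
        connections.append({"sender": "pdf_file_converter.documents", "receiver": "embedder.documents"})

    # Components present in every rendering of this sketch.
    components["embedder"] = {"type": "SentenceTransformersDocumentEmbedder", "init_parameters": {}}
    components["writer"] = {"type": "DocumentWriter", "init_parameters": {}}
    connections.append({"sender": "embedder.documents", "receiver": "writer.documents"})

    return {"components": components, "connections": connections}


spec = render_indexing_spec(use_pdf_file_converter=True)
print(sorted(spec["components"]))  # ['embedder', 'pdf_file_converter', 'writer']
```

The point of the design is that one template file covers several pipeline variants, and the caller picks a variant through parameters instead of editing YAML by hand.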
How can it be used:
- Building custom pipelines effortlessly:
  - For indexing documents, including the option to convert PDF files to text:

```python
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.templates import PipelineTemplate

# Render the indexing template with the optional PDF converter enabled
pt = PipelineTemplate("indexing", template_params={"use_pdf_file_converter": True})
# Replace the template's default embedder with one that shows a progress bar
pt.override("embedder", SentenceTransformersDocumentEmbedder(progress_bar=True))
pipe = pt.build()
```
- Easy customization for various NLP tasks:
  - Creating a question answering pipeline from the default `qa` template:

```python
from haystack.templates import PipelineTemplate

# Build a question answering pipeline from the default "qa" template
qa_pipe = PipelineTemplate("qa").build()
print(qa_pipe.run({"question": "What is the capital of Germany?"}))
```
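The override-then-build flow used in the indexing example (`pt.override("embedder", ...)` followed by `pt.build()`) can be modeled with a small sketch. `MiniTemplate` is a hypothetical class, not Haystack's `PipelineTemplate`; in particular, raising `ValueError` for an unknown component name is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class MiniTemplate:
    """Hypothetical, simplified model of a pipeline template.

    `defaults` maps component names to the objects the template would
    create; `overrides` holds user-supplied replacements.
    """

    defaults: Dict[str, Any]
    overrides: Dict[str, Any] = field(default_factory=dict)

    def override(self, name: str, component: Any) -> "MiniTemplate":
        # Assumption: only components the template defines can be replaced.
        if name not in self.defaults:
            raise ValueError(f"Template has no component named {name!r}")
        self.overrides[name] = component
        return self

    def build(self) -> Dict[str, Any]:
        # Overrides win over defaults; every other component keeps its default.
        return {**self.defaults, **self.overrides}


t = MiniTemplate(defaults={"embedder": "default-embedder", "writer": "default-writer"})
t.override("embedder", "embedder-with-progress-bar")
print(t.build())  # {'embedder': 'embedder-with-progress-bar', 'writer': 'default-writer'}
```

Keeping overrides separate from defaults until `build()` means the template itself stays unchanged, so the same template object can be reused with different replacements.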
How did you test it:
- Unit and integration tests:
  - Testing invalid template names to ensure the expected exceptions are raised.
  - Overriding default components with custom settings and validating that the changes are applied.
  - Building pipelines from both predefined and custom templates to verify that they work.
- Examples in documentation:
  - Running the example code from the release notes and docstrings to confirm that it is accurate and works.
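A test for invalid template names might look like the following sketch. The loader `load_template`, the `KNOWN_TEMPLATES` set, and the choice of `ValueError` are hypothetical stand-ins; the real tests exercise `PipelineTemplate` directly, and the exception type it raises is not stated in this PR description.

```python
import unittest

# Hypothetical registry mirroring the three bundled templates
# (indexing, qa, rag), each shipped as `<name>.yaml.jinja2`.
KNOWN_TEMPLATES = {"indexing", "qa", "rag"}


def load_template(name: str) -> str:
    """Hypothetical loader: return the template file name or raise."""
    if name not in KNOWN_TEMPLATES:
        # Assumption: an unknown name is reported with ValueError.
        raise ValueError(f"Unknown pipeline template: {name!r}")
    return f"{name}.yaml.jinja2"


class TestLoadTemplate(unittest.TestCase):
    def test_known_template(self) -> None:
        self.assertEqual(load_template("qa"), "qa.yaml.jinja2")

    def test_invalid_template_name_raises(self) -> None:
        with self.assertRaises(ValueError):
            load_template("does_not_exist")
```

Saved as a `test_*.py` file, this runs with `python -m unittest` or pytest.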
Notes for the reviewer:
- The `PipelineTemplate` feature significantly changes how users can interact with and customize NLP pipelines in Haystack.
- Reviewers should pay special attention to the integration and flexibility this feature offers, especially how easily default components can be overridden or new ones integrated.
- The changes include not only the core implementation in `pipelines.py` but also the Jinja2-templated YAML files for specific NLP tasks. Please verify that these templates meet the project's standards for flexibility and usability.
- The unit and integration tests should cover a wide range of scenarios, but a check for missed edge cases or gaps in test coverage would be welcome.
Pull Request Test Coverage Report for Build 7934015716
Warning: This coverage report may be inaccurate.
This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
- For more information on this, see Tracking coverage changes with pull request builds.
- To avoid this issue with future PRs, see these Recommended CI Configurations.
- For a quick fix, rebase this PR at GitHub. Your next report should be accurate.
Details
- 0 of 0 changed or added relevant lines in 0 files are covered.
- 21 unchanged lines in 4 files lost coverage.
- Overall coverage increased (+0.3%) to 89.312%
| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| core/pipeline/pipeline.py | 3 | 94.52% |
| components/audio/whisper_local.py | 6 | 91.3% |
| components/embedders/openai_text_embedder.py | 6 | 70.59% |
| utils/auth.py | 6 | 93.27% |
| Total: | 21 | |
| Totals | |
|---|---|
| Change from base Build 7904082418: | 0.3% |
| Covered Lines: | 5047 |
| Relevant Lines: | 5651 |
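The figures in the report are internally consistent, which a quick calculation confirms: the per-file missed lines add up to the reported total, and covered divided by relevant lines gives the overall coverage percentage.

```python
# Figures copied from the Coveralls report.
missed_by_file = {
    "core/pipeline/pipeline.py": 3,
    "components/audio/whisper_local.py": 6,
    "components/embedders/openai_text_embedder.py": 6,
    "utils/auth.py": 6,
}
covered_lines = 5047
relevant_lines = 5651

total_missed = sum(missed_by_file.values())
coverage_pct = 100 * covered_lines / relevant_lines

print(total_missed)            # 21
print(f"{coverage_pct:.3f}%")  # 89.312%
```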
💛 - Coveralls
@shadeMe I think this is the solution we had in mind. LMK