feat: Add PipelineTemplate for ready made pipelines

Open vblagoje opened this issue 1 year ago • 2 comments

Why:

The PipelineTemplate class and its accompanying template files make it simpler and more flexible to create, configure, and use NLP pipelines in Haystack. They address the need for a more dynamic way to build pipelines for a range of NLP tasks, such as question answering and document indexing. Because pipelines are constructed from Jinja2-templated YAML files, users can either rely on the default configuration or customize individual components as needed.

  • fixes https://github.com/deepset-ai/haystack/issues/5992

What:

  • Introduction of the PipelineTemplate class: Constructs and customizes NLP pipelines from Jinja2-templated YAML files. It simplifies specifying, building, and overriding components in pipelines.
  • Addition of templated YAML files: Template files have been added for indexing (indexing.yaml.jinja2), question answering (qa.yaml.jinja2), and retrieval-augmented generation (RAG) (rag.yaml.jinja2). Each template defines the structure and components of the pipeline for its task and is designed to be easily customizable.
  • Release notes documentation: A detailed explanation and examples demonstrating the usage and benefits of the new PipelineTemplate feature.
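
To make the templating idea concrete, here is a minimal sketch of how a Jinja2 conditional can include or omit a component in a YAML pipeline definition. The template text, the component type names, and the `render_template` helper are hypothetical and are not taken from this PR's actual indexing.yaml.jinja2; only the `use_pdf_file_converter` parameter name comes from the usage example in this description. It assumes the jinja2 package is installed.

```python
from jinja2 import Template

# Hypothetical template text for illustration only; the real
# indexing.yaml.jinja2 shipped with this PR may look different.
TEMPLATE_TEXT = """\
components:
{% if use_pdf_file_converter %}
  converter:
    type: hypothetical.PdfConverter
{% endif %}
  embedder:
    type: hypothetical.DocumentEmbedder
"""


def render_template(use_pdf_file_converter: bool) -> str:
    """Render the YAML text, including the converter only when requested."""
    # trim_blocks/lstrip_blocks keep the {% ... %} lines from leaving
    # blank lines or stray indentation in the rendered YAML.
    template = Template(TEMPLATE_TEXT, trim_blocks=True, lstrip_blocks=True)
    return template.render(use_pdf_file_converter=use_pdf_file_converter)


if __name__ == "__main__":
    print(render_template(True))
    print("---")
    print(render_template(False))
```

The rendered text would then be parsed as YAML and turned into a pipeline; that step is left out here because it depends on Haystack's own loading code.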

How can it be used:

  • Building custom pipelines effortlessly:
    • For indexing documents, including the option to convert PDF files to text:
      from haystack.components.embedders import SentenceTransformersDocumentEmbedder
      from haystack.templates import PipelineTemplate
      
      pt = PipelineTemplate("indexing", template_params={"use_pdf_file_converter": True})
      pt.override("embedder", SentenceTransformersDocumentEmbedder(progress_bar=True))
      pipe = pt.build()
      
  • Easy customization for various NLP tasks:
    • Creating a question answering pipeline and asking it a question about Germany:
      from haystack.templates import PipelineTemplate
      
      qa_pipe = PipelineTemplate("qa").build()
      print(qa_pipe.run({"question": "What is the capital of Germany?"}))
      

How did you test it:

  • Unit and integration tests:
    • Testing for invalid template names, ensuring exceptions are raised as expected.
    • Overriding default components with custom settings and validating the applied changes.
    • Building pipelines using both predefined and custom templates to verify functionality.
  • Examples in documentation:
    • Running example code provided in the release notes and in docstrings to ensure accuracy and functionality.
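
The checks listed above could look roughly like the following sketch. Because the actual test files are not shown in this description, it uses a small stand-in class instead of Haystack's PipelineTemplate: `StandInTemplate`, its `KNOWN_TEMPLATES` table, and the choice of `ValueError` for an unknown template name are all assumptions made for illustration.

```python
class StandInTemplate:
    """Hypothetical stand-in that mimics the override/build flow described above.

    This is NOT Haystack's PipelineTemplate; it only illustrates the test cases.
    """

    # Assumed defaults per template name (illustrative values only).
    KNOWN_TEMPLATES = {
        "indexing": {"embedder": "default-embedder"},
        "qa": {"generator": "default-generator"},
    }

    def __init__(self, name: str) -> None:
        if name not in self.KNOWN_TEMPLATES:
            # Assumption: an unknown template name raises ValueError.
            raise ValueError(f"Unknown template: {name!r}")
        self.components = dict(self.KNOWN_TEMPLATES[name])

    def override(self, component_name: str, component: object) -> None:
        """Replace a default component with a custom one."""
        if component_name not in self.components:
            raise KeyError(component_name)
        self.components[component_name] = component

    def build(self) -> dict:
        """Return the final component mapping (a real build returns a pipeline)."""
        return dict(self.components)


def invalid_name_raises() -> bool:
    """Check that constructing with an unknown template name fails."""
    try:
        StandInTemplate("does-not-exist")
    except ValueError:
        return True
    return False


def override_is_applied() -> bool:
    """Check that an overridden component ends up in the built result."""
    template = StandInTemplate("indexing")
    template.override("embedder", "custom-embedder")
    return template.build()["embedder"] == "custom-embedder"
```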

Notes for the reviewer:

  • The PipelineTemplate feature changes how users create and customize NLP pipelines in Haystack.
  • Reviewers should pay special attention to how the feature integrates with existing pipelines and to how easily default components can be overridden or new ones added.
  • The changes include not only the core implementation in pipelines.py but also the Jinja2 templated YAML files for specific NLP tasks. Please verify that these templates meet the project's standards for flexibility and usability.
  • The unit and integration tests provided should cover a wide range of scenarios, but checking for any edge cases or potential improvements in testing coverage could be beneficial.

vblagoje, Feb 15 '24 12:02

Pull Request Test Coverage Report for Build 7934015716

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 21 unchanged lines in 4 files lost coverage.
  • Overall coverage increased (+0.3%) to 89.312%

Files with Coverage Reduction                  New Missed Lines   %
core/pipeline/pipeline.py                      3                  94.52%
components/audio/whisper_local.py              6                  91.3%
components/embedders/openai_text_embedder.py   6                  70.59%
utils/auth.py                                  6                  93.27%
Total:                                         21

Totals (Coverage Status)
Change from base Build 7904082418: 0.3%
Covered Lines: 5047
Relevant Lines: 5651

💛 - Coveralls

coveralls, Feb 15 '24 12:02

@shadeMe I think this is the solution we had in mind. LMK

vblagoje, Feb 16 '24 12:02