create ready-made pipelines
Ready-made pipelines
Similar to https://github.com/deepset-ai/haystack/blob/main/haystack/pipelines/standard_pipelines.py, we want to have predefined pipelines in Haystack 2.0. We start with:
RAG pipeline
We want a simple RAG pipeline: one retriever plus one generator. The embedding model is optional; if it is None, BM25 is used.
- Construction params: Embedding model, prompt="default_rag_prompt", generation model.
- Run params: query
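To make the parameter list above concrete, here is a minimal sketch in plain Python. It does not use any Haystack API: `RAGPipelineSpec`, `retriever_kind`, and the returned dictionary are hypothetical names chosen for illustration, and `run` only describes what a real pipeline would do. It shows one possible way the optional embedding model could select between BM25 and embedding retrieval.

```python
from dataclasses import dataclass
from typing import Optional

# Name of the default prompt, taken from the construction params above.
DEFAULT_RAG_PROMPT = "default_rag_prompt"


@dataclass
class RAGPipelineSpec:
    """Hypothetical description of a ready-made RAG pipeline (not a Haystack class)."""

    generation_model: str
    embedding_model: Optional[str] = None  # None means: fall back to BM25
    prompt: str = DEFAULT_RAG_PROMPT

    @property
    def retriever_kind(self) -> str:
        # BM25 needs no embedding model, so it is the fallback.
        return "bm25" if self.embedding_model is None else "embedding"

    def run(self, query: str) -> dict:
        # Placeholder: reports the plan instead of retrieving or generating.
        return {
            "query": query,
            "retriever": self.retriever_kind,
            "prompt": self.prompt,
            "generator": self.generation_model,
        }


# "some-generation-model" is a placeholder, not a real model name.
plan = RAGPipelineSpec(generation_model="some-generation-model").run("What is RAG?")
```

With no embedding model given, `plan["retriever"]` is `"bm25"`; passing `embedding_model="..."` switches it to `"embedding"`.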
Indexing pipeline
Indexing uses the native and OSS Haystack converters. We want to customize this pipeline by the set of supported file formats, which keeps installation light: only the dependencies for the chosen file types are needed. For example, we can showcase an indexing pipeline that converts only TXT and needs no additional dependencies. The pipeline should also convert a list of file_paths (already supported through the filetyperouter) and ideally all files present in a folder (I believe this needs a new component). It should emit a warning for each file it cannot convert.
- Construction params: supported_file_types=["PDF", "TXT", "markdown", "HTML"], embedding model.
- Run params: either [list of files] or "folder".
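The folder handling and the warnings described above could be sketched as follows. This is plain Python using only the standard library, not the proposed Haystack component: `select_convertible_files` and the `SUFFIXES` mapping are hypothetical, and the mapping from type names to file suffixes is an assumption. A string or `Path` is treated as a folder; anything else is treated as a list of file paths.

```python
import warnings
from pathlib import Path
from typing import Iterable, List, Union

# Assumed mapping from the supported_file_types names to file suffixes.
SUFFIXES = {
    "PDF": {".pdf"},
    "TXT": {".txt"},
    "markdown": {".md", ".markdown"},
    "HTML": {".html", ".htm"},
}


def select_convertible_files(
    source: Union[str, Path, Iterable[Union[str, Path]]],
    supported_file_types: Iterable[str] = ("PDF", "TXT", "markdown", "HTML"),
) -> List[Path]:
    """Return the files that can be converted and warn about the rest."""
    allowed = set()
    for file_type in supported_file_types:
        allowed |= SUFFIXES[file_type]

    if isinstance(source, (str, Path)):
        # A single path is interpreted as a folder and searched recursively.
        candidates = sorted(p for p in Path(source).rglob("*") if p.is_file())
    else:
        candidates = [Path(p) for p in source]

    selected = []
    for path in candidates:
        if path.suffix.lower() in allowed:
            selected.append(path)
        else:
            warnings.warn(f"Skipping {path}: unsupported file type")
    return selected
```

For example, `select_convertible_files("docs", ["TXT"])` would return only the `.txt` files under a folder named `docs` and warn once for every other file it finds there.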
We want to think about the most important parameters for these two pipeline types. The LLM and its API key are two of them, but you should define and implement the 1-4 most important parameters in total.
Closing the gap from Simple to Complex
Please think about ways to bridge the gap between the simple representation and a more complex, customizable one. That means finding ways to move from the ready-made RAGPipeline() to the underlying components in a way that is easy to use and understand. This should be done in the documentation AND inside the code.
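One possible way to do this in code is for the ready-made pipeline to expose its parts by name, so a user can inspect or swap a single component without rebuilding everything. The sketch below is plain Python and only illustrates the idea: `ReadyMadePipeline`, `replace`, and `make_rag_like_pipeline` are hypothetical names, and the components are stand-in functions rather than Haystack components.

```python
from typing import Any, Callable, Dict


class ReadyMadePipeline:
    """Hypothetical wrapper that runs named components in insertion order."""

    def __init__(self, components: Dict[str, Callable[[Any], Any]]):
        # Public on purpose: users can see what the ready-made pipeline contains.
        self.components = dict(components)

    def replace(self, name: str, component: Callable[[Any], Any]) -> None:
        """Swap one component while keeping its position in the pipeline."""
        if name not in self.components:
            raise KeyError(f"No component named {name!r}")
        self.components[name] = component

    def run(self, value: Any) -> Any:
        # Feed the output of each component into the next one.
        for component in self.components.values():
            value = component(value)
        return value


def make_rag_like_pipeline() -> ReadyMadePipeline:
    # Stand-in components; a real pipeline would retrieve and generate.
    return ReadyMadePipeline(
        {
            "retriever": lambda query: {"query": query, "documents": ["doc about " + query]},
            "generator": lambda data: f"answer from {len(data['documents'])} document(s)",
        }
    )


pipeline = make_rag_like_pipeline()
simple_answer = pipeline.run("haystack")

# Moving from simple to custom: replace only the retriever.
pipeline.replace("retriever", lambda query: {"query": query, "documents": ["a", "b"]})
custom_answer = pipeline.run("haystack")
```

The simple path and the customized path share the same object, which is the point of the design: the user changes one named part and leaves the rest untouched.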
Please assign me, I want to do it.
Hey, thanks for working on this, @CrypticRevenger. I wrote a response in the PR you opened: https://github.com/deepset-ai/haystack/pull/5996