
Provide better control over the RAG ingestion stages (conversion, chunking, embedding, storing)

Open ilya-kolchinsky opened this issue 8 months ago • 8 comments

🚀 Describe the new functionality needed

As of now, RAG ingestion splits documents with a trivial overlapping-chunks algorithm and converts PDFs (and PDFs only) using pypdf. The entire ingestion process should be made more general, flexible, and user-controllable by introducing configuration settings for each stage, similar to the way the embedding model can be specified via config today.
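For reference, overlapping chunking of the kind described above can be sketched as follows. This is an illustration only, not the code llama-stack ships: the function name, the whitespace-based word splitting, and the default `chunk_size` and `overlap` values are all assumptions.

```python
def overlapping_chunks(text: str, chunk_size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `chunk_size` words that share `overlap` words.

    Hypothetical helper for illustration; units (words) and defaults are assumed.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks: list[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the window has reached the end of the text,
        # so no trailing chunk is a pure repeat of the overlap.
        if start + chunk_size >= len(words):
            break
    return chunks


example = overlapping_chunks("a b c d e f g h i j", chunk_size=4, overlap=1)
```

Because the window ignores headings, JSON structure, and sentence boundaries, a chunk can begin or end anywhere in the document, which is the limitation this issue is about.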

💡 Why is this needed? What if we don't build it?

Having a higher degree of control over the ingestion process is an enabler for a wide range of customer use cases. To mention a few examples:

  • Structured chunking aware of the document format (e.g., .json, .md);
  • Controllable chunking granularity;
  • Defining document converters on a per-format basis;
  • Support for sparse DBs by omitting the embedding stage.
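One possible shape for such settings is sketched below. Every name here (`IngestionConfig`, `ingest`, the converter and chunker helpers) is hypothetical and is not part of the current llama-stack API; the sketch only shows how per-format converters, per-format chunkers, chunk granularity, and an optional embedding stage could be expressed together. The storing stage is left out, so `ingest` simply returns the records it would store.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical stage signatures, assumed for this sketch.
Converter = Callable[[bytes], str]
Chunker = Callable[[str], list[str]]
Embedder = Callable[[str], list[float]]


def plain_text_converter(raw: bytes) -> str:
    """Fallback converter: treat the payload as UTF-8 text."""
    return raw.decode("utf-8")


def markdown_heading_chunker(text: str) -> list[str]:
    """Format-aware chunking: start a new chunk at every Markdown heading."""
    chunks: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]


def fixed_size_chunker(size: int) -> Chunker:
    """Controllable granularity: non-overlapping chunks of `size` characters."""
    def chunk(text: str) -> list[str]:
        return [text[i:i + size] for i in range(0, len(text), size)]
    return chunk


@dataclass
class IngestionConfig:
    # File extension -> stage implementation.
    converters: dict[str, Converter] = field(default_factory=dict)
    chunkers: dict[str, Chunker] = field(default_factory=dict)
    default_chunker: Chunker = field(default_factory=lambda: fixed_size_chunker(512))
    # None skips the embedding stage, e.g. for a sparse (keyword) index.
    embedder: Optional[Embedder] = None


def ingest(name: str, raw: bytes, config: IngestionConfig) -> list[dict]:
    """Run conversion, chunking, and (optionally) embedding for one document."""
    ext = name.rsplit(".", 1)[-1].lower() if "." in name else ""
    convert = config.converters.get(ext, plain_text_converter)
    chunk = config.chunkers.get(ext, config.default_chunker)
    records: list[dict] = []
    for piece in chunk(convert(raw)):
        record: dict = {"source": name, "text": piece}
        if config.embedder is not None:
            record["embedding"] = config.embedder(piece)
        records.append(record)
    return records


sparse_config = IngestionConfig(chunkers={"md": markdown_heading_chunker})
sparse_records = ingest("notes.md", b"# A\nalpha\n# B\nbeta\n", sparse_config)
```

In this sketch the sparse case is covered by leaving `embedder` unset, and a new document format is supported by registering a converter or chunker under its file extension.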

Other thoughts

No response

ilya-kolchinsky, Feb 12 '25 10:02