dspy
Self-Bootstrapping pipelines?
I was watching the LlamaIndex webinar, and Omar mentioned the very interesting idea of self-generating question/answer pairs from data to avoid collecting examples manually. An automatic pipeline for RAG would look like this:
- From a set of passages, use a model to generate questions that can be answered with the content of each passage.
- Use this set of question/answer pairs to bootstrap a pipeline.
- Use a model, rather than a hardcoded metric, to decide whether a proposed RAG answer is good enough by comparing it with the auto-generated ground truth.
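To make the question concrete, here is a rough sketch of the three steps in plain Python. It is only an illustration of the shape I have in mind, not working DSPy code: `build_trainset`, `make_judge_metric`, `toy_generate_qa`, and `toy_judge` are hypothetical names, `SimpleNamespace` stands in for DSPy's example and prediction objects, and the two toy functions stand in for LM-backed modules. The only part taken from DSPy is the `(example, pred, trace=None)` shape of the metric.

```python
from types import SimpleNamespace
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-ins: in a real program these callables would be LM-backed
# modules; plain functions are used here so the sketch runs offline.
QAGenerator = Callable[[str], Tuple[str, str]]  # passage -> (question, answer)
Judge = Callable[[str, str, str], bool]         # (question, gold, predicted) -> verdict


def build_trainset(passages: Iterable[str], generate_qa: QAGenerator) -> List[SimpleNamespace]:
    """Step 1: turn raw passages into question/answer examples."""
    trainset = []
    for passage in passages:
        question, answer = generate_qa(passage)
        trainset.append(SimpleNamespace(question=question, answer=answer, context=passage))
    return trainset


def make_judge_metric(judge: Judge):
    """Step 3: wrap a judge in the (example, pred, trace=None) metric shape."""
    def metric(example, pred, trace=None):
        return bool(judge(example.question, example.answer, pred.answer))
    return metric


# Toy implementations, for illustration only (assumptions, not real models).
def toy_generate_qa(passage: str) -> Tuple[str, str]:
    return ("What is the first word of the passage?", passage.split()[0])


def toy_judge(question: str, gold: str, predicted: str) -> bool:
    # A real judge would ask a model; this one checks substring containment.
    return gold.lower() in predicted.lower()


passages = ["Paris is the capital of France.", "DSPy compiles declarative LM programs."]
trainset = build_trainset(passages, toy_generate_qa)
metric = make_judge_metric(toy_judge)

# Step 2 (bootstrapping) would hand `trainset` and `metric` to an optimizer.
pred = SimpleNamespace(answer="The first word is Paris.")
print(metric(trainset[0], pred))
```

Presumably the judge would itself be a small DSPy program, and `metric` would be passed wherever the exact-match metric goes today, but that is exactly the part I am unsure about.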
What would be the best way to express this kind of pipeline in DSPy? In the webinar, Omar says that DSPy accepts other DSPy programs as metrics. Right now, using the "exact match" metric, I always get 0 full traces, and it is not clear from DSPy's output what is going wrong.
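One guess, which I have not confirmed, is that the auto-generated answers are full sentences, so a strict string comparison never succeeds. The sketch below is a small standalone check of that hypothesis in plain Python. The `normalize`, `exact_match`, and `token_f1` helpers are my own illustrative versions (SQuAD-style normalization and token overlap), not DSPy's implementations.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold: str, predicted: str) -> bool:
    """True only if the normalized strings are identical."""
    return normalize(gold) == normalize(predicted)


def token_f1(gold: str, predicted: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    gold_tokens = normalize(gold).split()
    pred_tokens = normalize(predicted).split()
    overlap = sum((Counter(gold_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# A verbose generated "gold" answer versus a short predicted answer.
gold = "The capital of France is Paris."
predicted = "Paris"
print(exact_match(gold, predicted), round(token_f1(gold, predicted), 2))
```

If the generated pairs look like this, exact match rejects every candidate even when the answer is right, which would explain getting no traces; a softer score or a model-based judge would not have that problem.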