
Bootstrapped 0 full traces

Open SrikarNamburu opened this issue 4 months ago • 8 comments

I tried to compile a program from intro.ipynb on custom data, using GPT-3.5 and a custom retriever module. I tried both the single-hop and the multi-hop program, but 0 traces were bootstrapped in either case. When I ran the same lm and rm modules (shown below) on the HotPotQA dataset, however, a few traces were bootstrapped. How can I resolve this issue?

import dspy
from dsp.utils import deduplicate  # used in MultiHopRAG.forward, as in intro.ipynb

lm = dspy.AzureOpenAI(api_base='', api_key='', api_version='', model='gpt35')
rm = custom_rm(index_name='', k=10)
dspy.settings.configure(lm=lm, rm=rm)

class GenerateAnswer(dspy.Signature):
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="be descriptive, provide most accurate answer based on context")

class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help answer a complex question."""
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()


class SingleHopRAG(dspy.Module):
    def __init__(self, num_passages=10):
        super().__init__()

        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
    
    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

class MultiHopRAG(dspy.Module):
    def __init__(self, passages_per_hop=5, max_hops=3):
        super().__init__()

        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)]
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
        self.max_hops = max_hops
    
    def forward(self, question):
        context = []

        for hop in range(self.max_hops):
            query = self.generate_query[hop](context=context, question=question).query
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)

        pred = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=pred.answer)


# Single-hop RAG (basic RAG) with few-shot
from dspy.teleprompt import BootstrapFewShot

# Validation logic: check that the predicted answer is correct.
# Also check that the retrieved context does actually contain that answer.
def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

# Set up a basic teleprompter, which will compile our RAG program.
teleprompter = BootstrapFewShot(metric=validate_context_and_answer)

# Compile!
compiled_rag = teleprompter.compile(SingleHopRAG(), trainset=trainset)


def validate_context_and_answer_and_hops(example, pred, trace=None):
    if not dspy.evaluate.answer_exact_match(example, pred): return False
    if not dspy.evaluate.answer_passage_match(example, pred): return False

    hops = [example.question] + [outputs.query for *_, outputs in trace if 'query' in outputs]

    if max([len(h) for h in hops]) > 100: return False
    if any(dspy.evaluate.answer_exact_match_str(hops[idx], hops[:idx], frac=0.8) for idx in range(2, len(hops))): return False

    return True

# Multi-hop RAG with few-shot

teleprompter = BootstrapFewShot(metric=validate_context_and_answer_and_hops)
compiled_baleen = teleprompter.compile(MultiHopRAG(), teacher=MultiHopRAG(passages_per_hop=2), trainset=trainset)

I get this message for both SingleHopRAG and MultiHopRAG: "Bootstrapped 0 full traces after 25 examples in round 0."

Is it because the trainset is small? A few traces were bootstrapped when I tried a trainset of similar size from the HotPotQA dataset. Moreover, when I run MultiHopRAG on the devset, the responses are slightly better than usual even though 0 traces were formed.
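One way to narrow this down might be to check which half of the metric rejects each training example. The sketch below is a plain-Python stand-in, not DSPy's own implementation: normalize, exact_match, passage_match, and diagnose are hypothetical helpers, and the sample data is made up for illustration. It assumes that a bootstrapped trace is only kept when the metric passes, so a long, descriptive prediction that never exactly matches a short gold answer would be rejected every time.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(gold, predicted):
    """True when the normalized prediction equals the normalized gold answer."""
    return normalize(gold) == normalize(predicted)


def passage_match(gold, passages):
    """True when the normalized gold answer appears in any retrieved passage."""
    target = normalize(gold)
    return any(target in normalize(p) for p in passages)


def diagnose(examples, predictions):
    """Count which part of an exact-match AND passage-match metric fails.

    examples: list of dicts with an "answer" key (gold answers).
    predictions: list of dicts with "answer" and "context" keys.
    """
    counts = Counter()
    for ex, pred in zip(examples, predictions):
        em = exact_match(ex["answer"], pred["answer"])
        pm = passage_match(ex["answer"], pred["context"])
        if em and pm:
            counts["pass"] += 1
        elif not em and not pm:
            counts["fail_both"] += 1
        elif not em:
            counts["fail_exact_match"] += 1
        else:
            counts["fail_passage_match"] += 1
    return dict(counts)


# Made-up data: the second prediction is correct but descriptive,
# so it fails exact match even though the passage contains the answer.
examples = [{"answer": "Paris"}, {"answer": "Paris"}]
predictions = [
    {"answer": "Paris", "context": ["Paris is the capital of France."]},
    {
        "answer": "The capital of France is Paris, a city on the Seine.",
        "context": ["Paris is the capital of France."],
    },
]
print(diagnose(examples, predictions))
```

If most custom examples land in fail_exact_match, the "be descriptive" instruction on the answer field may be working against an exact-match metric. If they land in fail_passage_match instead, the custom retriever may not be returning passages that contain the gold answer verbatim.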

What could be the issue? And how can I resolve it?

Any help would be greatly appreciated!

SrikarNamburu, Sep 30 '24 17:09