Optimization dry runs [WIP]
Adding items discussed in #397
This is so cool! Thank you @smwitkowski — Will review tonight if I can, or tomorrow if not
Sounds good, @okhat - there's definitely still work that needs to be done on this, but I wanted to get a start on shaping this.
OK, I've taken a look at this and I like it, thanks @smwitkowski.
I think what might be more useful than completing this draft right away is coming up with a short (English/pseudocode) plan of what the strategy for dry runs is.
Also let's not mix `step=True` with `dryrun=True`, as they're separate things I think(?).
There's a backend refactor going on by @CyrusOfEden so I think this dryrun plan can be merged after that for better longevity.
I guess the current dryrun plan seems to be mainly changing the `Predict` module.
But it's unclear to me how that will connect to the bigger picture of optimizers.
Maybe we can have a `.dryrun(.)` method in each optimizer? Or is it better to do `.estimate_cost(.)`?
@okhat Sounds good - I agree there are a lot of moving pieces, and the link between optimizers and the Predict module could be fleshed out a bit more.
I'll revisit this and add some pseudocode for us to review.
@okhat - Taking a stab at a plan for the optimizer (I'm leaving out `step` here).
TLDR: Create a function or context manager that simulates the LM output and the optimization step, removing any early stopping on a per-optimizer basis, then count the input and output tokens to estimate a "max" number of tokens used.
### Objective
The aim is to estimate the number of tokens that would be used during optimization without executing actual calls to the Language Model (LM). To do this, we'll:
- Mock the output of an LM to bypass actual LM calls, allowing us to estimate the token count without calling any APIs (a minimal sketch follows this list).
- Prevent early stopping in our optimization routines so that we know the maximum number of tokens that could be used, enabling a "max cost" estimate.
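To make the token-counting idea concrete, here is a minimal sketch of what a counting mock could look like. Everything in it is an assumption for illustration: the `MockLM` name, the canned response, and the use of `tiktoken` for counting are not part of the proposal, and the call interface would need to match whatever DSPy expects of an LM.

```python
# Minimal sketch (illustrative assumption, not proposed API): a stand-in "LM"
# that tallies prompt/completion tokens instead of calling a provider.
import tiktoken


class MockLM:
    def __init__(self, model="gpt-3.5-turbo", canned_response="mock answer"):
        self.encoder = tiktoken.encoding_for_model(model)
        self.canned_response = canned_response
        self.prompt_tokens = 0
        self.completion_tokens = 0
        self.calls = 0

    def __call__(self, prompt, **kwargs):
        # Count the tokens that *would* have been sent and received.
        self.prompt_tokens += len(self.encoder.encode(prompt))
        self.completion_tokens += len(self.encoder.encode(self.canned_response))
        self.calls += 1
        return [self.canned_response]
```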
### Implementation Strategy
To achieve this, we'll implement mock functionality encapsulated within a context manager or function in `teleprompt.py`.
This will enable us to:
- Mock LM Outputs: Create a placeholder for LM responses, facilitating the reuse of the optimizers as if they were interacting with a live and functional LM (a sketch of how a mocked LM could be swapped in follows this list).
- Modularize Early Stopping Prevention: Initially, apply mock code to a set of an optimizer's functions to bypass their early stopping mechanisms.
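One possible way to swap a mocked LM in without touching every optimizer is DSPy's existing settings context. The snippet below is only a sketch of that idea, reusing the hypothetical `MockLM` from the previous example; in practice the mock would also have to satisfy whatever interface `dspy.Predict` expects of an LM, which is glossed over here.

```python
# Sketch only: temporarily swap in the token-counting mock so that any
# module calls inside the block are tallied rather than sent to an API.
import dspy

mock_lm = MockLM()

with dspy.settings.context(lm=mock_lm):
    program = dspy.Predict("question -> answer")
    program(question="What castle did David Gregory inherit?")

print(f"calls={mock_lm.calls}, prompt_tokens={mock_lm.prompt_tokens}")
```

The per-optimizer context manager sketched later in this plan would do the same kind of swap internally, just scoped to a `compile()` call.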
### Example Workflow
#### Defining Modules and Signatures
First, let's define a `RAG` module (a standard `dspy.Module`) for demonstration; the mocking comes in later, at optimization time:
```python
import dspy


class GenerateAnswer(dspy.Signature):
    """Generates short factoid answers to questions based on provided context."""

    context = dspy.InputField(desc="May contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="Typically between 1 and 5 words")


class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
```
#### Simulating Dry Runs
To simulate a dry run, we'll utilize a context manager `dspy.dry_run()` that mocks the LM interactions:
```python
from dspy.datasets import HotPotQA
from dspy.teleprompt import BootstrapFewShot

dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]


def validate_context_and_answer(example, pred, trace=None):
    # Validation logic for context and answer accuracy
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM


teleprompter = BootstrapFewShot(metric=validate_context_and_answer)

with dspy.dry_run():
    compiled_dryrun = teleprompter.compile(RAG(), trainset=trainset)
```
This approach ensures that we mock the LM's responses and bypass the early stopping mechanisms within the optimization routines.
Alternatively, `dry_run` can be defined on the teleprompter class in `teleprompt.py` and called simply as a method of the teleprompter, as shown below.
```python
compiled_dryrun = teleprompter.dry_run(RAG(), trainset=trainset)
```
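For concreteness, that method could be little more than a thin wrapper that runs the usual `compile()` inside the mocking context described in the next section. The sketch below is hypothetical; the `_dry_run_context` helper name is made up here to avoid colliding with the public `dry_run` method.

```python
# Hypothetical sketch, not existing DSPy API: the public dry_run() method
# just wraps the normal compile() call in the mocking context.
class Teleprompter:
    ...

    def dry_run(self, student, *, trainset, **kwargs):
        with self._dry_run_context():  # the context manager from the next section
            return self.compile(student, trainset=trainset, **kwargs)
```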
### Context Manager for Dry Runs
A context manager `dry_run` will facilitate the dry run. This can either (A) be used as a global context manager, or (B) be added to `teleprompt.py` and called as a method of the optimizer class.
```python
from contextlib import contextmanager


@contextmanager
def dry_run(self):
    original_lm = self.lm  # Save the original LM
    try:
        self.lm = mock_llm(original_lm.provider)  # Use a mocked LM
        if self.optimizer_type == 'bootstrap':
            self._mock_bootstrap()
        # Additional conditions for other optimizers
        yield
    finally:
        self.lm = original_lm  # Restore the original LM
```
### Mocking Logic for Optimizers
Each optimizer will have its specific mocking logic to simulate its optimization process without actual LM calls or early stopping. This simulation will estimate the maximum number of tokens and calls expected during the optimization.
```python
def _mock_bootstrap(self):
    pass
    # Mocking methods for different optimizers...
```
These will need to be built specifically for each optimizer.
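As one illustration of what that could mean for `BootstrapFewShot`, the sketch below disables its "enough demos collected" stopping condition so that a dry run counts the worst case. Treating `max_bootstrapped_demos` as a patchable instance attribute is an assumption made for this sketch.

```python
def _mock_bootstrap(self):
    # Hypothetical sketch: push BootstrapFewShot toward its worst case so the
    # dry run counts the maximum possible number of LM calls.
    self.max_bootstrapped_demos = float('inf')  # never stop early on "enough demos"
    # With early stopping disabled, compile() under the mocked LM attempts
    # every trainset example in every round, giving a "max cost" estimate.
```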
### Mocking Logic for LMs
Each LM will have a function that takes an input and returns some output without actually calling an LM.
```python
def mock_llm(llm_provider):
    # Returns a mocked LM based on the provider
    if llm_provider == 'open_ai':
        return mock_open_ai_llm
    # Additional logic for different LLM providers...
```
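A provider-specific mock could be as simple as an instance of the token-counting `MockLM` sketched earlier, configured with a matching tokenizer; again, this is illustrative rather than a concrete proposal.

```python
# Illustrative only: an OpenAI-flavored mock built on the MockLM sketch above,
# using a tokenizer that roughly matches the provider's models.
mock_open_ai_llm = MockLM(model="gpt-3.5-turbo", canned_response="mock answer")
```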
Many details are omitted here, but it gives a high-level idea of how we could approach this.