
Optimization dry runs [WIP]

Open smwitkowski opened this issue 1 year ago • 6 comments

Adding items discussed in #397

smwitkowski avatar Feb 19 '24 12:02 smwitkowski

This is so cool! Thank you @smwitkowski — Will review tonight if I can, or tomorrow if not

okhat avatar Feb 19 '24 13:02 okhat

Sounds good, @okhat - there's definitely still work that needs to be done on this, but I wanted to get a start on shaping this.

smwitkowski avatar Feb 19 '24 13:02 smwitkowski

OK, I've taken a look at this and I like it, thanks @smwitkowski.

I think what might be more useful than completing this draft right away is coming up with a short (English/pseudocode) plan of what the strategy for dry runs is.

Also let's not mix step=True with dryrun=True, as they're separate things I think(?).

There's a backend refactor going on by @CyrusOfEden so I think this dryrun plan can be merged after that for better longevity.

okhat avatar Feb 23 '24 13:02 okhat

I guess the current dryrun plan seems to be mainly changing the Predict module.

But it's unclear to me how that will connect to the bigger picture of optimizers.

Maybe we can have a .dryrun(.) method in each optimizer? Or is it better to do .estimate_cost(.)?

okhat avatar Feb 23 '24 13:02 okhat

@okhat Sounds good - I agree there are a lot of moving pieces, and the link between optimizers and the Predict module could be fleshed out a bit more.

I'll revisit this and add some pseudocode for us to review.

smwitkowski avatar Feb 23 '24 14:02 smwitkowski

@okhat - Taking a stab at a plan for the optimizer (I'm leaving out step here).

TLDR: Create a function or context manager that simulates the LM output and the optimization steps, removing any early stopping on a per-optimizer basis. Then count the input and output tokens to estimate a "max" number of tokens used.

Objective

The aim is to estimate the number of tokens used during optimization without executing actual calls to the Language Model (LM). To do this, we'll:

  1. Mock the output of an LM to bypass actual LM calls, allowing us to estimate the token count without calling any APIs.
  2. Prevent early stopping in our optimization routines so that we know the maximum number of tokens that could be used, enabling a "max cost" estimate (see the sketch after this list).
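
As a rough sketch of point 2, the tallied token counts could feed a worst-case cost estimate via a small helper like this (the helper name and prices are made-up placeholders, not part of dspy):

def estimate_max_cost(input_tokens, output_tokens, price_per_1k_input, price_per_1k_output):
    # Worst case: every counted token is billed at the provider's per-1k rate.
    return (input_tokens / 1000) * price_per_1k_input + (output_tokens / 1000) * price_per_1k_output

# e.g. estimate_max_cost(120_000, 45_000, 0.01, 0.03) -> 2.55 (USD)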

Implementation Strategy

To achieve this, we'll implement mock functionality encapsulated within a context manager or function in teleprompt.py. This will enable us to:

  • Mock LM Outputs: Create a placeholder for LM responses so the optimizers can run as if they were interacting with a live, functional LM.
  • Modularize Early Stopping Prevention: Initially, apply mocking code to a subset of each optimizer's functions to bypass their early stopping mechanisms.

Example Workflow

Defining Modules and Signatures

First, for demonstration, let's define a RAG module (built on dspy.Module) that will later be compiled against a mocked LM:

import dspy

class GenerateAnswer(dspy.Signature):
    """Generates short factoid answers to questions based on provided context."""
    context = dspy.InputField(desc="May contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="Typically between 1 and 5 words")

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)
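
For reference, assuming an LM and retrieval model have already been configured with dspy.settings.configure(lm=..., rm=...), the module would be used like this (the question is just an example):

rag = RAG()
pred = rag(question="What castle did David Gregory inherit?")
print(pred.answer)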

Simulating Dry Runs

To simulate a dry run, we'll utilize a context manager dspy.dry_run() that mocks the LM interactions:

from dspy.datasets import HotPotQA
from dspy.teleprompt import BootstrapFewShot

dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

def validate_context_and_answer(example, pred, trace=None):
    # Validation logic for context and answer accuracy
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM and answer_PM

teleprompter = BootstrapFewShot(metric=validate_context_and_answer)
with dspy.dry_run():
    compiled_dryrun = teleprompter.compile(RAG(), trainset=trainset)

This approach ensures that we mock the LM's responses and bypass the early stopping mechanisms within the optimization routines.

Alternatively, dry_run can be defined on the teleprompter class in teleprompt.py and called simply as a method of the teleprompter, as shown below.

compiled_dryrun = teleprompter.dry_run(RAG(), trainset=trainset)

Context Manager for Dry Runs

A context manager dry_run will facilitate the dry run. This can either (A) be used as a global context manager, or (B) be added to teleprompt.py and called as a method of the optimizer class.


from contextlib import contextmanager

# Defined on the optimizer class in teleprompt.py (option B)
@contextmanager
def dry_run(self):
    original_lm = self.lm  # Save the original LM
    try:
        self.lm = mock_llm(original_lm.provider)  # Use a mocked LM
        if self.optimizer_type == 'bootstrap':
            self._mock_bootstrap()
        # Additional conditions for other optimizers
        yield
    finally:
        self.lm = original_lm  # Restore the original LM
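
For comparison, a minimal sketch of option (A) as a global context manager could swap the LM configured in dspy.settings instead of one stored on the optimizer (mock_llm is the hypothetical factory described further below):

from contextlib import contextmanager
import dspy

@contextmanager
def dry_run():
    original_lm = dspy.settings.lm  # Save the globally configured LM
    try:
        # Swap in a mocked LM for everything executed inside the block
        dspy.settings.configure(lm=mock_llm(getattr(original_lm, "provider", None)))
        yield
    finally:
        dspy.settings.configure(lm=original_lm)  # Restore the original LM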

Mocking Logic for Optimizers

Each optimizer will have its specific mocking logic to simulate its optimization process without actual LM calls or early stopping. This simulation will estimate the maximum number of tokens and calls expected during the optimization.

def _mock_bootstrap(self):
    pass
# Mocking methods for different optimizers...

These will need to be built specifically for each optimizer.
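
As one hypothetical example of what such a body could do, for BootstrapFewShot the mock might disable the early exit that normally stops once enough demos are bootstrapped, so the dry run reflects the worst case of attempting every training example in every round (attribute names below are placeholders):

def _mock_bootstrap(self):
    # Flag checked inside the bootstrapping loop so it no longer breaks early
    # once max_bootstrapped_demos traces have been collected.
    self._skip_early_stopping = True
    # Worst case: every training example is attempted in every round.
    self._expected_teacher_calls = len(self.trainset) * self.max_rounds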

Mocking Logic for LMs

Each LM will have a function that takes an input and returns some output without actually calling an LM.

def mock_llm(llm_provider):
    # Returns a mocked LM based on the provider
    if llm_provider == 'open_ai':
        return mock_open_ai_llm
    # Additional logic for different LLM providers...
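
A minimal sketch of such a mock for OpenAI-style LMs, assuming it only needs to mimic the call signature and tally tokens (all names here are hypothetical, and a real version could use tiktoken for more accurate counts):

class MockOpenAILM:
    """Stands in for an OpenAI LM: returns canned text and tallies tokens."""

    def __init__(self, max_output_tokens=150):
        self.max_output_tokens = max_output_tokens
        self.input_tokens = 0
        self.output_tokens = 0

    def __call__(self, prompt, **kwargs):
        # Crude whitespace-based estimate of the prompt's token count.
        self.input_tokens += len(prompt.split())
        # Assume the worst case: every call spends the full output budget.
        self.output_tokens += self.max_output_tokens
        return ["<mocked completion>"]

# The object returned by mock_llm('open_ai') above could simply be an instance:
mock_open_ai_llm = MockOpenAILM()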

Many details are omitted here, but it gives a high-level idea of how we could approach this.

smwitkowski avatar Feb 28 '24 14:02 smwitkowski