dspy.Predict should be a dspy.Module

Open thomasahle opened this issue 1 year ago • 9 comments

For most purposes dspy.Predict behaves the same way as a dspy.Module. But if you try to pass a Predict directly to an optimizer, you'll notice that it lacks a lot of (simple) methods that Module has.

When writing unit tests, I've often found myself writing unnecessary wrapper classes like

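import dspy

class SimpleModule(dspy.Module):
    """Wraps a single Predict so it can be passed wherever a Module is expected."""

    def __init__(self, signature):
        super().__init__()
        self.predictor = dspy.Predict(signature)

    def forward(self, **kwargs):
        # Delegate straight to the wrapped predictor.
        return self.predictor(**kwargs)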

thomasahle avatar Feb 11 '24 04:02 thomasahle

Yeah :/ Good point. Same for ChainOfThought.

okhat avatar Feb 11 '24 04:02 okhat

I wonder if we can actually just resolve this by making a shallow wrapper and renaming the current thing to CorePredict and CoreChainOfThought?

okhat avatar Feb 11 '24 04:02 okhat

This is less of an engineering thing and more of a programming language theory thing, but I've been thinking about what category Predict, ChainOfThought, etc. fall under. I think there may be a missing category in the metamodel, which I've been calling a 'strategy' in my head: a module that returns a module. Conceptually, this opens the door to strategy optimisation (optimising the module that returns the module separately from the final signature), but the main benefit for me is just allowing us to reason about higher-order functions (important with functorial things like lists). I can imagine strategies for mapping over lists, a tree-of-thought one, a graph-of-thought one, or even ones that add MemGPT/Self-RAG support to another strategy.
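
To make the "module that returns a module" idea concrete, here is a minimal sketch in plain Python. It does not use DSPy; `Module`, `Strategy`, `map_over_list`, and `shout` are hypothetical names, and a "module" is modelled as nothing more than a callable that returns a dict.

```python
from typing import Callable, List

# Assumption: a "module" is just a callable returning a dict of outputs.
Module = Callable[..., dict]
# A "strategy" takes a module and returns a new module.
Strategy = Callable[[Module], Module]


def map_over_list(inner: Module) -> Module:
    """Hypothetical strategy: lift a single-item module to work on lists."""

    def mapped(items: List[str]) -> dict:
        # Apply the inner module to each item and collect the results.
        return {"outputs": [inner(text=item) for item in items]}

    return mapped


def shout(text: str) -> dict:
    """Toy module standing in for a predictor."""
    return {"output": text.upper()}


shout_all = map_over_list(shout)
results = shout_all(["a", "b"])
```

Because the strategy is an ordinary value, it could in principle be swapped or tuned independently of the module it wraps.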

jgeldart avatar Feb 13 '24 09:02 jgeldart

As I understand DSPy, a "program" is a "module" composed of other modules, such as "Predictors" (ChainOfThought/Predict/ReAct...), "Retrievers" or other "Subprograms". But if we are going to categorize "Predictors" differently, I think I would call them prompting "Techniques".

neoxelox avatar Feb 13 '24 09:02 neoxelox

I think it's good to just try to follow PyTorch on this. There, an nn.Sequential is still an nn.Module even though it takes a list of modules.

Maybe the current Predict code could be moved to a function that the Predict module calls? A bit like your CorePredict idea @okhat
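
A rough sketch of what that split could look like, using stand-in classes rather than the real dspy ones. The `Module` base, `core_predict`, `fake_lm`, and the string-based signature are all hypothetical and only illustrate the shape of the refactor.

```python
from typing import Callable


class Module:
    """Stand-in for dspy.Module (hypothetical, not the real class)."""

    def __call__(self, **kwargs):
        return self.forward(**kwargs)

    def named_predictors(self):
        # Collect self (if it is a Predict) plus any Predict stored as a direct attribute.
        found = [("self", self)] if isinstance(self, Predict) else []
        for name, value in vars(self).items():
            if isinstance(value, Predict):
                found.append((name, value))
        return found


def core_predict(signature: str, lm: Callable[[str], str], **inputs) -> dict:
    """Plain function holding the prediction logic (hypothetical)."""
    prompt = signature + "\n" + "\n".join(f"{k}: {v}" for k, v in inputs.items())
    return {"answer": lm(prompt)}


class Predict(Module):
    """Thin Module wrapper: stores state and delegates the work to core_predict."""

    def __init__(self, signature: str, lm: Callable[[str], str]):
        self.signature = signature
        self.lm = lm

    def forward(self, **kwargs):
        return core_predict(self.signature, self.lm, **kwargs)


def fake_lm(prompt: str) -> str:
    # Stub standing in for a real language model call.
    return f"echo({len(prompt)} chars)"


predict = Predict("question -> answer", fake_lm)
result = predict(question="What is 2 + 2?")
```

With this shape, a bare `Predict` inherits everything an optimizer expects from a `Module`, so the `SimpleModule` wrapper from the first post would no longer be needed.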

thomasahle avatar Feb 13 '24 16:02 thomasahle

Regarding ChainOfThought, it seems like we could just replace it with

class ChainOfThought(Module):
    def __init__(self, signature, rationale_type=None, **config):
        super().__init__(**config)

        signature = ensure_signature(signature)
        *_keys, last_key = signature.output_fields.keys()

        rationale_type = rationale_type or dspy.OutputField(
            prefix="Reasoning: Let's think step by step in order to",
            desc="${produce the " + last_key + "}. We ...",
        )

        self.extended_signature = signature.prepend("rationale", rationale_type, type_=str)
        self.predict = dspy.Predict(self.extended_signature)
    
    def forward(self, **kwargs):
        return self.predict(**kwargs)

This still passes all my tests, except those for the (Bayesian) signature optimizer, which has some hacks involving extended_signature.

thomasahle avatar Feb 13 '24 17:02 thomasahle

Or I guess a CorePredictor would be nice, as you say, since it serves as a place to "store signatures", so they can be changed, while keeping the Signature class itself immutable. E.g. in the Signature optimizer:

  # Go through our module's predictors
  for p_i, (p_old, p_new) in enumerate(zip(module.predictors(), module_clone.predictors())):
      candidates_ = latest_candidates[id(p_old)] # Use the most recently generated candidates for evaluation 
      if len(module.predictors()) > 1:
          candidates_ = all_candidates[id(p_old)] # Unless our program has multiple predictors, in which case we need to reevaluate all prompts with the new prompt(s) for the other predictor(s)   

      # For each candidate
      for c_i, c in enumerate(candidates_):                    
          # Get the candidate instruction and prefix 
          instruction, prefix = c.proposed_instruction.strip('"').strip(), c.proposed_prefix_for_output_field.strip('"').strip()

          # Set this new module with our instruction / prefix 
          if (hasattr(p_new, 'extended_signature')):
              *_, last_key = p_new.extended_signature.fields.keys()
              p_new.extended_signature = p_new.extended_signature \
                  .with_instructions(instruction) \
                  .with_updated_fields(last_key, prefix=prefix)
          else:
              *_, last_key = p_new.extended_signature1.fields.keys()
              p_new.extended_signature1 = p_new.extended_signature1 \
                  .with_instructions(instruction) \
                  .with_updated_fields(last_key, prefix=prefix)
              *_, last_key = p_new.extended_signature2.fields.keys()
              p_new.extended_signature2 = p_new.extended_signature2 \
                  .with_instructions(instruction) \
                  .with_updated_fields(last_key, prefix=prefix)

If we refactor this, we should make sure to find a way to avoid the two separate cases: extended_signature on the one hand, and extended_signature1 plus extended_signature2 on the other.
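
One possible way to collapse the two cases, sketched with stand-in classes: every predictor exposes its signatures through a single dict, so the optimizer loop never has to check attribute names. `Signature`, `Predictor`, `with_prefix`, and `apply_candidate` here are hypothetical and much simpler than the real DSPy classes.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class Signature:
    """Immutable stand-in for a signature (hypothetical)."""

    instructions: str
    prefix: str

    def with_instructions(self, instructions: str) -> "Signature":
        # Return a modified copy; the original stays untouched.
        return replace(self, instructions=instructions)

    def with_prefix(self, prefix: str) -> "Signature":
        return replace(self, prefix=prefix)


class Predictor:
    """Holds one or more named signatures behind a uniform interface."""

    def __init__(self, **signatures: Signature):
        self.signatures: Dict[str, Signature] = dict(signatures)


def apply_candidate(predictor: Predictor, instruction: str, prefix: str) -> None:
    # The same loop covers predictors with one signature or with several.
    for name, sig in list(predictor.signatures.items()):
        predictor.signatures[name] = sig.with_instructions(instruction).with_prefix(prefix)


base = Signature(instructions="Answer the question.", prefix="Answer:")
single = Predictor(main=base)
double = Predictor(first=base, second=base)

for p in (single, double):
    apply_candidate(p, "Answer briefly.", "Short answer:")
```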

thomasahle avatar Feb 13 '24 18:02 thomasahle

@thomasahle I think the CorePredict will have self.instructions and self.demos, instead of any kind of changes to self.signature. Once a module is created (including CorePredict) the signature will never be changed --- that's my current thinking at least, I hope it's possible to realize in practice.
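
A minimal sketch of that shape, assuming a string signature and a toy prompt format. This `CorePredict`, its `build_prompt` method, and the `default_instructions` parameter are hypothetical and not the actual implementation.

```python
from typing import Dict, List, Optional


class CorePredict:
    """Hypothetical sketch: fixed signature, mutable instructions and demos."""

    def __init__(self, signature: str, default_instructions: str):
        self._signature = signature
        self._default_instructions = default_instructions
        self.instructions: Optional[str] = None  # set by an optimizer, if at all
        self.demos: List[Dict[str, str]] = []

    @property
    def signature(self) -> str:
        # Read-only: there is no setter, so assignment raises AttributeError.
        return self._signature

    def build_prompt(self, **inputs: str) -> str:
        # Optimizer-provided instructions take priority over the defaults.
        lines = [self.instructions or self._default_instructions]
        for demo in self.demos:
            lines.append("; ".join(f"{k}: {v}" for k, v in demo.items()))
        lines.append("; ".join(f"{k}: {v}" for k, v in inputs.items()))
        return "\n".join(lines)


core = CorePredict("question -> answer", "Answer the question.")
core.instructions = "Answer concisely."
core.demos.append({"question": "1 + 1?", "answer": "2"})
prompt = core.build_prompt(question="2 + 2?")
```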

okhat avatar Feb 17 '24 04:02 okhat

Doesn't the signature optimizer also change the field descriptions and prefixes, though?

thomasahle avatar Feb 20 '24 00:02 thomasahle