Sensitivity of Signature attribute naming

Open ahoho opened this issue 1 year ago • 8 comments

I have been getting an odd bug. With the below Signature, I get an error (specifically, TemplateV2.query throws an AttributeError because it is trying to split a list?). But if I change examples to context, then it seems to work fine. Are certain attribute names protected?

import dspy

gpt35 = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=gpt35)

class GenerateSummary(dspy.Signature):
    """Generate a summary of the provided examples"""
    examples = dspy.InputField(desc="A balanced set of examples")
    summary = dspy.OutputField(desc="A straightforward summary")
    
generate_summary = dspy.ChainOfThought(GenerateSummary)
generate_summary(examples=[
     "Humpty Dumpty is a character in an English nursery rhyme, probably originally a riddle and one of the best known in the English-speaking world.",
     "Ring Around the Rosie, is a nursery rhyme. Descriptions first emerge in the mid-19th century, but are reported as dating from decades before, and similar rhymes are known from across Europe"
])

ahoho avatar Sep 23 '23 04:09 ahoho

Ah that’s a known issue that we’ll be fixing. For now, pass a format keyword argument to dspy.InputField that takes a list and returns a formatted string.

import dsp; dsp.passages2text

is one function that achieves that.
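As a rough illustration of what "takes a list and returns a formatted string" could mean, here is a minimal sketch. The name format_passages and its numbering scheme are made up for this example; it is not the implementation of dsp.passages2text, only a stand-in with the same shape (list in, string out).

```python
def format_passages(passages):
    """Hypothetical formatter: join a list of strings into one numbered block.

    Plain strings are passed through unchanged so the field still accepts
    a single passage.
    """
    if isinstance(passages, str):
        return passages
    return "\n".join(
        f"[{idx}] {text}" for idx, text in enumerate(passages, start=1)
    )


# Assumed usage, following the suggestion above (not executed here):
#   examples = dspy.InputField(desc="A balanced set of examples",
#                              format=format_passages)
print(format_passages(["Humpty Dumpty is a nursery rhyme.",
                       "Ring Around the Rosie is a nursery rhyme."]))
```

The point is only that the field ends up holding a single string by the time the template is built, which is what avoids splitting a list.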

okhat avatar Sep 23 '23 04:09 okhat

Wow, what a quick response! Thanks!

My actual use case was slightly more complicated: I wanted to pass [ex.with_inputs('text', 'label') for ex in dataset]. Should that still work? I just tried it with my above workaround (using context instead), and I think they were ignored.

ahoho avatar Sep 23 '23 04:09 ahoho

There are no fields in your code called text and label. How are these supposed to be used?

okhat avatar Sep 23 '23 04:09 okhat

Oh sorry, I wasn't clear, my real data is a list of Examples which have those fields. I guess the question is whether I should define a format function specific to my Examples?
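For concreteness, a format function specific to such Examples might look roughly like the sketch below. It assumes each item exposes text and label attributes; types.SimpleNamespace objects stand in for dspy.Example here so the snippet runs without DSPy, and the name format_labeled_examples is hypothetical.

```python
from types import SimpleNamespace


def format_labeled_examples(examples):
    """Hypothetical formatter: render objects with .text and .label as one string."""
    lines = []
    for idx, ex in enumerate(examples, start=1):
        lines.append(f"[{idx}] text: {ex.text} | label: {ex.label}")
    return "\n".join(lines)


# Stand-in records; real code would presumably pass dspy.Example objects,
# which are assumed to allow the same attribute access.
sample = [
    SimpleNamespace(text="my long string #0", label="shorter string #0"),
    SimpleNamespace(text="my long string #1", label="shorter string #1"),
]
formatted = format_labeled_examples(sample)
print(formatted)
```

Whether this is preferable to flattening the Examples into strings before the call is a design choice; the formatter keeps the call site short, while flattening keeps the Signature generic.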

ahoho avatar Sep 23 '23 04:09 ahoho

You may want to define a dspy.Module class whose forward method takes any argument names you like, and then passes examples into the chain-of-thought call.

Let me know if I should share an example of that

okhat avatar Sep 23 '23 04:09 okhat

Yeah, I suspect I'm doing something wrong, since I don't need to pass anything to forward (awaiting your paper so I can understand the abstractions better!). Right now my Module looks something like the following (in practice there's more going on, but I think this gets the main idea across)

import random
import dspy

# Assumes GenerateSummary from the first post is already defined.
class Summarizer(dspy.Module):
    def __init__(
        self,
        trainset: list[dspy.Example],
        num_iters: int = 4,
        items_per_sample: int = 10,
    ):
        super().__init__()
        self.trainset = trainset
        self.num_iters = num_iters
        self.items_per_sample = items_per_sample

        self.generate_summary = dspy.ChainOfThought(GenerateSummary)
        
    def forward(self):
        outputs = []
        for _ in range(self.num_iters):
            train_ex_sample = random.sample(self.trainset, k=self.items_per_sample)
            result = self.generate_summary(examples=train_ex_sample) # throws the error here
            outputs.append(result)
        return outputs

In the long run, I could see using simple retrieval to target diversity in the sampling.

ahoho avatar Sep 23 '23 05:09 ahoho

I'm not 100% clear on the reason for passing the training set into forward. I don't think you're trying to train the program (if you are, I'd use one of the teleprompters instead).

But this works fwiw:

import dspy
import random
from dsp import passages2text

trainset = [dspy.Example(text=f"my long string #{idx}", label=f"shorter string #{idx}") for idx in range(3)]
trainset = [x.with_inputs('text') for x in trainset]

devset = [dspy.Example(text=f"my long string #{idx}", label=f"shorter string #{idx}") for idx in range(3)]
devset = [x.with_inputs('text') for x in devset]

class GenerateSummary(dspy.Signature):
    """Generate a summary of the provided examples"""
    examples = dspy.InputField(desc="A balanced set of examples", format=passages2text)
    summary = dspy.OutputField(desc="A straightforward summary")

class Summarizer(dspy.Module):
    def __init__(
        self,
        trainset,
        num_iters: int = 2,
        items_per_sample: int = 2,
    ):
        super().__init__()
        self.trainset = trainset
        self.devset = devset  # note: reads the module-level devset defined above; unused below
        self.num_iters = num_iters
        self.items_per_sample = items_per_sample

        # submodules
        self.generate_summary = dspy.ChainOfThought(GenerateSummary)
        
    def forward(self):
        outputs = []
        for _ in range(self.num_iters):
            train_ex_sample = random.sample(self.trainset, k=self.items_per_sample)
            result = self.generate_summary(examples=[x.text for x in train_ex_sample])
            outputs.append(result)
        return outputs

summarizer = Summarizer(trainset=trainset)
summarizer()

okhat avatar Sep 23 '23 05:09 okhat

Excellent, thank you so much! Yeah, I'm not training here. Passing the data directly to forward would probably be fine too.

ahoho avatar Sep 23 '23 05:09 ahoho