Dataset destroys Example.input_keys values
Minimal example (on dspy v2.4.0):
import dspy
examples = [dspy.Example(foo=f, bar=b).with_inputs("foo") for f, b in zip("abcd", "1234")]
print(examples) # [Example({'foo': 'a', 'bar': '1'}) (input_keys={'foo'}), Example({'foo': 'b', 'bar': '2'}) (input_keys={'foo'}), Example({'foo': 'c', 'bar': '3'}) (input_keys={'foo'}), Example({'foo': 'd', 'bar': '4'}) (input_keys={'foo'})]
from dspy.datasets.dataset import Dataset
class MyDataset(Dataset):
    def __init__(self, examples):
        super().__init__(train_size=1, dev_size=1, test_size=1)
        self._train = [examples[0]]
        self._dev = [examples[1]]
        self._test = [examples[2]]
dataset = MyDataset(examples)
print(dataset.train) # [Example({'foo': 'a', 'bar': '1'}) (input_keys=None)]
print(dataset.dev) # [Example({'foo': 'b', 'bar': '2'}) (input_keys=None)]
print(dataset.test) # [Example({'foo': 'c', 'bar': '3'}) (input_keys=None)]
I expected the input_keys to persist through the Dataset object. This line seems to be the problem.
Hi @jsleight, thanks for raising this. Currently, the intended usage is to declare your Dataset first and then set the inputs, as in this example from intro.ipynb:
from dspy.datasets import HotPotQA
# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)
# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]
len(trainset), len(devset)
but it does make sense to me to have input_keys() persist if they exist. Feel free to push a PR for this change!
I might have some time to make a PR. I can envision a couple of approaches, so I'm interested to see which you'd prefer.
- Just change the line in Dataset that creates copies of the examples to also do with_inputs.
- A bit more fundamental change to Example to have Example(**example) persist the input_keys. This would make the Dataset class persist the input_keys while adding a bit more functionality to the Example class. But I don't know whether you'd like Example to work this way or not.
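To make the first approach concrete, here is a rough sketch using a tiny stand-in class rather than dspy itself. `MiniExample`, `copy_dropping_keys`, and `copy_keeping_keys` are hypothetical names invented for illustration; the only assumption taken from the thread is that the copy is made by unpacking the example's fields into a new object, which is why the input keys get lost.

```python
class MiniExample:
    """Hypothetical stand-in for dspy.Example; not the real class."""

    def __init__(self, **fields):
        self._store = dict(fields)
        self._input_keys = None

    # keys() and __getitem__ are what allow `MiniExample(**example)`.
    def keys(self):
        return self._store.keys()

    def __getitem__(self, key):
        return self._store[key]

    def with_inputs(self, *keys):
        # Return a copy with the input keys recorded.
        copied = MiniExample(**self._store)
        copied._input_keys = set(keys)
        return copied


def copy_dropping_keys(examples):
    # Mirrors the reported behavior: only the fields are copied.
    return [MiniExample(**ex) for ex in examples]


def copy_keeping_keys(examples):
    # Approach 1: reapply with_inputs after copying, if keys were set.
    copies = []
    for ex in examples:
        new = MiniExample(**ex)
        if ex._input_keys is not None:
            new = new.with_inputs(*ex._input_keys)
        copies.append(new)
    return copies


examples = [MiniExample(foo=f, bar=b).with_inputs("foo") for f, b in zip("abcd", "1234")]
dropped = copy_dropping_keys(examples)
kept = copy_keeping_keys(examples)
print(dropped[0]._input_keys)  # None
print(kept[0]._input_keys)     # {'foo'}
```

The second approach would instead move the key-preserving logic into the constructor, so every caller that copies via unpacking benefits, at the cost of changing how the class behaves everywhere.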