dspy
Synthetic Data Generation v2
Usage:
[1] Signature-based generation:
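import dspy
from dspy.experimental import Synthesizer  # assumed import path; the original snippet uses Synthesizer without importing it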
from dspy.datasets import DataLoader
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""
    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")
synthesizer = Synthesizer()
syn_data = synthesizer.generate(GenerateAnswer, num_data=10)
synthesizer.export(data=syn_data, path="syn_data.csv")
[2] Batched generation (faster):
from dspy.datasets import DataLoader
dl = DataLoader()
data = dl.from_huggingface(
    "gsm8k", "main",
    fields=("question", "answer"),
    input_keys=("question",),
    split="train[:8]"
)
synthesizer = Synthesizer()
syn_data = synthesizer.generate(data, num_data=10, batch_size=10)
synthesizer.export(data=syn_data, path="syn_data.json")
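To illustrate how num_data and batch_size could relate in the call above, here is a minimal sketch. It is an assumption about what batching means here, not the Synthesizer implementation; batch_sizes is a hypothetical helper that does not exist in dspy.

```python
from typing import List


def batch_sizes(num_data: int, batch_size: int) -> List[int]:
    """Hypothetical helper: split num_data examples into batch sizes.

    Each batch holds at most batch_size examples; the final batch
    carries whatever remainder is left over.
    """
    if num_data < 0 or batch_size <= 0:
        raise ValueError("num_data must be >= 0 and batch_size must be > 0")
    full_batches, remainder = divmod(num_data, batch_size)
    sizes = [batch_size] * full_batches
    if remainder:
        sizes.append(remainder)
    return sizes


# With the values from the example, everything fits in one batch.
single = batch_sizes(10, 10)
# A smaller batch size needs several rounds.
several = batch_sizes(10, 4)
print(single, several)
```

Under this reading, num_data=10 with batch_size=10 would need a single round of generation, which is presumably where the speed-up comes from.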
Thanks a lot for the suggestions and fixes!!
@krypticmouse would it be possible to add this to the Vercel documentation? Also what does refer to?
[4] Tweakable LM for input and output generation, and support for module-based output generation
[5] Example based input generation optimization
Fix potential import bug in dspy.experimental module
The code may contain a bug in the import statement within /dspy/experimental/__init__.py. The original import statement is: from module_graph import *
This has been changed to: from .module_graph import *
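The difference between the two import forms can be reproduced outside dspy. The sketch below builds two throwaway packages whose __init__.py contains each statement. The package names and the ModuleGraph class are placeholders chosen for the demo, not taken from the dspy source.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_import(import_line: str, pkg_name: str) -> str:
    """Create a temporary package whose __init__.py is import_line, then import it."""
    root = Path(tempfile.mkdtemp())
    pkg = root / pkg_name
    pkg.mkdir()
    # Sibling module inside the package (placeholder content for the demo).
    (pkg / "module_graph.py").write_text("class ModuleGraph:\n    pass\n")
    (pkg / "__init__.py").write_text(import_line + "\n")
    # Only the parent directory is on sys.path, as with an installed package.
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(pkg_name)
        return "ok" if hasattr(module, "ModuleGraph") else "missing"
    except ModuleNotFoundError:
        return "ModuleNotFoundError"
    finally:
        sys.path.remove(str(root))


# Absolute form: Python looks for a top-level module named module_graph.
absolute_result = try_import("from module_graph import *", "demo_pkg_absolute")
# Relative form: Python resolves module_graph inside the package itself.
relative_result = try_import("from .module_graph import *", "demo_pkg_relative")
print(absolute_result, relative_result)
```

In Python 3 the absolute form only works if a top-level module_graph happens to be on sys.path, which is why the leading dot matters inside a package.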