
Synthetic Data Generation v2

Open krypticmouse opened this issue 11 months ago • 1 comments

Usage:

[1] Signature-based generation:

import dspy
# Assumed import path: the original snippet uses Synthesizer without importing it.
from dspy.experimental import Synthesizer

class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

synthesizer = Synthesizer()

# Generate 10 synthetic examples shaped like the signature's fields.
syn_data = synthesizer.generate(GenerateAnswer, num_data=10)

synthesizer.export(data=syn_data, path="syn_data.csv")

[2] Batched generation (faster):

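from dspy.datasets import DataLoader
# Assumed import path: the original snippet uses Synthesizer without importing it.
from dspy.experimental import Synthesizer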

dl = DataLoader()

data = dl.from_huggingface(
    "gsm8k", "main",
    fields=("question", "answer"),
    input_keys=("question",),
    split="train[:8]"
)

synthesizer = Synthesizer()

syn_data = synthesizer.generate(data, num_data=10, batch_size=10)

synthesizer.export(data=syn_data, path="syn_data.json")
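For readers wondering how `num_data` and `batch_size` relate, here is a minimal sketch of one plausible way to split a request into batches. It is not taken from the Synthesizer source; the helper name `batch_sizes` is hypothetical and only illustrates the arithmetic.

```python
def batch_sizes(num_data: int, batch_size: int) -> list[int]:
    """Split a request for num_data examples into per-call batch sizes.

    Hypothetical helper for illustration; not part of dspy.
    """
    if num_data < 0 or batch_size <= 0:
        raise ValueError("num_data must be >= 0 and batch_size must be > 0")
    full, rest = divmod(num_data, batch_size)
    # Full batches first, then one smaller batch for any remainder.
    return [batch_size] * full + ([rest] if rest else [])


print(batch_sizes(10, 10))
print(batch_sizes(10, 4))
```

Under this assumption, `num_data=10, batch_size=10` as in the snippet above would need a single generation call, while a smaller `batch_size` would trade fewer examples per call for more calls.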

krypticmouse • Mar 04 '24 14:03

Thanks a lot for the suggestions and fixes!!

krypticmouse • Mar 05 '24 18:03

@krypticmouse would it be possible to add this to the Vercel documentation? Also, what do these refer to?

[4] Tweakable LM for input and output gen and support for module based output generation

[5] Example based input generation optimization

chiragshah285 • Mar 30 '24 14:03

Fix potential import bug in dspy.experimental module

The code may contain a bug in the import statement within /dspy/experimental/__init__.py. The original import statement is: from module_graph import *

This has been changed to: from .module_graph import *
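The difference between the two import forms can be reproduced outside dspy. The sketch below builds a throwaway package (the name `demo_pkg` and the helper `try_import` are made up for illustration) whose `__init__.py` contains one of the two lines, then tries to import it. It assumes no top-level module called `module_graph` is otherwise importable in the environment.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_import(init_line: str) -> str:
    """Create a temporary package whose __init__.py is init_line, then import it."""
    with tempfile.TemporaryDirectory() as root:
        pkg = Path(root) / "demo_pkg"
        pkg.mkdir()
        # Sibling module living inside the package, like module_graph in dspy/experimental.
        (pkg / "module_graph.py").write_text("VALUE = 1\n")
        (pkg / "__init__.py").write_text(init_line + "\n")
        sys.path.insert(0, root)
        importlib.invalidate_caches()
        try:
            importlib.import_module("demo_pkg")
            return "ok"
        except ModuleNotFoundError as exc:
            return f"failed: {exc}"
        finally:
            # Clean up so repeated calls start from a fresh state.
            sys.path.remove(root)
            sys.modules.pop("demo_pkg", None)
            sys.modules.pop("demo_pkg.module_graph", None)


# Relative import: resolved against the package itself.
print(try_import("from .module_graph import *"))
# Absolute import: looked up on sys.path, where the sibling module is not visible.
print(try_import("from module_graph import *"))
```

The absolute form only works when the package directory itself happens to be on `sys.path`, which is why it can appear fine when run from inside the directory but break once the package is installed.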

beltrewilton • Jul 12 '24 03:07