
How to change the output language of Synthesizer

Open VitalyKrenel opened this issue 5 months ago • 2 comments

Hey folks!

Question: How do I get Synthesizer to generate non-English output?

Context: I was playing around with the experimental DSPy Synthesizer module. I have recently been building a product in Armenia, where Russian (Cyrillic) is used quite a lot, so I'm trying to figure out how to get Synthesizer to return non-English output.

Problem description:

  • I can't figure out a way to get non-English output in the final syn_data

I get results written in Russian during the feedback loop, and I provide feedback on them in Russian, but I still get English synthetic data output most of the time. Across 5-6 runs, I only once managed to get non-English output, and even then half of it was still in English.

What I tried so far: I provided a ground_source dataset with many material nomenclatures written in Russian:

import io

import pandas as pd

# original dataset looks like this:
# header=None keeps the first row as data instead of using it as the column name
materials = pd.read_csv(io.StringIO('''
"Труба стальная ВГП 15х2,5 ст3 электросварная ГОСТ 8732-78"
"Обогреватель конвекторный электрический Ballu Solo BES/SM-1500 1,5кВт"
Конвектор электрический Ballu Transformer BEC/EVU-1500 1500Вт
"Обогреватель конвекторный электрический Ballu Solo BES/SM-1500 1,5кВт"
"Сетка штукатурная сварная оцинкованная 25х25мм d=0,8мм, рулон 1х25м"
...
'''), header=None)

import dspy

# building examples dataset:
dataset = []

dataframe = materials

# TODO: Add material_description later
for material_nomenclature in dataframe.values.flatten():
    dataset.append(
        dspy.Example(
            material_description=None,
            material_nomenclature=material_nomenclature,
        ).with_inputs('material_description')
    )

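Then I use Synthesizer like this:

# import path assumed here; it may differ between DSPy versions
from dspy.experimental import Synthesizer, SynthesizerArguments

config = SynthesizerArguments(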
    feedback_mode="human",
    num_example_for_feedback=3,
    num_example_for_optim=10
)
synthesizer = Synthesizer(config=config)

syn_data = synthesizer.generate(
    ground_source=dataset[:50],
    num_data=10
)
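
To put a number on "most of the output is still in English", a rough check like the one below could be run over the generated strings. This is only a sketch: `cyrillic_ratio` and `is_mostly_cyrillic` are hypothetical helper names (not part of DSPy), the 0.5 threshold is arbitrary, and the sample strings stand in for whatever text fields syn_data actually contains.

```python
import unicodedata


def cyrillic_ratio(text: str) -> float:
    """Return the share of alphabetic characters in `text` that are Cyrillic."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    # Unicode character names for Cyrillic letters start with "CYRILLIC".
    cyrillic = sum(1 for ch in letters if "CYRILLIC" in unicodedata.name(ch, ""))
    return cyrillic / len(letters)


def is_mostly_cyrillic(text: str, threshold: float = 0.5) -> bool:
    """Hypothetical helper: True if at least `threshold` of the letters are Cyrillic."""
    return cyrillic_ratio(text) >= threshold


# Stand-in strings; in practice these would be the generated fields from syn_data.
samples = [
    "Труба стальная ВГП",
    "Electric convector heater 1500W",
]
flags = [is_mostly_cyrillic(s) for s in samples]
print(flags)
```

Counting the `True` flags per run gives a simple way to compare runs or prompt changes, although brand names such as "Ballu Solo" are Latin and will lower the ratio for otherwise Russian strings.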

Here's my full colab: https://colab.research.google.com/drive/1xbvpRKLMcnI91heO2S1EkZ-v0BR1byt5

I couldn't find any API reference for the Synthesizer class in the docs.

VitalyKrenel, Sep 22 '24 18:09