
Suggestion: More informed approach to multi-linguality in Unitxt

Open · elronbandel opened this issue 1 year ago

Currently, a given template in Unitxt has several versions, one for each language.

For example, the English sentiment template:

```python
template = InputOutputTemplate(input_format="Classify the sentiment of this text: {text}")
```

The Dutch sentiment template:

```python
template = InputOutputTemplate(input_format="Classificeer het sentiment van deze tekst: {text}")
```

The issue is that we end up with many templates for different languages that logically say the same thing. Moreover, each language also needs its own formats, and we rely on our users to switch every part of the recipe to the artifact in the correct language. I want to suggest a simple solution: pass the recipe an argument such as language=dutch, and have the template, format, and other components adjust automatically.

My suggestion is to create a new class, MultiString, that holds a version of the string for each language:

```python
# MultiString is the proposed class; it does not exist in Unitxt yet.
input_format = MultiString(
    english="Classify the sentiment of this text: {text}",
    dutch="Classificeer het sentiment van deze tekst: {text}",
)
template = InputOutputTemplate(input_format=input_format)
```
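A minimal sketch of how MultiString itself could work, assuming it simply stores one string per language keyword and exposes a hypothetical `get(language, fallback)` lookup. Neither the class nor the method exists in Unitxt today; the names are illustrative only.

```python
from typing import Dict, Optional


class MultiString:
    """Hypothetical sketch: one logical string with a version per language."""

    def __init__(self, **versions: str) -> None:
        if not versions:
            raise ValueError("MultiString needs at least one language version")
        # Keyword names are the language keys, e.g. english=..., dutch=...
        self.versions: Dict[str, str] = dict(versions)

    def get(self, language: str, fallback: Optional[str] = None) -> str:
        """Return the version for `language`, else the `fallback` version."""
        if language in self.versions:
            return self.versions[language]
        if fallback is not None and fallback in self.versions:
            return self.versions[fallback]
        raise KeyError(f"No version for {language!r} (fallback: {fallback!r})")


input_format = MultiString(
    english="Classify the sentiment of this text: {text}",
    dutch="Classificeer het sentiment van deze tekst: {text}",
)

print(input_format.get("dutch").format(text="Wat een mooie dag!"))
# No French version exists, so the English fallback is used.
print(input_format.get("french", fallback="english").format(text="Great day!"))
```

Raising KeyError when neither the requested language nor the fallback exists makes a missing translation fail loudly instead of silently producing a prompt in the wrong language.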

Finally, the language would be selected with a context manager:

```python
with set_language("dutch", when_not_exist="english"):
    # code placed here is affected by the selected language
    ...
```

Everything inside the context manager would then use the requested language defined in each MultiString, falling back to the when_not_exist language when that version is missing.
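One way the context manager could be built is with Python's standard `contextvars` and `contextlib.contextmanager`. The sketch below is an assumption, not the proposed implementation: `set_language`, the module-level `_LANGUAGE` variable, the English default, and `MultiString.resolve()` are all hypothetical names chosen for illustration.

```python
import contextvars
from contextlib import contextmanager
from typing import Dict, Iterator, Optional

# Hypothetical: (language, fallback) currently in effect; defaults to English.
_LANGUAGE = contextvars.ContextVar("prompting_language", default=("english", None))


@contextmanager
def set_language(language: str, when_not_exist: Optional[str] = None) -> Iterator[None]:
    """Hypothetical: choose the language MultiStrings resolve to inside the block."""
    token = _LANGUAGE.set((language, when_not_exist))
    try:
        yield
    finally:
        # Restore the previous selection so nested blocks unwind correctly.
        _LANGUAGE.reset(token)


class MultiString:
    """Hypothetical: resolves lazily to the language selected by set_language."""

    def __init__(self, **versions: str) -> None:
        self.versions: Dict[str, str] = dict(versions)

    def resolve(self) -> str:
        language, fallback = _LANGUAGE.get()
        if language in self.versions:
            return self.versions[language]
        if fallback is not None and fallback in self.versions:
            return self.versions[fallback]
        raise KeyError(f"No version for {language!r} (fallback: {fallback!r})")

    def __str__(self) -> str:
        return self.resolve()


input_format = MultiString(
    english="Classify the sentiment of this text: {text}",
    dutch="Classificeer het sentiment van deze tekst: {text}",
)

outputs = []
with set_language("dutch", when_not_exist="english"):
    outputs.append(str(input_format))
    with set_language("french", when_not_exist="english"):
        outputs.append(str(input_format))  # no French version: English fallback
    outputs.append(str(input_format))      # back to Dutch after the inner block
outputs.append(str(input_format))          # outside any block: default (English)
print(outputs)
```

A `ContextVar` is used instead of a plain global so that the selected language is scoped to the current thread or asyncio task, and `reset(token)` lets nested `set_language` blocks restore the outer choice.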

This would allow us to add a general parameter to the Unitxt recipe, such as prompting_language=english.
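A recipe-level prompting_language parameter could also resolve every MultiString field once, up front, so that downstream components only ever see plain strings. The sketch below assumes a hypothetical `localize` helper and a toy `ToyTemplate` dataclass standing in for a real template; it is not Unitxt's `InputOutputTemplate` or its recipe API.

```python
from dataclasses import dataclass, fields, replace
from typing import Dict, Optional, Union


class MultiString:
    """Hypothetical: one logical string with a version per language."""

    def __init__(self, **versions: str) -> None:
        self.versions: Dict[str, str] = dict(versions)

    def get(self, language: str, fallback: Optional[str] = None) -> str:
        if language in self.versions:
            return self.versions[language]
        if fallback is not None and fallback in self.versions:
            return self.versions[fallback]
        raise KeyError(f"No version for {language!r} (fallback: {fallback!r})")


@dataclass(frozen=True)
class ToyTemplate:
    """Hypothetical stand-in for a template whose fields may be multilingual."""
    input_format: Union[str, MultiString]
    output_format: Union[str, MultiString] = "{label}"


def localize(template: ToyTemplate, prompting_language: str = "english",
             when_not_exist: Optional[str] = "english") -> ToyTemplate:
    """Hypothetical: return a copy with every MultiString field resolved."""
    resolved = {}
    for f in fields(template):
        value = getattr(template, f.name)
        if isinstance(value, MultiString):
            resolved[f.name] = value.get(prompting_language, when_not_exist)
    # Plain-string fields are left untouched by replace().
    return replace(template, **resolved)


template = ToyTemplate(
    input_format=MultiString(
        english="Classify the sentiment of this text: {text}",
        dutch="Classificeer het sentiment van deze tekst: {text}",
    )
)

dutch_template = localize(template, prompting_language="dutch")
print(dutch_template.input_format.format(text="Wat een mooie dag!"))
print(dutch_template.output_format)
```

Resolving eagerly like this trades the flexibility of the context manager (lazy resolution wherever a MultiString is used) for simpler downstream code, since nothing after the recipe needs to know about languages.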

elronbandel, Apr 10 '24 09:04