How to evaluate RAGs in other languages
[ ] I checked the documentation and related resources and couldn't find an answer to my question.
Your Question
How to evaluate RAGs in other languages.
- The dataset has been generated in English
- It can be evaluated in English
- What if I need a different language, like in the code example below?

For example, suppose I now need French support: the inputs and outputs keep the same meaning, just in a different language. How can I do that?
Code Examples

```python
from ragas import evaluate, EvaluationDataset
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import (
    LLMContextRecall,
    Faithfulness,
    FactualCorrectness,
    AnswerCorrectness,
)

# Wrap the Azure OpenAI chat model so ragas can use it as the evaluator LLM.
evaluator_llm = LangchainLLMWrapper(azure)

dataset = []
dataset.append(
    {
        "user_input": "how many devices added yesterday?",
        "retrieved_contexts": [
            "Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity."
        ],
        "response": "This is a topic I can't help with.\n \n You can ask questions like",
        "reference": "Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.",
    }
)

evaluation_dataset = EvaluationDataset.from_list(dataset)

result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), AnswerCorrectness()],
    llm=evaluator_llm,
    embeddings=silcom_embedding,
)
print(result)
```
I’m not sure exactly what you’re trying to do, but if you want to evaluate an English dataset and obtain results in other languages, you may check out this page.
However, if you’re looking to generate a dataset in a non-English language, you can refer to this page.
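To make the first option concrete: when the samples themselves are in another language, the usual pattern is to adapt each metric's internal prompts to that language before calling `evaluate`. Below is a minimal sketch that reuses the `dataset` and `evaluator_llm` from the question above; the `adapt_prompts` / `set_prompts` calls are the relevant part, and their exact behaviour may differ slightly between ragas versions.

```python
import asyncio

from ragas import evaluate, EvaluationDataset
from ragas.metrics import Faithfulness, LLMContextRecall


async def adapt_metrics_to(language, metrics, llm):
    # Each LLM-based ragas metric exposes adapt_prompts/set_prompts,
    # which translate its internal prompts into the target language.
    for metric in metrics:
        adapted = await metric.adapt_prompts(language=language, llm=llm)
        metric.set_prompts(**adapted)
    return metrics


# In a notebook, `await adapt_metrics_to(...)` directly instead of asyncio.run().
metrics = asyncio.run(adapt_metrics_to("french", [Faithfulness(), LLMContextRecall()], evaluator_llm))

evaluation_dataset = EvaluationDataset.from_list(dataset)  # French samples
result = evaluate(dataset=evaluation_dataset, metrics=metrics, llm=evaluator_llm)
print(result)
```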
Hi @landhu,
Were you able to evaluate in French using the approach mentioned? Thank you, @Jongwoo328, for your help on this!
Hello!
I was struggling with the same issue while trying to adapt the Llama-Index example, i.e. the end-to-end flow built on the generate and evaluate functions that creates the dataset and evaluates it in a few lines of code.
Unfortunately, these aggregated functions do not support language conversion, so we need to pass them the transforms and query_distribution arguments explicitly, as you suggest in the Non-English Testset Generation example.
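For anyone landing here, this is roughly what passing those arguments explicitly can look like, loosely following the Non-English Testset Generation example. It is only a sketch: `generator_llm`, `generator_embeddings`, and `documents` are placeholders, import paths and parameter names may differ across ragas versions, and there is an analogous `generate_with_llamaindex_docs` for Llama-Index documents.

```python
from ragas.testset import TestsetGenerator
from ragas.testset.transforms.extractors.llm_based import NERExtractor
from ragas.testset.transforms.splitters import HeadlineSplitter
from ragas.testset.synthesizers.single_hop.specific import SingleHopSpecificQuerySynthesizer

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)

# Build the transforms and query distribution explicitly, so their prompts
# can be adapted to French before generation (see the loop further down).
transforms = [HeadlineSplitter(), NERExtractor()]
query_distribution = [(SingleHopSpecificQuerySynthesizer(llm=generator_llm), 1.0)]

testset = generator.generate_with_langchain_docs(
    documents,  # LangChain documents loaded elsewhere
    testset_size=10,
    transforms=transforms,
    query_distribution=query_distribution,
)
```

The synthesizers in `query_distribution` expose the same `adapt_prompts` / `set_prompts` pair, so the adaptation loop shown further down can be applied to them as well before calling generate.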
I tried the example and found a few issues:
- We also need to adapt the prompts for the transforms, otherwise the testset will be generated in English. Also note that there may be `Parallel` transformations, in which case we need to apply the adapted prompts to the inner transformations:
LANG = "french"
for t in transforms:
if isinstance(t, PromptMixin):
prompts = await t.adapt_prompts(
language=LANG, llm=generator.llm, adapt_instruction=True
)
t.set_prompts(**prompts)
elif isinstance(t, Parallel):
for subt in t.transformations:
if isinstance(subt, PromptMixin):
prompts = await subt.adapt_prompts(
language=LANG, llm=generator.llm, adapt_instruction=True
)
subt.set_prompts(**prompts)
- I noticed that the example page adapts the prompts to "spanish"; for our case this needs to be changed to "french".
- Finally, the persona configured in the example is named `curious student`, but the engine expects `Curious Student` instead, so it fails. It is better to skip this step entirely and not pass any persona to the `TestsetGenerator` (or use the `from_langchain` static method instead); see the sketch below.
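For that last point, here is a sketch of the two workarounds; `azure_llm` and `azure_embeddings` are placeholders for raw LangChain objects, and the `from_langchain` signature may vary slightly by ragas version.

```python
from ragas.testset import TestsetGenerator

# Option 1: let ragas wrap the LangChain LLM/embeddings itself and derive
# default personas, instead of passing a hand-written "curious student".
generator = TestsetGenerator.from_langchain(azure_llm, azure_embeddings)

# Option 2: construct the generator from already-wrapped ragas objects and
# simply don't pass any personas.
# generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
```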
Can we open a separate issue to request language support for the generate and evaluate functions?