How to evaluate RAGs in other languages
[ ] I checked the documentation and related resources and couldn't find an answer to my question.
Your Question
How to evaluate RAGs in other languages.
- The dataset has been generated in English
- It can be evaluated in English
- What if I need a different language, like in the code example below?

For example, suppose I now need French support: the inputs and outputs keep the same meaning, just in a different language. How can I do that?
Code Examples

```python
from ragas import evaluate, EvaluationDataset
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import (
    LLMContextRecall,
    Faithfulness,
    FactualCorrectness,
    AnswerCorrectness,
)

# Wrap the Azure OpenAI chat model so ragas can use it as the evaluator LLM.
evaluator_llm = LangchainLLMWrapper(azure)

dataset = []
dataset.append(
    {
        "user_input": "how many devices added yesterday?",
        "retrieved_contexts": [
            "Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity."
        ],
        "response": "This is a topic I can't help with.\n \n You can ask questions like",
        "reference": "Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.",
    }
)

evaluation_dataset = EvaluationDataset.from_list(dataset)

result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness(), AnswerCorrectness()],
    llm=evaluator_llm,
    embeddings=silcom_embedding,
)
print(result)
```
I’m not sure exactly what you’re trying to do, but if you want to evaluate an English dataset and obtain results in other languages, you may check out this page.
However, if you’re looking to generate a dataset in a non-English language, you can refer to this page.
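To make the first option concrete: when the samples themselves are in another language, the usual pattern is to adapt each metric's internal prompts to that language before calling `evaluate`. Below is a minimal sketch that reuses the `dataset` and `evaluator_llm` from the question above; the `adapt_prompts` / `set_prompts` calls are the relevant part, and their exact behaviour may differ slightly between ragas versions.

```python
import asyncio

from ragas import evaluate, EvaluationDataset
from ragas.metrics import Faithfulness, LLMContextRecall


async def adapt_metrics_to(language, metrics, llm):
    # Each LLM-based ragas metric exposes adapt_prompts/set_prompts,
    # which translate its internal prompts into the target language.
    for metric in metrics:
        adapted = await metric.adapt_prompts(language=language, llm=llm)
        metric.set_prompts(**adapted)
    return metrics


# In a notebook, `await adapt_metrics_to(...)` directly instead of asyncio.run().
metrics = asyncio.run(adapt_metrics_to("french", [Faithfulness(), LLMContextRecall()], evaluator_llm))

evaluation_dataset = EvaluationDataset.from_list(dataset)  # French samples
result = evaluate(dataset=evaluation_dataset, metrics=metrics, llm=evaluator_llm)
print(result)
```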
Hi @landhu,
Were you able to evaluate in French using the approach mentioned? Thank you, @Jongwoo328, for your help on this!
Hello!
I was struggling with the same issue while trying to adapt the Llama-Index example, i.e. the end-to-end flow built on the generate and evaluate functions that creates the dataset and evaluates it in a few lines of code.
Unfortunately, these aggregated functions do not support language conversion, so we need to pass them the transforms and query_distribution arguments explicitly, as you suggest in the Non-English Testset Generation example.
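For anyone landing here, this is roughly what passing those arguments explicitly can look like, loosely following the Non-English Testset Generation example. It is only a sketch: `generator_llm`, `generator_embeddings`, and `documents` are placeholders, import paths and parameter names may differ across ragas versions, and there is an analogous `generate_with_llamaindex_docs` for Llama-Index documents.

```python
from ragas.testset import TestsetGenerator
from ragas.testset.transforms.extractors.llm_based import NERExtractor
from ragas.testset.transforms.splitters import HeadlineSplitter
from ragas.testset.synthesizers.single_hop.specific import SingleHopSpecificQuerySynthesizer

generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)

# Build the transforms and query distribution explicitly, so their prompts
# can be adapted to French before generation (see the loop further down).
transforms = [HeadlineSplitter(), NERExtractor()]
query_distribution = [(SingleHopSpecificQuerySynthesizer(llm=generator_llm), 1.0)]

testset = generator.generate_with_langchain_docs(
    documents,  # LangChain documents loaded elsewhere
    testset_size=10,
    transforms=transforms,
    query_distribution=query_distribution,
)
```

The synthesizers in `query_distribution` expose the same `adapt_prompts` / `set_prompts` pair, so the adaptation loop shown further down can be applied to them as well before calling generate.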
I tried the example and found a few issues:
- We also need to adapt the prompts for the transforms, otherwise the testset will be generated in English. Also note that there may be `Parallel` transformations, in which case we need to apply the adapted prompts to the inner transformations:
LANG = "french"
for t in transforms:
if isinstance(t, PromptMixin):
prompts = await t.adapt_prompts(
language=LANG, llm=generator.llm, adapt_instruction=True
)
t.set_prompts(**prompts)
elif isinstance(t, Parallel):
for subt in t.transformations:
if isinstance(subt, PromptMixin):
prompts = await subt.adapt_prompts(
language=LANG, llm=generator.llm, adapt_instruction=True
)
subt.set_prompts(**prompts)
- I noticed that the example page adapts the prompts to "spanish"; for our case this needs to be changed to "french".
- Finally, the persona configured in the example is named `curious student`, but the engine expects `Curious Student` instead, so it fails. It is better to skip this step entirely and not pass any persona to the `TestsetGenerator` (or use the `from_langchain` static method instead); see the sketch below.
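For that last point, here is a sketch of the two workarounds; `azure_llm` and `azure_embeddings` are placeholders for raw LangChain objects, and the `from_langchain` signature may vary slightly by ragas version.

```python
from ragas.testset import TestsetGenerator

# Option 1: let ragas wrap the LangChain LLM/embeddings itself and derive
# default personas, instead of passing a hand-written "curious student".
generator = TestsetGenerator.from_langchain(azure_llm, azure_embeddings)

# Option 2: construct the generator from already-wrapped ragas objects and
# simply don't pass any personas.
# generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
```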
Can we open a separate issue to request language support for the generate and evaluate functions?