Automatic language adaptation TestSet generation error: "Adapted output keys do not match with the original output keys"
Describe the bug I can't get the automatic language adaptation for testset generation to work. I have retried this about 10 times.
Ragas version: 0.1.4 Python version: 3.11
Code to Reproduce Share code to reproduce the issue
```python
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context, conditional
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# generator with openai models (azure_llm / azure_embeddings are local helpers)
generator_llm = azure_llm()
critic_llm = azure_llm()
embeddings = azure_embeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings,
)

# adapt to language
language = "Dutch"
cache_dir = ".cache"

generator.adapt(language, evolutions=[simple], cache_dir=cache_dir)
generator.save(evolutions=[simple, reasoning, multi_context, conditional], cache_dir=cache_dir)
```
This is the output until it errors out:
{'keyphrases': ['Zwart gat', 'Regio van ruimtetijd', 'Sterke zwaartekracht', 'Licht en elektromagnetische golven', 'Theorie van algemene relativiteit']}
{'keyphrases': ['Chinese Muur', 'Oude vestingwerken', 'Noord-China']}
{'answer': 'Menselijke activiteiten dragen voornamelijk bij aan klimaatverandering door de uitstoot van broeikasgassen bij het verbranden van fossiele brandstoffen. Deze uitstoot verhoogt de concentratie van broeikasgassen in de atmosfeer, wat meer warmte vasthoudt en leidt tot opwarming van de aarde en veranderde weerspatronen.', 'verdict': '1'}
{'answer': 'Kunstmatige intelligentie is ontworpen om menselijke cognitieve functies na te bootsen, met belangrijke capaciteiten zoals leren, redeneren, waarnemen en reageren op de omgeving op een manier die vergelijkbaar is met mensen. Deze capaciteiten maken AI cruciaal in verschillende velden, inclusief gezondheidszorg en autonoom rijden.', 'verdict': '1'}
{'answer': 'Het antwoord op de gegeven vraag is niet aanwezig in de context', 'verdict': '-1'}
{'relevant_contexts': [1, 2]}
[[1, 2], {'relevant_contexts': [1, 2]}]
{'score': 6.0}
[{'statements': ['अल्बर्ट आइंस्टीन का जन्म जर्मनी में हुआ था।', 'अल्बर्ट आइंस्टीन अपने सापेक्षता के सिद्धांत के लिए सबसे अधिक प्रसिद्ध थे।']}, {'feedback': "De vraag is te vaag en breed, het vraagt om een 'ontdekking over de ruimte' zonder een specifiek aspect, tijdskader of context van interesse te specificeren. Dit kan verwijzen naar een breed scala aan onderwerpen, van de ontdekking van nieuwe hemellichamen tot vooruitgang in de technologie van ruimtereizen. Om de duidelijkheid en beantwoordbaarheid te verbeteren, zou de vraag het type ontdekking (bijv. astronomisch, technologisch), het tijdskader (bijv. recent, historisch) of de context (bijv. binnen een specifieke onderzoeksstudie of ruimtemissie) kunnen specificeren.", 'verdict': '0'}]
Error trace
```
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[4], line 20
     17 language = "Dutch"
     18 cache_dir = ".cache"
---> 20 generator.adapt(language, evolutions=[simple], cache_dir=cache_dir)
     21 generator.save(evolutions=[simple, reasoning, multi_context, conditional], cache_dir=cache_dir)

File ~/Development/HeadingFWD/evaluation-playground/.venv/lib/python3.11/site-packages/ragas/testset/generator.py:311, in TestsetGenerator.adapt(self, language, evolutions, cache_dir)
    309 self.init_evolution(evolution)
    310 evolution.init()
--> 311 evolution.adapt(language, cache_dir=cache_dir)

File ~/Development/HeadingFWD/evaluation-playground/.venv/lib/python3.11/site-packages/ragas/testset/evolutions.py:324, in SimpleEvolution.adapt(self, language, cache_dir)
    323 def adapt(self, language: str, cache_dir: t.Optional[str] = None) -> None:
--> 324     super().adapt(language, cache_dir)
    325     self.seed_question_prompt = self.seed_question_prompt.adapt(
    326         language, self.generator_llm, cache_dir
    327     )

File ~/Development/HeadingFWD/evaluation-playground/.venv/lib/python3.11/site-packages/ragas/testset/evolutions.py:261, in Evolution.adapt(self, language, cache_dir)
    255 self.rewrite_invalid_question_prompt = (
    256     self.rewrite_invalid_question_prompt.adapt(
    257         language, self.generator_llm, cache_dir
    258     )
    259 )
    260 self.node_filter.adapt(language, cache_dir)
--> 261 self.question_filter.adapt(language, cache_dir)

File ~/Development/HeadingFWD/evaluation-playground/.venv/lib/python3.11/site-packages/ragas/testset/filters.py:97, in QuestionFilter.adapt(self, language, cache_dir)
     93 def adapt(self, language: str, cache_dir: t.Optional[str] = None) -> None:
     94     """
     95     Adapt the filter to a different language.
     96     """
---> 97     self.filter_question_prompt = self.filter_question_prompt.adapt(
     98         language, self.llm, cache_dir
     99     )

File ~/Development/HeadingFWD/evaluation-playground/.venv/lib/python3.11/site-packages/ragas/llms/prompt.py:236, in Prompt.adapt(self, language, llm, cache_dir)
    230 assert (
    231     set(output.keys()) == output_keys[i]
    232 ), f"Adapted output keys {set(output.keys())=} do not match with the original output keys: {output_keys[i]=}"
    233 elif isinstance(output, list) and all(
    234     isinstance(item, dict) for item in output
    235 ):
--> 236 assert all(
    237     set(item.keys()) in output_keys[i] for item in output
    238 ), "Adapted output keys do not match with the original output keys"
    240 self.examples[i] = example_dict
    242 self.language = language
```
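For context on what that assertion checks: the adapt step asks the LLM to translate each few-shot example, re-parses the translated output, and requires the parsed dict to have exactly the same keys as the original example. A minimal standalone sketch of that check (illustrative names, not the library's internals verbatim):

```python
# Sketch of the key-consistency check that fires in ragas/llms/prompt.py.
original_output_keys = {"relevant_contexts"}

# A well-translated example keeps the same JSON keys:
adapted_ok = {"relevant_contexts": [1, 2]}
assert set(adapted_ok.keys()) == original_output_keys

# If the translated text cannot be parsed back into JSON, the safe loader
# yields an empty dict, and the key sets no longer match, so ragas raises:
adapted_bad = {}
print(set(adapted_bad.keys()) == original_output_keys)  # False
```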
Expected behavior No error!
Additional context Using langchain with azure openai endpoint.
Hey @baswenneker, thanks for reporting the issue. I would recommend trying it again with GPT-4. Meanwhile I will work on a fix for it.
@shahules786 I'm using GPT-4 already. I've tried about 20 times without any luck. A manual on how to rewrite the prompts by hand would be nice!
Hey @baswenneker, lemme try this out. We are currently changing some structures related to prompts, so I can test your case as well. Thank you!
Cool, let me know if I can help @shahules786!
@shahules786 I added an extra set of examples to the translation prompts and it worked. I made a pull request for this:
https://github.com/explodinggradients/ragas/pull/826
Had the same issue for french, so I made a pull request adding examples for french: #857
I had the same issue, also for Dutch. It occurs because `{'relevant_contexts': [1, 2]}` cannot be translated in the adapt function, so `example[-1]` in prompt.py ends up in a strange format with extra text appended to it. `json_loader._safe_load(example[-1], llm)` then returns an empty dict `{}`, which does not match `output_keys[i]` (which is `relevant_contexts`). I fixed it by replacing:

```python
example_dict[self.output_key] = (
    json_loader.safe_load(example[-1], llm)
    if self.output_type.lower() == "json"
    else example[-1]
)
```

with:

```python
if self.output_type.lower() == "json":
    example_dict[self.output_key] = json_loader._safe_load(example[-1], llm)
    if example_dict[self.output_key] == {}:
        # Extract the dictionary part using string slicing
        dict_str = example[-1].split('(')[0].strip()
        example_dict[self.output_key] = ast.literal_eval(dict_str)
else:
    example_dict[self.output_key] = example[-1]
```

This strips `example[-1]` down to the dictionary literal, which can then be parsed. I know it's not the neatest solution, and I will try to improve it. Hope it helps.
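To illustrate the workaround above in isolation: when the model appends commentary after the dict literal, slicing at the first parenthesis and parsing with `ast.literal_eval` recovers the dict. A small sketch with a made-up translated string (the `raw` value is hypothetical):

```python
import ast

# Hypothetical raw value of example[-1] after translation: the dict literal
# survives, but the model has appended extra text in parentheses.
raw = "{'relevant_contexts': [1, 2]} (vertaling niet mogelijk)"

# Slice off everything from the first '(' and parse the remaining literal:
dict_str = raw.split('(')[0].strip()
parsed = ast.literal_eval(dict_str)
print(parsed)  # {'relevant_contexts': [1, 2]}
```

Note that `ast.literal_eval` only accepts Python literals, so this fails if the model rephrases the dict itself rather than just appending text after it.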
Same issue here! Fellow Dutchie ;) I converted the metric prompts to Dutch with 4o (ragas_metrics_nl.txt) and voilà!
tagging #890 as a meta issue for fixing all of these bugs
This has been fixed with v0.2 - I know, finally 😅 🎉
Do check out the docs here: https://docs.ragas.io/en/stable/howtos/customizations/metrics/_metrics_language_adaptation/ and the reference here: https://docs.ragas.io/en/stable/references/prompt/#ragas.prompt.PromptMixin
And if you're migrating from v0.1, check out the migration docs here: https://docs.ragas.io/en/stable/howtos/migrations/migrate_from_v01_to_v02
Could you check it out and verify? If not, feel free to comment here and I'll help you out. Really sorry again that it took this long!