Automatic language adaptation TestSet generation error: "Adapted output keys do not match with the original output keys"
Describe the bug I can't get the automatic language adaptation for testset generation to work. I have retried this about 10 times.
Ragas version: 0.1.4 Python version: 3.11
Code to Reproduce Share code to reproduce the issue
```python
from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context, conditional
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# generator with openai models (azure_llm / azure_embeddings are local helpers)
generator_llm = azure_llm()
critic_llm = azure_llm()
embeddings = azure_embeddings()

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings,
)

# adapt to language
language = "Dutch"
cache_dir = ".cache"

generator.adapt(language, evolutions=[simple], cache_dir=cache_dir)
generator.save(evolutions=[simple, reasoning, multi_context, conditional], cache_dir=cache_dir)
```
This is the output until it errors out:
{'keyphrases': ['Zwart gat', 'Regio van ruimtetijd', 'Sterke zwaartekracht', 'Licht en elektromagnetische golven', 'Theorie van algemene relativiteit']}
{'keyphrases': ['Chinese Muur', 'Oude vestingwerken', 'Noord-China']}
{'answer': 'Menselijke activiteiten dragen voornamelijk bij aan klimaatverandering door de uitstoot van broeikasgassen bij het verbranden van fossiele brandstoffen. Deze uitstoot verhoogt de concentratie van broeikasgassen in de atmosfeer, wat meer warmte vasthoudt en leidt tot opwarming van de aarde en veranderde weerspatronen.', 'verdict': '1'}
{'answer': 'Kunstmatige intelligentie is ontworpen om menselijke cognitieve functies na te bootsen, met belangrijke capaciteiten zoals leren, redeneren, waarnemen en reageren op de omgeving op een manier die vergelijkbaar is met mensen. Deze capaciteiten maken AI cruciaal in verschillende velden, inclusief gezondheidszorg en autonoom rijden.', 'verdict': '1'}
{'answer': 'Het antwoord op de gegeven vraag is niet aanwezig in de context', 'verdict': '-1'}
{'relevant_contexts': [1, 2]}
[[1, 2], {'relevant_contexts': [1, 2]}]
{'score': 6.0}
[{'statements': ['अल्बर्ट आइंस्टीन का जन्म जर्मनी में हुआ था।', 'अल्बर्ट आइंस्टीन अपने सापेक्षता के सिद्धांत के लिए सबसे अधिक प्रसिद्ध थे।']}, {'feedback': "De vraag is te vaag en breed, het vraagt om een 'ontdekking over de ruimte' zonder een specifiek aspect, tijdskader of context van interesse te specificeren. Dit kan verwijzen naar een breed scala aan onderwerpen, van de ontdekking van nieuwe hemellichamen tot vooruitgang in de technologie van ruimtereizen. Om de duidelijkheid en beantwoordbaarheid te verbeteren, zou de vraag het type ontdekking (bijv. astronomisch, technologisch), het tijdskader (bijv. recent, historisch) of de context (bijv. binnen een specifieke onderzoeksstudie of ruimtemissie) kunnen specificeren.", 'verdict': '0'}]
Error trace
```
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[4], line 20
     17 language = "Dutch"
     18 cache_dir = ".cache"
---> 20 generator.adapt(language, evolutions=[simple], cache_dir=cache_dir)
     21 generator.save(evolutions=[simple, reasoning, multi_context, conditional], cache_dir=cache_dir)

File ~/Development/HeadingFWD/evaluation-playground/.venv/lib/python3.11/site-packages/ragas/testset/generator.py:311, in TestsetGenerator.adapt(self, language, evolutions, cache_dir)
    309 self.init_evolution(evolution)
    310 evolution.init()
--> 311 evolution.adapt(language, cache_dir=cache_dir)

File ~/Development/HeadingFWD/evaluation-playground/.venv/lib/python3.11/site-packages/ragas/testset/evolutions.py:324, in SimpleEvolution.adapt(self, language, cache_dir)
    323 def adapt(self, language: str, cache_dir: t.Optional[str] = None) -> None:
--> 324     super().adapt(language, cache_dir)
    325     self.seed_question_prompt = self.seed_question_prompt.adapt(
    326         language, self.generator_llm, cache_dir
    327     )

File ~/Development/HeadingFWD/evaluation-playground/.venv/lib/python3.11/site-packages/ragas/testset/evolutions.py:261, in Evolution.adapt(self, language, cache_dir)
    255 self.rewrite_invalid_question_prompt = (
    256     self.rewrite_invalid_question_prompt.adapt(
    257         language, self.generator_llm, cache_dir
    258     )
    259 )
    260 self.node_filter.adapt(language, cache_dir)
--> 261 self.question_filter.adapt(language, cache_dir)

File ~/Development/HeadingFWD/evaluation-playground/.venv/lib/python3.11/site-packages/ragas/testset/filters.py:97, in QuestionFilter.adapt(self, language, cache_dir)
     93 def adapt(self, language: str, cache_dir: t.Optional[str] = None) -> None:
     94     """
     95     Adapt the filter to a different language.
     96     """
---> 97     self.filter_question_prompt = self.filter_question_prompt.adapt(
     98         language, self.llm, cache_dir
     99     )

File ~/Development/HeadingFWD/evaluation-playground/.venv/lib/python3.11/site-packages/ragas/llms/prompt.py:236, in Prompt.adapt(self, language, llm, cache_dir)
    230 assert (
    231     set(output.keys()) == output_keys[i]
    232 ), f"Adapted output keys {set(output.keys())=} do not match with the original output keys: {output_keys[i]=}"
    233 elif isinstance(output, list) and all(
    234     isinstance(item, dict) for item in output
    235 ):
--> 236 assert all(
    237     set(item.keys()) in output_keys[i] for item in output
    238 ), "Adapted output keys do not match with the original output keys"
    240 self.examples[i] = example_dict
    242 self.language = language
```
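For context on what that assertion checks: the adapt step asks the LLM to translate each few-shot example, re-parses the translated output, and requires the parsed dict to have exactly the same keys as the original example. A minimal standalone sketch of that check (illustrative names, not the library's internals verbatim):

```python
# Sketch of the key-consistency check that fires in ragas/llms/prompt.py.
original_output_keys = {"relevant_contexts"}

# A well-translated example keeps the same JSON keys:
adapted_ok = {"relevant_contexts": [1, 2]}
assert set(adapted_ok.keys()) == original_output_keys

# If the translated text cannot be parsed back into JSON, the safe loader
# yields an empty dict, and the key sets no longer match, so ragas raises:
adapted_bad = {}
print(set(adapted_bad.keys()) == original_output_keys)  # False
```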
Expected behavior No error!
Additional context Using langchain with azure openai endpoint.
Hey @baswenneker, thanks for reporting the issue. I would recommend trying it again with GPT-4. Meanwhile I will work on a fix for it.
@shahules786 I'm using GPT-4 already. I've tried about 20 times without any luck. A manual on how to rewrite the prompts by hand would be nice!
Hey @baswenneker, lemme try this out. We are currently changing some structures related to prompts, so I can test your case as well. Thank you!
Cool, let me know if I can help @shahules786!
@shahules786 I added an extra set of examples to the translation prompts and it worked. I made a pull request for this:
https://github.com/explodinggradients/ragas/pull/826
Had the same issue for french, so I made a pull request adding examples for french: #857
I had the same issue, also for Dutch. It occurs because `{'relevant_contexts': [1, 2]}` cannot be translated in the adapt function, so `example[-1]` in prompt.py ends up in a strange format with extra text appended to it. `json_loader._safe_load(example[-1], llm)` then returns an empty dict `{}`, which does not match `output_keys[i]` (which is `relevant_contexts`). I fixed it by replacing:

```python
example_dict[self.output_key] = (
    json_loader.safe_load(example[-1], llm)
    if self.output_type.lower() == "json"
    else example[-1]
)
```

with:

```python
if self.output_type.lower() == "json":
    example_dict[self.output_key] = json_loader._safe_load(example[-1], llm)
    if example_dict[self.output_key] == {}:
        # Extract the dictionary part using string slicing
        dict_str = example[-1].split('(')[0].strip()
        example_dict[self.output_key] = ast.literal_eval(dict_str)
else:
    example_dict[self.output_key] = example[-1]
```

This strips `example[-1]` down to the dictionary literal, which can then be parsed. I know it's not the neatest solution, and I will try to improve it. Hope it helps.
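To illustrate the workaround above in isolation: when the model appends commentary after the dict literal, slicing at the first parenthesis and parsing with `ast.literal_eval` recovers the dict. A small sketch with a made-up translated string (the `raw` value is hypothetical):

```python
import ast

# Hypothetical raw value of example[-1] after translation: the dict literal
# survives, but the model has appended extra text in parentheses.
raw = "{'relevant_contexts': [1, 2]} (vertaling niet mogelijk)"

# Slice off everything from the first '(' and parse the remaining literal:
dict_str = raw.split('(')[0].strip()
parsed = ast.literal_eval(dict_str)
print(parsed)  # {'relevant_contexts': [1, 2]}
```

Note that `ast.literal_eval` only accepts Python literals, so this fails if the model rephrases the dict itself rather than just appending text after it.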
Same issue here! Fellow Dutchie ;) I converted the metric prompts to Dutch with 4o (ragas_metrics_nl.txt) and voilà!
tagging #890 as a meta issue for fixing all of these bugs
This has been fixed with v0.2 - I know, finally 😅 🎉
Do check out the docs here: https://docs.ragas.io/en/stable/howtos/customizations/metrics/_metrics_language_adaptation/ and the reference here: https://docs.ragas.io/en/stable/references/prompt/#ragas.prompt.PromptMixin
And if you're migrating from v0.1, check out the migration docs here: https://docs.ragas.io/en/stable/howtos/migrations/migrate_from_v01_to_v02
Could you check it out and verify? If not, feel free to comment here and I'll help you out. Really sorry again that it took this long!