ragas
Generate a Synthetic Test Set In Other Language
I checked the documentation and related resources and couldn't find an answer to my question.
Your Question: When I use the function generate_with_langchain_docs to generate a test set, the question and ground_truth columns are generated in English. How can I have them generated in another language?
Hey @AprilJoy! 👋 I'm here to assist you with any bugs, questions, or contributions you have. I'm Dosu, a bot designed to help out while you wait for a human maintainer.
To generate the "question" and "ground_truth" columns in a different language using the generate_with_langchain_docs function, you need to adapt the language settings of the generator_llm and critic_llm used in the TestsetGenerator. Here's how you can modify the function to include a language parameter:
def generate_with_langchain_docs(
    self,
    documents: t.Sequence[LCDocument],
    test_size: int,
    distributions: t.Optional[Distributions] = None,
    with_debugging_logs=False,
    is_async: bool = True,
    raise_exceptions: bool = True,
    run_config: t.Optional[RunConfig] = None,
    language: str = "en",  # Add a language parameter with a default value
):
    distributions = distributions or {}

    # chunk documents and add to docstore
    self.docstore.add_documents(
        [Document.from_langchain_document(doc) for doc in documents]
    )

    # Adapt the language for the generator and critic LLMs
    self.generator_llm.adapt(language)
    self.critic_llm.adapt(language)

    return self.generate(
        test_size=test_size,
        distributions=distributions,
        with_debugging_logs=with_debugging_logs,
        is_async=is_async,
        raise_exceptions=raise_exceptions,
        run_config=run_config,
    )
In this modification, a language parameter is added to generate_with_langchain_docs, and adapt is called on both generator_llm and critic_llm to set the desired language before the test set is generated. The intent is for the "question" and "ground_truth" columns to be generated in the specified language [1].
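More generally, the output language of an LLM is typically driven by the prompts it receives, so any adaptation has to reach the prompts before generation runs. Here is a minimal, library-free sketch of that idea. `Prompt`, `adapt_prompt`, and `fake_translate` are hypothetical names used only for illustration and are not part of the ragas API.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Prompt:
    """Hypothetical stand-in for a generation prompt (not a ragas class)."""

    instruction: str
    language: str = "english"


def adapt_prompt(
    prompt: Prompt,
    language: str,
    translate: Callable[[str, str], str],
) -> Prompt:
    """Return a copy of `prompt` whose instruction targets `language`."""
    translated = translate(prompt.instruction, language)
    # Add an explicit output-language directive so the model is less
    # likely to fall back to English.
    directive = f"Respond only in {language}."
    return replace(
        prompt,
        instruction=f"{translated} {directive}",
        language=language,
    )


def fake_translate(text: str, language: str) -> str:
    """Placeholder translator; a real setup would call an LLM here."""
    return f"[{language}] {text}"


question_prompt = Prompt(
    instruction="Write a question answerable from the given context."
)
german_prompt = adapt_prompt(question_prompt, "german", fake_translate)
print(german_prompt.instruction)
```

The original prompt is left untouched, so the English and adapted variants can be compared side by side when debugging which one the generator actually used.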
Hey @AprilJoy, we understand this is important, but the current version of test generation has many other limitations, which is why we are working on #1016. An initial version of this will be merged by early next week, and then we will collect and work on issues like this one that need attention.
Keeping tabs on #1016.
I suppose there is no real update on this one? I followed the tutorial https://docs.ragas.io/en/stable/howtos/customizations/testgenerator/_language_adaptation/ with the only change being that I'm using German data instead of Spanish. However, most of the generated questions and answers are in English.
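A rough way to put a number on "most" is to count stopword hits per generated question. The sketch below is a heuristic only: the word lists are tiny illustrative assumptions, and `guess_language` and `english_share` are hypothetical helper names, not ragas functions.

```python
import re

# Tiny illustrative stopword lists (assumption: far from exhaustive).
ENGLISH_STOPWORDS = {
    "the", "is", "what", "of", "and", "how", "does", "to", "which", "are",
}
GERMAN_STOPWORDS = {
    "der", "die", "das", "ist", "was", "und", "wie", "nicht", "ein", "welche",
}


def guess_language(text: str) -> str:
    """Guess 'english', 'german', or 'unknown' from stopword counts."""
    words = re.findall(r"[a-zäöüß]+", text.lower())
    english_hits = sum(word in ENGLISH_STOPWORDS for word in words)
    german_hits = sum(word in GERMAN_STOPWORDS for word in words)
    if english_hits == german_hits:
        return "unknown"
    return "english" if english_hits > german_hits else "german"


def english_share(questions: list[str]) -> float:
    """Fraction of questions that the heuristic classifies as English."""
    if not questions:
        return 0.0
    english = sum(guess_language(q) == "english" for q in questions)
    return english / len(questions)


sample_questions = [
    "What is the capital of Germany?",
    "Was ist die Hauptstadt von Deutschland?",
    "How does the engine work?",
]
share = english_share(sample_questions)
print(f"{share:.0%} of the sample questions look English")
```

Running the same check over the question column of a generated test set gives a concrete figure to attach to a report like this one.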