
Duplicate Questions and Spelling Mistakes in Generated Testset

Open Jayashree-kalabhavi opened this issue 1 year ago • 3 comments

[ ] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question

When using the TestsetGenerator from the ragas.testset module, I am encountering the following issues:

Duplicate questions: The generated test set often contains repeated questions.

Spelling mistakes: The generated questions contain spelling errors (e.g., "Presidnet" instead of "President", "spesial meting" instead of "special meeting").

Code Examples

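# Imports assumed from the names used below (not shown in the original post)
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper
from ragas.testset import TestsetGenerator

# Initialize the LLM wrapper
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4"))

# Initialize the Embeddings wrapper
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings())

# `docs` is a list of LangChain documents loaded elsewhere
generator = TestsetGenerator(llm=generator_llm, embedding_model=generator_embeddings)
dataset = generator.generate_with_langchain_docs(docs, testset_size=5)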

Additional context

Sample Output:

What does the term 'The Site of the University of Missouri' refer to according to the Board Bylaws?
What is the role of the Presidnet in the University of Missouri as per the Board of Curators?
Who can call a spesial meting of the Board?
What are the responsibilities of the Board of Curators as per the Board Bylaws?
What are the responsibilities of the Board of Curators at the University of Missouri?
What are the responsibilities of the Board of Curators at the University of Missouri?
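To make the duplication concrete, here is a minimal post-processing sketch that detects and drops repeated questions from the sample output. It is not part of ragas: `normalize` and `dedupe_questions` are hypothetical helper names, and the sketch only catches questions that are identical after lowercasing and stripping punctuation, not paraphrases.

```python
import re


def normalize(question: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", question.lower())
    return re.sub(r"\s+", " ", text).strip()


def dedupe_questions(questions):
    """Keep the first occurrence of each normalized question, preserving order."""
    seen = set()
    unique = []
    for q in questions:
        key = normalize(q)
        if key not in seen:
            seen.add(key)
            unique.append(q)
    return unique


# The six questions from the sample output, copied as generated (typos included).
sample = [
    "What does the term 'The Site of the University of Missouri' refer to according to the Board Bylaws?",
    "What is the role of the Presidnet in the University of Missouri as per the Board of Curators?",
    "Who can call a spesial meting of the Board?",
    "What are the responsibilities of the Board of Curators as per the Board Bylaws?",
    "What are the responsibilities of the Board of Curators at the University of Missouri?",
    "What are the responsibilities of the Board of Curators at the University of Missouri?",
]

unique = dedupe_questions(sample)
print(len(sample), "->", len(unique))  # prints "6 -> 5"
```

Only the last question is removed here; the fourth and fifth are near-duplicates in meaning but differ in wording, so catching them would need something like embedding similarity, which this sketch does not attempt.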

Jayashree-kalabhavi, Nov 15 '24 17:11