Cannot generate TestDataset more than once
- [x] I have checked the documentation and related resources and couldn't resolve my bug.
**Describe the bug**
Calling `TestsetGenerator.generate_with_langchain_docs()` more than once results in documents from the first call being used in subsequent calls.

**Ragas version:** 0.1.5
**Python version:** 3.11
**Code to Reproduce**

```python
from langchain.document_loaders.directory import DirectoryLoader

loader = DirectoryLoader("your-directory")  # contains some .txt file
documents = loader.load()

from ragas.testset.generator import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# generator with openai models
generator_llm = ChatOpenAI(model="gpt-3.5-turbo")
critic_llm = ChatOpenAI(model="gpt-4")
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings,
)

# generate testset
testset = generator.generate_with_langchain_docs(documents, test_size=1, distributions={simple: 1})

loader2 = DirectoryLoader("your-directory-2")  # contains some other .txt file
documents2 = loader2.load()

generator_llm2 = ChatOpenAI(model="gpt-3.5-turbo")
critic_llm2 = ChatOpenAI(model="gpt-4")
embeddings2 = OpenAIEmbeddings(model="text-embedding-3-small")

generator2 = TestsetGenerator.from_langchain(
    generator_llm2,
    critic_llm2,
    embeddings2,
)

testset2 = generator2.generate_with_langchain_docs(documents2, test_size=1, distributions={simple: 1})
```
**Expected behavior**
I expect `testset2` to contain a question about the .txt file in `your-directory-2`; however, it contains a question about the .txt file in `your-directory`.

**Additional context**
None.
I am experiencing the same issue. I was only able to generate 3 test questions.
AFAIK, the `test_size` parameter refers to the number of candidate questions, so it acts as a maximum rather than a guaranteed amount. My question is about calling `generate()` or `generate_with_langchain_docs()` multiple times with different documents: if you do this, the questions for all calls will be about the first set of documents passed. From what I can tell, this is a state-management issue, most likely caused by the `InMemoryDocumentStore` class.
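For illustration only (this is not ragas's actual code), the symptom is consistent with a document store that accidentally shares mutable state across instances, e.g. via a class-level list, so a "fresh" store for the second run already contains the first run's documents:

```python
class LeakyDocumentStore:
    # Bug: `documents` is a class attribute, so every instance
    # shares the same underlying list object.
    documents = []

    def add_documents(self, docs):
        self.documents.extend(docs)


store_a = LeakyDocumentStore()
store_a.add_documents(["doc from your-directory"])

store_b = LeakyDocumentStore()  # intended to be a fresh store
store_b.add_documents(["doc from your-directory-2"])

# store_b now also holds the first run's document.


class FixedDocumentStore:
    def __init__(self):
        # Fix: give each instance its own list.
        self.documents = []

    def add_documents(self, docs):
        self.documents.extend(docs)
```

If something like this is happening inside the docstore, constructing a new generator per call would not help, which matches the behavior reported above.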
plus 1 on what @njcolvin said.
The "evolutionary process" (details here) ragas uses for testset generation reduces the number of contexts actually available for generating questions: the filter may discard most of the documents because high-quality questions can't be generated from them.
It also depends on which LLMs you use.
How many documents are in documents2, @njcolvin?
I can also spend some time with you on a call to debug further if you like 🙂
Hi @jjmachan, I appreciate the response. In the above example I believe I used Biden's State of the Union for `documents` and the first 1000 lines of tinyshakespeare for `documents2`. I am unsure exactly how many documents were created from these sources.
You might be right about the filter discarding `documents2`: it seemed to do this and then fall back to the old documents. I was able to make it work for my application by running the generation code in a separate process per document set, using the `subprocess` module.
Thank you for offering to debug this with me. I am interested, and will follow up as soon as I can.
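The subprocess workaround can be sketched roughly as follows: each generation pass runs in a fresh Python interpreter, so no in-memory docstore state can survive between runs. The worker body here is a placeholder that just echoes its argument; in the real workaround it would load the documents and call `generate_with_langchain_docs()`, then serialize the resulting testset.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Placeholder worker run in a fresh interpreter per document set.
# A real worker would load documents from `directory`, build a
# TestsetGenerator, generate the testset, and print it as JSON.
WORKER = """
import json, sys
directory = sys.argv[1]
# ... load documents and generate the testset here ...
print(json.dumps({"directory": directory}))
"""


def generate_in_subprocess(directory: str) -> dict:
    """Run one generation pass in an isolated interpreter."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(WORKER)
        worker_path = f.name
    try:
        out = subprocess.run(
            [sys.executable, worker_path, directory],
            capture_output=True, text=True, check=True,
        )
        return json.loads(out.stdout)
    finally:
        Path(worker_path).unlink()


result1 = generate_in_subprocess("your-directory")
result2 = generate_in_subprocess("your-directory-2")
```

Process isolation is a blunt workaround, but it sidesteps whatever state the docstore keeps in memory without needing any change to ragas itself.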
I'm having the same issue as @njcolvin. Any update, @jjmachan?
I have the same issue. I would appreciate an update.