Documents appears to be too short (ie 100 tokens or less). Please provide longer documents.
[ ] I have checked the documentation and related resources and couldn't resolve my bug.
Describe the bug
Calling `TestsetGenerator.generate_with_langchain_docs` with a list of LangChain documents raises a `ValueError` from `default_transforms` saying the documents are too short (100 tokens or less), and test set generation aborts.

```
ValueError                                Traceback (most recent call last)
Cell In[38], line 19
     12 generator = TestsetGenerator.from_langchain(
     13     llm=generator_llm,
     14     embedding_model=generator_embeddings,
     15 )
     17 query_distribution = default_query_distribution(generator_llm)
---> 19 testset = generator.generate_with_langchain_docs(
     20     documents=chunks,
     21     testset_size=10,
     22     query_distribution=query_distribution,
     23 )

File ~/Desktop/project/.venv/lib/python3.12/site-packages/ragas/testset/synthesizers/generate.py:164, in TestsetGenerator.generate_with_langchain_docs(self, documents, testset_size, transforms, transforms_llm, transforms_embedding_model, query_distribution, run_config, callbacks, with_debugging_logs, raise_exceptions)
    159     raise ValueError(
    160         """An embedding client was not provided. Provide an embedding through the transforms_embedding_model parameter. Alternatively you can provide your own transforms through the transforms parameter."""
    161     )
    163 if not transforms:
--> 164     transforms = default_transforms(
    165         documents=list(documents),
    166         llm=transforms_llm or self.llm,
    167         embedding_model=transforms_embedding_model or self.embedding_model,
    168     )

...
    161         "Documents appears to be too short (ie 100 tokens or less). Please provide longer documents."
    162     )
    164 return transforms

ValueError: Documents appears to be too short (ie 100 tokens or less). Please provide longer documents.
```
Ragas version: 0.2.15
Python version: 3.12
Code to Reproduce
```python
from ragas.llms.base import LangchainLLMWrapper
from ragas.embeddings.base import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.testset import TestsetGenerator
from ragas.testset.synthesizers import default_query_distribution

generator_llm = LangchainLLMWrapper(langchain_llm=ChatOpenAI(model="gpt-4o-mini"))
generator_embeddings = LangchainEmbeddingsWrapper(embeddings=OpenAIEmbeddings(model="text-embedding-3-small"))

generator = TestsetGenerator.from_langchain(
    llm=generator_llm,
    embedding_model=generator_embeddings,
)

query_distribution = default_query_distribution(generator_llm)

testset = generator.generate_with_langchain_docs(
    documents=chunks,
    testset_size=10,
    query_distribution=query_distribution,
)
```
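Note: `chunks` is not defined in the snippet above; it is a list of LangChain `Document` objects. A purely illustrative way to build such a list (the file path and splitter settings are placeholders, not the actual setup) would be a loader plus `RecursiveCharacterTextSplitter`:

```python
# Hypothetical construction of `chunks` (placeholder path and settings,
# not taken from the original report): load a text file and split it
# into LangChain Documents.
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = TextLoader("data/sample.txt").load()  # placeholder input file
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
```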
Error trace
`ValueError: Documents appears to be too short (ie 100 tokens or less). Please provide longer documents.` (full traceback above)
Expected behavior
A test dataset is generated from the provided documents.
Additional context
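As a sanity check (illustrative only, not part of the original run), the token count of the chunks can be compared against the ~100-token threshold mentioned in the error, for example with tiktoken:

```python
# Illustrative diagnostic: count tokens per chunk with tiktoken's
# cl100k_base encoding to see whether the documents really fall at or
# below roughly 100 tokens.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
token_counts = [len(enc.encode(doc.page_content)) for doc in chunks]
print(f"{len(token_counts)} chunks, {sum(token_counts)} tokens total, "
      f"largest chunk: {max(token_counts)} tokens")
```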