
Documents appears to be too short (ie 100 tokens or less). Please provide longer documents.

Open ananthanarayanan431 opened this issue 6 months ago • 1 comment

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug
A clear and concise description of what the bug is.

ValueError                                Traceback (most recent call last)
Cell In[38], line 19
     12 generator = TestsetGenerator.from_langchain(
     13     llm=generator_llm,
     14     embedding_model=generator_embeddings,
     15 )
     17 query_distribution = default_query_distribution(generator_llm)
---> 19 testset = generator.generate_with_langchain_docs(
     20     documents=chunks,
     21     testset_size=10,
     22     query_distribution=query_distribution,
     23 )

File ~/Desktop/project/.venv/lib/python3.12/site-packages/ragas/testset/synthesizers/generate.py:164, in TestsetGenerator.generate_with_langchain_docs(self, documents, testset_size, transforms, transforms_llm, transforms_embedding_model, query_distribution, run_config, callbacks, with_debugging_logs, raise_exceptions)
    159     raise ValueError(
    160         """An embedding client was not provided. Provide an embedding through the transforms_embedding_model parameter. Alternatively you can provide your own transforms through the transforms parameter."""
    161     )
    163 if not transforms:
--> 164     transforms = default_transforms(
    165         documents=list(documents),
    166         llm=transforms_llm or self.llm,
    167         embedding_model=transforms_embedding_model or self.embedding_model,
    168     )
    170 # convert the documents to Ragas nodes
...
    161     "Documents appears to be too short (ie 100 tokens or less). Please provide longer documents."
    162 )
    164 return transforms

ValueError: Documents appears to be too short (ie 100 tokens or less). Please provide longer documents.
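The message says the documents appear to be 100 tokens or less, so a first step is to look at how long the entries in `chunks` actually are. The sketch below is a rough diagnostic, not part of Ragas: `Doc` is a hypothetical stand-in for a LangChain `Document` (only `page_content` is used), and whitespace-separated words are assumed as a crude proxy for tokens, so the counts will differ from whatever tokenizer Ragas applies internally.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document; only page_content is used."""
    page_content: str


def approx_token_count(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    # A real tokenizer will generally report different numbers.
    return len(text.split())


def length_report(docs, threshold: int = 100) -> dict:
    """Summarize how many documents are at or below `threshold` approximate tokens."""
    counts = [approx_token_count(d.page_content) for d in docs]
    short = sum(1 for c in counts if c <= threshold)
    return {"total": len(counts), "short": short, "max": max(counts, default=0)}


# Example data standing in for the `chunks` variable from the report.
chunks = [Doc("word " * 40), Doc("word " * 250), Doc("word " * 90)]
report = length_report(chunks)
print(report)  # {'total': 3, 'short': 2, 'max': 250}
```

If most entries land in the `short` bucket, the splitter that produced `chunks` is probably configured with a chunk size that is too small for test set generation.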

Ragas version: 0.2.15
Python version: 3.12

Code to Reproduce

from ragas.llms.base import LangchainLLMWrapper
from ragas.embeddings.base import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

from ragas.testset import TestsetGenerator
from ragas.testset.synthesizers import default_query_distribution

generator_llm = LangchainLLMWrapper(langchain_llm=ChatOpenAI(model="gpt-4o-mini"))
generator_embeddings = LangchainEmbeddingsWrapper(
    embeddings=OpenAIEmbeddings(model="text-embedding-3-small")
)

generator = TestsetGenerator.from_langchain(
    llm=generator_llm,
    embedding_model=generator_embeddings,
)

query_distribution = default_query_distribution(generator_llm)

testset = generator.generate_with_langchain_docs(
    documents=chunks,
    testset_size=10,
    query_distribution=query_distribution,
)
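The snippet above does not show how `chunks` was built. If it comes from a splitter with a small chunk size, one possible workaround is to merge consecutive chunks into longer documents before calling `generate_with_langchain_docs`. The following is a minimal sketch under stated assumptions, not a Ragas or LangChain API: `Doc` is a hypothetical stand-in for a LangChain `Document`, `merge_short_chunks` and its `min_words` value of 150 are made up for illustration, word counts stand in for token counts, and any document metadata is dropped.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document; metadata is ignored here."""
    page_content: str


def merge_short_chunks(docs, min_words: int = 150):
    """Greedily join consecutive chunks until each merged chunk has >= min_words words."""
    merged, buffer, buffer_words = [], [], 0
    for d in docs:
        buffer.append(d.page_content)
        buffer_words += len(d.page_content.split())
        if buffer_words >= min_words:
            merged.append(Doc("\n\n".join(buffer)))
            buffer, buffer_words = [], 0
    if buffer:
        # Leftover text is too short on its own: attach it to the previous
        # merged chunk, or keep it as a single chunk if nothing was merged yet.
        tail = "\n\n".join(buffer)
        if merged:
            merged[-1] = Doc(merged[-1].page_content + "\n\n" + tail)
        else:
            merged.append(Doc(tail))
    return merged


# Seven 60-word chunks stand in for an over-split document.
small_chunks = [Doc("word " * 60) for _ in range(7)]
longer_chunks = merge_short_chunks(small_chunks, min_words=150)
print([len(d.page_content.split()) for d in longer_chunks])  # [180, 240]
```

In practice it is usually simpler to raise the chunk size in the splitter itself, or to pass the unsplit source documents; the merge step is only a fallback when the chunks cannot be regenerated.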

Error trace ValueError

Expected behavior Creation of test dataset

Additional context Add any other context about the problem here.

ananthanarayanan431, Jun 18 '25 18:06