Test dataset Issue

Open nandhu-nr opened this issue 1 year ago • 1 comments

WARNING:ragas.testset.docstore:Filename and doc_id are the same for all nodes.

ValueError                                Traceback (most recent call last)
in <cell line: 11>()
      9
     10 # generate testset
---> 11 testset1 = generator.generate_with_langchain_docs(documents, test_size=10, distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25}, raise_exceptions=False)

3 frames
/usr/local/lib/python3.10/dist-packages/ragas/testset/docstore.py in get_random_nodes(self, k, alpha)
    325         prob = prob / np.sum(prob)
    326
--> 327         nodes = self.run_config.rng.choice(
    328             np.array(self.nodes), size=k, p=prob
    329         ).tolist()

numpy/random/_generator.pyx in numpy.random._generator.Generator.choice()

ValueError: a cannot be empty unless no samples are taken

nandhu-nr, Oct 04 '24 13:10

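The error you're encountering, ValueError: a cannot be empty unless no samples are taken, is raised by NumPy's Generator.choice when the array it samples from is empty and a nonzero number of samples is requested. In your traceback that array is np.array(self.nodes) inside get_random_nodes, so the docstore held no nodes at the time of the call. A test_size larger than the number of nodes would not produce this message on its own, because choice samples with replacement by default.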
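The message can be reproduced with NumPy alone, independent of ragas. The snippet below is a minimal sketch: pick is a hypothetical helper written for this illustration, not part of ragas, and it only shows one way to guard the sampling call.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sampling from an empty array while asking for samples raises the ValueError.
try:
    rng.choice(np.array([]), size=3)
    message = None
except ValueError as exc:
    message = str(exc)
print(message)


def pick(items, k):
    """Return k random items, or an empty list when there is nothing to pick from."""
    if len(items) == 0:
        return []
    return rng.choice(np.array(items), size=k).tolist()


print(pick([], 3))
print(len(pick(["node-a", "node-b", "node-c"], 2)))
```

Whether returning an empty list or raising a clearer error is the better choice depends on the caller; the point is only that the empty case has to be handled before choice is called.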

To resolve this issue, make sure the docstore actually contains nodes before it is sampled. Because the exception is raised inside get_random_nodes, the check has to run before that call, for example:

# self.docstore.nodes is the list that the traceback shows being sampled (self.nodes in docstore.py)
if not self.docstore.nodes:
    raise ValueError("The docstore is empty, so there are no nodes to generate the test set from.")

current_nodes = [
    CurrentNodes(root_node=n, nodes=[n])
    for n in self.docstore.get_random_nodes(k=test_size)
]

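Additionally, make sure that the documents list is not empty before passing it to the TestsetGenerator. The example below loads files with LlamaIndex's SimpleDirectoryReader; if you load documents with a LangChain loader for generate_with_langchain_docs, apply the same check to that list:

# CONFIG.file_path is a placeholder for the directory that holds your files
documents = SimpleDirectoryReader(CONFIG.file_path).load_data()

# Stop early if nothing was loaded
if not documents:
    raise ValueError("No documents were loaded. Please check your file path or parameters.")

print("Number of documents loaded:", len(documents))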

This will ensure that you do not attempt to generate a test set with an empty list of documents [1].
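For documents in the LangChain format, which expose their text as page_content, a slightly stricter version of this check can also drop documents whose text is blank. This is only a sketch under assumptions: Doc is a stand-in class so the snippet runs without LangChain installed, usable_documents is a hypothetical helper, and the idea that blank documents contribute no nodes to the docstore is an assumption, not something confirmed in this thread.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for a LangChain Document; only page_content is used here."""
    page_content: str


def usable_documents(documents):
    """Return the documents with non-blank text, or raise if none remain."""
    kept = [d for d in documents if d.page_content and d.page_content.strip()]
    if not kept:
        raise ValueError(
            "No documents with text content were loaded. "
            "Please check your file path or loader parameters."
        )
    return kept


docs = [Doc("RAG pipelines retrieve context."), Doc("   "), Doc("")]
checked = usable_documents(docs)
print("usable documents:", len(checked))
```

The filtered list would then be passed to the generator in place of the raw one.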

dosubot[bot], Oct 04 '24 13:10