TestsetGenerator issue with small Q&A FAQ dataset - multiple 'no neighbors exists' errors and limited generation output
Hi,
I've been using the TestsetGenerator with the default and other adjusted question type distributions, but in nearly all cases, with a test_size of 20-50, I get very few results returned. I see multiple warnings such as:
/home/vscode/.local/lib/python3.11/site-packages/ragas/testset/testset_generator.py:269: UserWarning: No neighbors exists
  warnings.warn("No neighbors exists")
and
/home/vscode/.local/lib/python3.11/site-packages/ragas/testset/utils.py:16: UserWarning: Invalid json
  warnings.warn("Invalid json")
I've looked through the documentation but there's nothing that covers these issues. The documents are short/simple FAQ question/answers, so relatively small in size on a per document basis. An example is:
Do I get scores for all the activities that I do? No, some activities are designed to help show language applied to real-life situations, like games or videos.
This is one sample document; it represents a single page in an FAQ. This may not be relevant, but I've added it in case the context is useful. There are 67 of these FAQ documents in the data I'm working with, and I'm running:
test_generator.generate(documents, test_size=30)
Typically I'm getting between 4 and 7 results back, far fewer than the 30 requested. I assume the warnings above are related to the shortfall, but I can't determine how.
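In case it helps with debugging, here is a rough sketch of how the warnings could be tallied against the number of results, using only Python's standard `warnings` module. The `count_warnings` helper and the `fake_generate` stand-in are hypothetical names made up for illustration; `fake_generate` is not how ragas behaves internally, it just emits the same warning text so the sketch runs without the library.

```python
import warnings
from collections import Counter


def count_warnings(fn, *args, **kwargs):
    """Run fn and return (result, Counter of warning messages raised during the call)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record repeated warnings instead of collapsing them
        result = fn(*args, **kwargs)
    return result, Counter(str(w.message) for w in caught)


# Hypothetical stand-in for test_generator.generate, used only so this sketch
# runs without ragas installed. It is NOT the library's actual logic.
def fake_generate(documents, test_size):
    rows = []
    for i, doc in enumerate(documents[:test_size]):
        if len(doc.split()) < 5:
            warnings.warn("No neighbors exists")
            continue
        rows.append({"question": f"q{i}", "context": doc})
    return rows


docs = ["too short", "Do I get scores for all the activities that I do?", "short one"]
rows, counts = count_warnings(fake_generate, docs, test_size=3)
print(len(rows), dict(counts))
```

With the real generator, the call would presumably be `count_warnings(test_generator.generate, documents, test_size=30)`. Whether each warning corresponds to exactly one dropped sample is an assumption I haven't verified.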
Apologies for the hard time, but this will be fixed with #380. I'll update this as soon as it's merged in 🙂
@jjmachan about when do you think this will be fixed/merged?
hopefully next week 🤞🏽
@jjmachan Any updates on this? :)
Does anyone have a workaround for v0.0.22? It doesn't return as many results as the requested test size.
Hey @thenextmz, @lamkhatinh and @Norfolkmag, we have fixed this in main, so you can use it now. Do take it for a spin and let us know. (You will have to install from source for now, but we are going to release v0.1 in a couple of hours; we're going through the issues to see if we missed any major ones and doing final touch-ups 🙂)
you can see the updated docs here https://docs.ragas.io/en/latest/concepts/testset_generation.html