
TestsetGenerator issue with small Q&A FAQ dataset - multiple 'no neighbors exists' errors and limited generation output

Open Norfolkmag opened this issue 1 year ago • 6 comments

Hi,

I've been using the TestsetGenerator with the default question type distribution and with several adjusted ones, but in nearly all cases, using a test_size of 20-50, I get very few results returned. I see multiple warnings such as:

/home/vscode/.local/lib/python3.11/site-packages/ragas/testset/testset_generator.py:269: UserWarning: No neighbors exists
  warnings.warn("No neighbors exists")

and

/home/vscode/.local/lib/python3.11/site-packages/ragas/testset/utils.py:16: UserWarning: Invalid json
  warnings.warn("Invalid json")

I've looked through the documentation but there's nothing that covers these issues. The documents are short/simple FAQ question/answers, so relatively small in size on a per document basis. An example is:

Do I get scores for all the activities that I do? No, some activities are designed to help show language applied to real-life situations, like games or videos.

This is a sample document: each one represents a single page in an FAQ. That may not be relevant, but I've added it in case the context is useful. There are 67 of these FAQ documents in the data I'm working with and I'm running:

test_generator.generate(documents, test_size=30)

Typically I'm getting between 4 and 7 results back, well short of the 30 requested. I assume the warnings are having an impact in some way that I can't determine.
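To check whether the shortfall tracks the warnings, one option is to record every warning raised during a run and count them by message. The sketch below uses only the standard library `warnings` module; `run_and_count_warnings` is a hypothetical helper name, not part of ragas, and it assumes the generator emits its messages through `warnings.warn` as the output above suggests.

```python
import warnings
from collections import Counter


def run_and_count_warnings(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and count the warnings raised during the call.

    Returns a (result, counts) tuple, where counts maps each warning
    message to the number of times it was emitted.
    """
    with warnings.catch_warnings(record=True) as caught:
        # "always" records every occurrence instead of only the first
        # one per code location, so the counts reflect the real frequency.
        warnings.simplefilter("always")
        result = fn(*args, **kwargs)
    counts = Counter(str(w.message) for w in caught)
    return result, counts


# Usage with the call from this post (test_generator and documents
# are the objects already defined above, not created here):
# testset, counts = run_and_count_warnings(
#     test_generator.generate, documents, test_size=30
# )
# print(counts)
```

Comparing the two counts against the number of rows returned should show whether most of the missing questions line up with "No neighbors exists" or with "Invalid json".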

Norfolkmag avatar Jan 04 '24 10:01 Norfolkmag

Apologies for the hard time, but this will be fixed with #380. I'll update this as soon as it's merged in 🙂

jjmachan avatar Jan 08 '24 11:01 jjmachan

@jjmachan about when do you think this will be fixed/merged?

thenextmz avatar Jan 11 '24 21:01 thenextmz

hopefully next week 🤞🏽

jjmachan avatar Jan 12 '24 00:01 jjmachan

@jjmachan Any updates on this? :)

thenextmz avatar Jan 23 '24 14:01 thenextmz

Does anyone have a workaround for v0.0.22? It doesn't return as many results as the requested test size.

lamkhatinh avatar Feb 05 '24 05:02 lamkhatinh

hey @thenextmz, @lamkhatinh and @Norfolkmag, we have fixed this in main, so you can use it now. Do take it for a spin and let us know. (You will have to install from source for now, but we are going to release v0.1 in a couple of hours; we're going through the issues to see if we missed any major ones and doing final touch-ups 🙂)

you can see the updated docs here https://docs.ragas.io/en/latest/concepts/testset_generation.html

jjmachan avatar Feb 06 '24 01:02 jjmachan