How to interpret the combination of metrics: context_precision and the rest (real-world example)
I ran ragas to evaluate my LangChain-powered chatbot (it's basically a QA chain with document retrieval) and I got the following results.
| question | ground_truth | faithfulness | answer_relevancy | context_recall | context_precision | context_relevancy |
|---|---|---|---|---|---|---|
| Q1 | GT1 | 1 | 0.813637991 | 1 | 0 | 0.002824859 |
| Q2 | GT2 | 1 | 0.835290922 | 0 | 0 | 0.002890173 |
| Q3 | GT3 | 1 | 0.882307479 | 1 | 0 | 0.002659574 |
| Q4 | GT4 | 1 | 0.844765424 | 0 | 0 | 0.01953125 |
| Q5 | GT5 | 1 | 0.889618083 | 1 | 0 | 0.017857143 |
As you can see, the context_precision values (a variant of context_relevancy, which I think will be deprecated according to the docs) are very low, i.e. terrible. So I did some debugging to understand the intermediate calculations (I didn't grasp everything, but I got the general idea), and I'm wondering how this situation is possible. Here is how I interpret it; please correct me if I'm wrong:
- context_recall = 1.00: it can retrieve all the relevant information required to answer the question (YES)
- context_precision = 0.00: the signal-to-noise ratio of the retrieved context; (almost) everything retrieved is noise
For example, I checked one answer, and this is how the context_precision metric evaluated the two retrieved documents:

```
[[ChatGeneration(text='No.', generation_info={'finish_reason': 'stop'}, message=AIMessage(content='No.'))]]
```
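If I'm reading the intermediates correctly (this is my assumption from the debug output, not confirmed against the ragas source), context_precision turns each per-chunk LLM verdict into a 0/1 and then averages precision@k over the chunks judged useful, so two "No." verdicts force a 0 regardless of faithfulness. A minimal sketch:

```python
def context_precision_sketch(verdicts):
    """Average precision@k over chunks the LLM judged useful (0/1 verdicts).

    Mirrors my reading of the intermediate output; the real ragas
    implementation may differ in how it weights ranks.
    """
    relevant_so_far = 0
    precisions = []
    for k, verdict in enumerate(verdicts, start=1):
        if verdict:
            relevant_so_far += 1
            precisions.append(relevant_so_far / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Two retrieved chunks, both judged "No." by the LLM:
print(context_precision_sketch([0, 0]))  # -> 0.0
# Same two chunks, but both judged useful:
print(context_precision_sketch([1, 1]))  # -> 1.0
```

Under this reading, the score depends only on the per-chunk verdicts, which is why it can be 0 while faithfulness and answer_relevancy stay high.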
Yet the faithfulness is 1 and the answer_relevancy is 0.81... I'm really confused; maybe I'm missing something, but I'd like to understand how to interpret not only each metric independently, but also the combinations of their values and what they entail.
Thank you,
I'm also wondering if this is a side effect of the (relatively) long chunks in my docs (around 500 tokens each)? I don't know whether that affects the calculation.
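If chunk length does turn out to matter, one cheap experiment is re-indexing with smaller chunks before re-running the eval. A toy word-window splitter to illustrate the idea (the sizes are hypothetical; a real LangChain pipeline would use its built-in text splitters instead):

```python
def split_into_chunks(text, max_words=120, overlap=20):
    """Split text into overlapping word-window chunks (toy example)."""
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunk = words[start:start + max_words]
        if chunk:
            chunks.append(" ".join(chunk))
        if start + max_words >= len(words):
            break
    return chunks

doc = "lorem " * 300  # stand-in for a ~500-token chunk
print(len(split_into_chunks(doc)))  # -> 3
```

Smaller chunks would let the per-chunk relevance verdicts separate the useful text from the padding, which is exactly what context_precision seems to be measuring.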
@shahules786: could you please take a look at this?
Hi @younes-io, this is an interesting but weird result. Would you be able to share a subset of your data so that I can understand what's going on?
@shahules786 I'm afraid I can't share that, since it's private data.
Basically, I have document chunks (say, 2) returned by OpenSearch that contain the answer to the question. The first document contains the answer; the second contains only a small portion of it, and is also the larger of the two.
I'm just wondering whether ragas takes the ratio of "relevance to the question / length of the context" into account when calculating context_precision.
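As far as I can tell from the intermediate output, ragas does not normalize by length; the verdicts are binary per chunk. Purely to illustrate the idea I'm asking about, a hypothetical length-aware variant could weight each chunk's verdict by the fraction of its text that is actually relevant:

```python
def length_aware_precision(chunks):
    """Hypothetical metric: weight each chunk by the share of its text
    that pertains to the question.

    `chunks` is a list of (relevant_chars, total_chars) pairs.
    This is NOT how ragas works; it only illustrates my question.
    """
    if not chunks:
        return 0.0
    return sum(rel / total for rel, total in chunks) / len(chunks)

# Doc 1: short, fully answers the question. Doc 2: long, mostly padding.
print(length_aware_precision([(200, 200), (50, 1000)]))  # -> 0.525
```

A metric like this would penalize my second (large, mostly irrelevant) document without zeroing out the first one.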
@shahules786: I tested with the example from the ragas docs.
So, I used this dataset:
```python
from datasets import load_dataset

fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")
fiqa_eval
```
and here's the result:
| | question | contexts | answer | ground_truths | context_precision | faithfulness | answer_relevancy | context_recall | context_relevancy |
|---|---|---|---|---|---|---|---|---|---|
| 0 | How to deposit a cheque issued to an associate... | [Just have the associate sign the back and the... | \nThe best way to deposit a cheque issued to a... | [Have the check reissued to the proper payee.J... | 0.0 | 1.0 | 0.938239 | 0.875 | 0.058824 |
| 1 | Can I send a money order from USPS as a business? | [Sure you can. You can fill in whatever you w... | \nYes, you can send a money order from USPS as... | [Sure you can. You can fill in whatever you w... | 0.0 | 0.8 | 0.885277 | 1.000 | 0.285714 |
| 2 | 1 EIN doing business under multiple business n... | [You're confusing a lot of things here. Compan... | \nYes, it is possible to have one EIN doing bu... | [You're confusing a lot of things here. Compan... | 0.0 | 0.8 | 0.924754 | 0.000 | 0.083333 |
| 3 | Applying for and receiving business credit | [Set up a meeting with the bank that handles y... | \nApplying for and receiving business credit c... | ["I'm afraid the great myth of limited liabili... | 0.0 | 1.0 | 0.899104 | 0.500 | 0.333333 |
| 4 | 401k Transfer After Business Closure | [The time horizon for your 401K/IRA is essenti... | \nIf your employer has closed and you need to ... | [You should probably consult an attorney. Howe... | 0.0 | 0.6 | 0.853572 | 0.000 | 0.043478 |
The context_precision is zero (or near-zero) on every row.
N.B.: in the docs, the context_precision column is not displayed.
@shahules786: sorry for bothering you; is someone from the team or community able to help with this, please? Thank you.
Hi @younes-io , apologies for the late reply. Can you share your ragas version and LLM used?
Also, can you try the same thing with the latest ragas from main? You can install from source with `pip install git+https://github.com/explodinggradients/ragas`.
@younes-io If you're open to a short call, I would love to help in person. Please book a slot here (early next week).
@shahules786 no worries, and I'm also very sorry for the very late reply... Sure, I'll book a slot!