
Faithfulness may not be faithful because of the model's hallucinations.

Open PurpleLikeNero opened this issue 2 years ago • 4 comments

Faithfulness may become less accurate as an evaluation metric because it relies on the model's generations, which can be affected by the model's hallucinations. While running evaluations over a large dataset, I encountered some anomalies in the AnswerCorrectness scores. Specifically, in cases where the AnswerSimilarity between the ground_truths and the answer was 1, the AnswerCorrectness score was 0.6. However, when evaluated independently, the AnswerCorrectness score returned to 1. I suspect these discrepancies in AnswerCorrectness come from inaccuracies in the Faithfulness score it relies on, and that those inaccuracies may in turn be caused by the model's hallucinations.
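To make the suspected dependency concrete, here is a minimal sketch of how the reported numbers could fit together. It assumes (not confirmed in this thread) that Faithfulness is the fraction of answer statements an LLM judges as supported by the reference text, and that AnswerCorrectness is a weighted average of that score and AnswerSimilarity with hypothetical weights of 0.5 each. Under those assumptions, a similarity of 1 and a correctness of 0.6 imply a faithfulness term of only 0.2, which is the kind of drop a hallucinating judge model could cause. The function names are illustrative, not the ragas API.

```python
"""Illustrative sketch only; function names are hypothetical, not the ragas API.

Assumptions (not confirmed in this thread):
- Faithfulness = fraction of answer statements judged "supported" by an LLM.
- AnswerCorrectness = weighted average of that score and AnswerSimilarity,
  using hypothetical weights of 0.5 each.
"""


def faithfulness(verdicts: list[bool]) -> float:
    """Fraction of answer statements the LLM judged as supported."""
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)


def answer_correctness(faith: float, similarity: float,
                       weights: tuple[float, float] = (0.5, 0.5)) -> float:
    """Weighted average of a faithfulness-style score and answer similarity."""
    w_f, w_s = weights
    return (w_f * faith + w_s * similarity) / (w_f + w_s)


def implied_faithfulness(correctness: float, similarity: float,
                         weights: tuple[float, float] = (0.5, 0.5)) -> float:
    """Invert answer_correctness to recover the faithfulness term it implies."""
    w_f, w_s = weights
    return (correctness * (w_f + w_s) - w_s * similarity) / w_f


if __name__ == "__main__":
    # Scores reported in the issue: AnswerSimilarity = 1, AnswerCorrectness = 0.6.
    print(f"implied faithfulness term: {implied_faithfulness(0.6, 1.0):.2f}")
    # A judge that wrongly rejects 4 of 5 statements yields that same 0.6,
    # even though the answer matches the ground truth.
    verdicts = [True, False, False, False, False]
    print(answer_correctness(faithfulness(verdicts), 1.0))
```

If the real weights differ, the implied faithfulness value changes, but the point stands: any weight placed on an LLM-judged term lets judge errors pull AnswerCorrectness below what AnswerSimilarity alone would suggest.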

PurpleLikeNero, Nov 06 '23 01:11

Hi @PurpleLikeNero, thanks for raising the issue. Yes, this can happen: since all ragas metrics are based on LLMs, the quality of the output depends on the LLM used. Could you please share some more details, such as the model you're using and sample data if possible?

shahules786, Nov 06 '23 10:11

Sure, the LLM I used is Qwen-7B, which is published by Alibaba Cloud.

PurpleLikeNero, Nov 08 '23 06:11

I'll let @shahules786 answer this, but given the model's size and capabilities, that would be expected.

jjmachan, Nov 29 '23 14:11

Hi, is there a workaround for this? Or do you know of any other methods (even outside ragas) that could handle hallucinations?

I'm looking at options to identify hallucinations in my RAG system. I'm a bit sceptical of the idea of an LLM finding issues in output that was itself generated by an LLM. What other options are there for identifying hallucinations?

I looked at some tools like Vectara HHEM, but its context size is very small (512 tokens). I'd really appreciate it if someone could point me in a good direction.

Thanks!

aldrinjenson, Mar 07 '24 04:03