Support positively grading 'I don't know' answers when the context is not relevant
Describe the Feature
Hi there, I read all the metric descriptions and realized that there is no support for the following case:
Suppose the LLM is asked a question, and there is no way for the answer to be retrieved by the RAG pipeline because it is not documented, but the ground truth does exist. In this case, I would want the LLM to say "I don't know" and to ignore the irrelevant context that is retrieved. There is no metric that would evaluate this case positively (correct me if I'm wrong).
Why is the feature important for you?
I would like to use the package to analyze an LLM I have, but it does not cover the case above.
Additional context
I'll give you an example:
Suppose the use case of the LLM is to answer questions about documentation of a product. The user asks "Can this product do {something that it cannot do}?" The retrieved context will never include the ground truth, which is "No, it is not supported".
Hey @mayalinetsky-kryon, currently most of these cases return NaN, because we think there is no point in scoring when the system refuses to answer. You can check the same behavior with faithfulness, etc.
Do you have anything else in mind?
I was thinking of some conditional metric: if the context is not relevant, or the answer cannot be found in the context, AND the LLM's response is "IDK" / "couldn't find an answer", then the score would be high, because that's the best-case scenario. A rough sketch of what I mean is below.
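Something along these lines (a minimal sketch, not the actual ragas API; the refusal phrase list, `is_refusal`, and the `context_is_relevant` flag are just placeholders I made up to illustrate the conditional logic):

```python
# Hypothetical "correct refusal" metric: reward the model for saying
# "I don't know" when the retrieved context cannot answer the question.

REFUSAL_PHRASES = ["i don't know", "couldn't find an answer", "not supported"]

def is_refusal(answer: str) -> bool:
    """Naive check for an 'I don't know'-style answer."""
    answer = answer.lower()
    return any(phrase in answer for phrase in REFUSAL_PHRASES)

def correct_refusal_score(answer: str, context_is_relevant: bool) -> float:
    """Return 1.0 if the context is irrelevant and the model refuses,
    0.0 if the context is irrelevant but the model answers anyway,
    and NaN when the context is relevant (other metrics cover that case)."""
    if context_is_relevant:
        return float("nan")
    return 1.0 if is_refusal(answer) else 0.0

# Irrelevant context + refusal -> best case, high score
print(correct_refusal_score("Sorry, I couldn't find an answer in the docs.", False))  # 1.0
# Irrelevant context + confident answer -> likely hallucination, low score
print(correct_refusal_score("Yes, the product supports that.", False))                # 0.0
```

In practice the refusal check and the relevance check would probably be LLM-based rather than keyword-based, but the conditional structure is the part I'm asking about.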