opencompass
[Feature] Add a new evaluation dataset
Describe the feature
We want to contribute our benchmark to OpenCompass. Here is the repo: https://github.com/IAAR-Shanghai/UHGEval. However, there is an issue: one of the tasks in this benchmark requires multiple model calls to obtain the final evaluation result for a single data point, whereas most benchmarks we have seen involve only one model call per data point. Please let us know whether OpenCompass supports multiple model calls within a single evaluation.
Will you implement it?
- [X] I would like to implement this feature and create a PR!
models is a list of dicts. You can evaluate multiple models with one config.
My description may not have been clear enough. What I mean is that evaluating a single data point requires constructing multiple prompts and calling the model multiple times. For example, in discriminative evaluation, a data point may contain two sentences: one contains hallucinated content and the other does not. I need to call the model twice. The first call checks whether the model correctly identifies the hallucinated content in the first sentence, i.e., judges that the first sentence contains hallucinated content. The second call checks whether the model correctly identifies the absence of hallucinated content in the second sentence, i.e., judges that the second sentence does not contain hallucinated content. The data point counts as correctly evaluated only when both judgments are correct.
The code is as follows:
answer_hallu, reason_hallu = self.model.is_continuation_hallucinated(hallu, data_point, with_reason=True)
answer_unhallu, reason_unhallu = self.model.is_continuation_hallucinated(unhallu, data_point, with_reason=True)
Each execution of the is_continuation_hallucinated method will call the model once.
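To make the "both judgments must be correct" rule concrete, here is a minimal, framework-independent sketch in Python. It is not OpenCompass or UHGEval code: the `hallu` / `unhallu` field names, the `Judge` signature, and the toy `keyword_judge` are all assumptions standing in for a real call such as is_continuation_hallucinated.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical judge signature: (continuation, data_point) -> (is_hallucinated, reason).
# In a real setup, each call to the judge would be one model call.
Judge = Callable[[str, Dict[str, str]], Tuple[bool, str]]


def evaluate_pair(judge: Judge, data_point: Dict[str, str]) -> bool:
    """Return True only if both judgments for this data point are correct."""
    # First call: the hallucinated continuation should be flagged.
    answer_hallu, _ = judge(data_point["hallu"], data_point)
    # Second call: the clean continuation should not be flagged.
    answer_unhallu, _ = judge(data_point["unhallu"], data_point)
    return answer_hallu is True and answer_unhallu is False


def pair_accuracy(judge: Judge, data_points: List[Dict[str, str]]) -> float:
    """Fraction of data points for which both judgments are correct."""
    if not data_points:
        return 0.0
    correct = sum(evaluate_pair(judge, dp) for dp in data_points)
    return correct / len(data_points)


def keyword_judge(continuation: str, data_point: Dict[str, str]) -> Tuple[bool, str]:
    """Toy stand-in for a model call: flags any text that mentions 'Mars'."""
    flagged = "Mars" in continuation
    reason = "mentions Mars" if flagged else "no issue found"
    return flagged, reason


# Assumed data layout: one hallucinated and one clean continuation per data point.
samples = [
    {"hallu": "The probe landed on Mars in 1950.", "unhallu": "The probe was launched in 1977."},
    {"hallu": "The river is 900 km long.", "unhallu": "The river flows north."},
]

accuracy = pair_accuracy(keyword_judge, samples)
```

With this toy judge, the first sample passes (the hallucinated sentence is flagged and the clean one is not), while the second fails because the hallucinated sentence goes unflagged, so a single wrong judgment is enough to mark the whole data point as incorrect.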
Is there currently a benchmark dedicated to evaluating the translation ability of large models? How do I use it? Thanks.
@White-Friday Please check Flores. Feel free to re-open if needed.