UI-TARS
Mind2Web evaluation implementation details
Description: Element Accuracy (Ele. Acc) compares the predicted element with the ground-truth elements. Operation F1 (Op. F1) calculates the token-level F1 score for the predicted operation, which consists of the action and the input value. Step Success Rate (Step SR) measures the success of each action step. A step is successful only if both the selected element and the predicted operation are correct.
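The above is the description of the Mind2Web metrics, but the evaluation code has not been open-sourced. Could you walk through an example of how these metrics are computed, especially Ele. Acc and Op. F1?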
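To make the question concrete, here is a minimal sketch of how I currently read the description. It is my own interpretation, not the official evaluation code: the function names, whitespace tokenization, the "ACTION value" string format for an operation, and treating an operation as "correct" only when its F1 is exactly 1.0 are all assumptions.

```python
from collections import Counter


def element_accuracy(pred_element, gt_elements):
    """Return 1.0 if the predicted element id is among the ground-truth ids."""
    return 1.0 if pred_element in gt_elements else 0.0


def operation_f1(pred_op, gt_op):
    """Token-level F1 between two operation strings such as 'TYPE new york'.

    Assumption: tokens are obtained by whitespace splitting.
    """
    pred_tokens = pred_op.split()
    gt_tokens = gt_op.split()
    if not pred_tokens and not gt_tokens:
        return 1.0
    if not pred_tokens or not gt_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often
    # as it appears in both sequences.
    overlap = sum((Counter(pred_tokens) & Counter(gt_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)


def step_success(pred_element, gt_elements, pred_op, gt_op):
    """Assumption: a step succeeds only if the element matches and Op. F1 == 1.0."""
    element_ok = element_accuracy(pred_element, gt_elements) == 1.0
    operation_ok = operation_f1(pred_op, gt_op) == 1.0
    return 1.0 if element_ok and operation_ok else 0.0


if __name__ == "__main__":
    gt_elements = {"e12", "e40"}  # hypothetical element ids
    print(element_accuracy("e12", gt_elements))
    print(operation_f1("TYPE new york", "TYPE new york city"))
    print(step_success("e12", gt_elements, "TYPE new york", "TYPE new york city"))
```

Under this reading, predicting `TYPE new york` against a ground truth of `TYPE new york city` gives precision 3/3 and recall 3/4, so Op. F1 is 6/7 and the step counts as failed even though the element is right. Is this consistent with how the reported numbers were computed?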