ScanRefer
How does the evaluation work?
I notice that the definition of `ref_acc` (line 89, `lib/eval_helper.py`) checks whether the selected bounding box is the same as the predicted box that has the maximum IoU with the target box.
However, as I understand it, 3D visual grounding is expected to output only one bounding box for a given input scene and language query. Is this metric therefore only an intermediate evaluation rather than the final one?
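To make the distinction concrete, here is a minimal sketch. It is not the code in `lib/eval_helper.py`. It assumes axis-aligned boxes given as `(xmin, ymin, zmin, xmax, ymax, zmax)`, and all function names are hypothetical. `proposal_match_acc` follows the description of `ref_acc` above: it only asks whether the selected proposal is the best one available. `grounding_acc_at_iou` scores the single output box directly against the target using an IoU threshold (the 0.25 value is just an example).

```python
import numpy as np

def box_iou_3d(a, b):
    """IoU of two axis-aligned 3D boxes given as (xmin, ymin, zmin, xmax, ymax, zmax)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    lo = np.maximum(a[:3], b[:3])                  # lower corner of the intersection
    hi = np.minimum(a[3:], b[3:])                  # upper corner of the intersection
    inter = np.prod(np.clip(hi - lo, 0.0, None))   # zero if the boxes do not overlap
    union = np.prod(a[3:] - a[:3]) + np.prod(b[3:] - b[:3]) - inter
    return float(inter / union) if union > 0 else 0.0

def proposal_match_acc(pred_boxes, selected_idx, gt_box):
    """Hypothetical: 1.0 if the selected proposal is the one with the highest IoU with gt_box."""
    ious = [box_iou_3d(p, gt_box) for p in pred_boxes]
    return float(selected_idx == int(np.argmax(ious)))

def grounding_acc_at_iou(pred_boxes, selected_idx, gt_box, threshold):
    """Hypothetical: 1.0 if the single output box overlaps gt_box with IoU >= threshold."""
    return float(box_iou_3d(pred_boxes[selected_idx], gt_box) >= threshold)

# All proposals are poor, but proposal 0 is still the best of them.
gt = (0, 0, 0, 1, 1, 1)
proposals = [(0.5, 0.5, 0.5, 1.5, 1.5, 1.5), (2, 2, 2, 3, 3, 3)]
print(proposal_match_acc(proposals, 0, gt))          # 1.0
print(grounding_acc_at_iou(proposals, 0, gt, 0.25))  # 0.0
```

In this toy case the first metric counts the prediction as correct because the model picked the best available proposal. The second metric rejects it because that box overlaps the target with an IoU of only about 0.07. This gap is why the two kinds of evaluation can diverge.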