waymo-open-dataset
Understanding evaluation results for L1 difficulty
I'm trying to understand how evaluation is done for L1 difficulty. Essentially, L1 difficulty selects a subset of the GT boxes. In that case, I'm not sure how precision is evaluated. Specifically, I wonder whether the predicted boxes are filtered/selected accordingly during evaluation. Intuitively, since some predicted boxes are predictions for L2-level GT boxes, does it make sense to ignore some of them when evaluating L1 difficulty metrics?
I tried searching on Google, but I couldn't find anyone mentioning this detail. Any links/answers related to this question are appreciated. Thanks!
Hi,
When calculating L1 metrics, a prediction that matches an L2 ground truth is not treated as a false positive. Please refer to https://github.com/waymo-research/waymo-open-dataset/blob/master/waymo_open_dataset/metrics/detection_metrics.cc#L93-L95 for the implementation details.
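To make the effect on precision concrete, here is a minimal Python sketch of the rule described above. It is not the code from detection_metrics.cc. The function name `l1_precision_recall` and the input encoding (one entry per prediction: `1` or `2` for the difficulty of the matched ground truth, `None` for no match) are hypothetical, and it assumes that a prediction matched to an L2 ground truth is simply skipped, counting as neither a true positive nor a false positive.

```python
from typing import List, Optional, Tuple


def l1_precision_recall(
    matched_levels: List[Optional[int]], num_l1_gt: int
) -> Tuple[float, float]:
    """Illustrative L1 precision/recall at a single score threshold.

    matched_levels: for each prediction, the difficulty level (1 or 2) of the
        ground truth it was matched to, or None if it matched nothing.
    num_l1_gt: total number of L1 ground-truth boxes.
    """
    # Predictions matched to an L1 ground truth are true positives.
    tp = sum(1 for level in matched_levels if level == 1)
    # Only unmatched predictions are false positives.
    fp = sum(1 for level in matched_levels if level is None)
    # Assumption: predictions matched to an L2 ground truth are skipped,
    # so they appear in neither tp nor fp.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / num_l1_gt if num_l1_gt > 0 else 0.0
    return precision, recall


# Four predictions: two match L1 boxes, one matches an L2 box, one is unmatched.
precision, recall = l1_precision_recall([1, 1, 2, None], num_l1_gt=3)
```

In this example the L2-matched prediction is left out of the precision denominator, so precision is 2 / (2 + 1) rather than 2 / (2 + 2).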
Best, Wayne, on behalf of the Waymo Open Dataset team