
car_instance better evaluating metrics

Open stevenwudi opened this issue 5 years ago • 1 comments

Now that the challenge has ended, it has turned out to be a difficult one, especially because of the translation vector estimation: the 2.8 meter threshold is very challenging. In my opinion, the threshold for car pose estimation (for the translation vector) could be larger when the car is far away. Under the current system, the evaluation metric treats all detected cars identically, no matter how far they are from the camera (e.g., 5 meters or 150 meters). For a future evaluation metric, I would suggest a linear threshold: if the car is close to the camera, a stricter threshold is required, and if the car is far away, a more tolerant threshold is applied (as illustrated in the following image):

[image: transcircle]

Such a system would also make sense in real autonomous driving scenarios. I hope this is useful for future evaluation metrics. And thank you for organizing this very fun challenge.
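To make the suggestion concrete, here is a minimal sketch of a distance-dependent threshold. It is not the challenge's evaluation code. The function names, the 0.5 m near threshold, and the choice to interpolate between 5 m and 150 m are all assumptions made for illustration; only the 2.8 m value comes from the current metric.

```python
import math


def linear_translation_threshold(distance_m, near_m=5.0, far_m=150.0,
                                 near_thresh_m=0.5, far_thresh_m=2.8):
    """Return a translation threshold that grows linearly with distance.

    All default values are hypothetical. The threshold is clamped to
    near_thresh_m below near_m and to far_thresh_m beyond far_m.
    """
    if far_m <= near_m:
        raise ValueError("far_m must be greater than near_m")
    # Fraction of the way from near_m to far_m, clamped to [0, 1].
    t = (distance_m - near_m) / (far_m - near_m)
    t = min(max(t, 0.0), 1.0)
    return near_thresh_m + t * (far_thresh_m - near_thresh_m)


def translation_is_correct(pred_xyz, gt_xyz):
    """Check a predicted translation against the distance-dependent threshold.

    The camera is assumed to sit at the origin, so the ground-truth
    distance is the Euclidean norm of gt_xyz.
    """
    error = math.sqrt(sum((p - g) ** 2 for p, g in zip(pred_xyz, gt_xyz)))
    distance = math.sqrt(sum(g ** 2 for g in gt_xyz))
    return error <= linear_translation_threshold(distance)
```

With these assumed values, a 1 m error on a car about 10 m away is rejected, while a 2 m error on a car 150 m away is still accepted.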

stevenwudi, Sep 10 '18 09:09

Hi Di,

Thanks for your participation. The metric we use is similar to the one used for detection: since IoU-based instance segmentation is normalized by scale, nearby and distant objects are evaluated in the same way. Originally, we wanted to use 3D volume IoU, but it turned out to be very time-consuming to evaluate.

We also tried several algorithms ourselves and found that the error in estimated depth can be very large. We are considering replacing the distance metric by borrowing ideas from depth evaluation, such as absolute relative depth error or squared relative depth error. In that case, the requirements would be strict for nearby cars and looser for cars further away. However, this could accept results in which the reprojected masks of very distant cars are badly misaligned, so the threshold needs to be chosen more carefully, taking the similarity of the reprojected mask into account.
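As a rough illustration of the two depth measures mentioned above, here is a small sketch using their usual definitions from depth estimation benchmarks, applied to a single car. Applying them per car, the function names, and the 0.1 threshold are assumptions for this example, not a decided metric.

```python
def abs_rel_error(pred_depth, gt_depth):
    """Absolute relative depth error: |d_pred - d_gt| / d_gt."""
    if gt_depth <= 0:
        raise ValueError("gt_depth must be positive")
    return abs(pred_depth - gt_depth) / gt_depth


def sq_rel_error(pred_depth, gt_depth):
    """Squared relative depth error: (d_pred - d_gt)^2 / d_gt."""
    if gt_depth <= 0:
        raise ValueError("gt_depth must be positive")
    return (pred_depth - gt_depth) ** 2 / gt_depth


def depth_is_correct(pred_depth, gt_depth, max_abs_rel=0.1):
    """Accept a prediction if its absolute relative error is small enough.

    max_abs_rel is a hypothetical threshold chosen only for illustration.
    """
    return abs_rel_error(pred_depth, gt_depth) <= max_abs_rel
```

The same 1 m depth error gives a relative error of 0.2 for a car at 5 m but less than 0.01 for a car at 150 m, which is how a relative measure becomes strict nearby and tolerant far away.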

Another choice is reprojection AP: generate 2D amodal instance segmentation results and compare them against the ground truth. However, this suffers from a scale confusion problem.

It is not a trivial problem, but a relative depth metric that is aware of the re-projection mask might be a good choice.
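One possible reading of that idea, sketched below, is to accept a car only when both the relative depth error is small and the reprojected mask overlaps the ground-truth mask well enough. This is only an assumption about how the two checks could be combined. The function names, both thresholds, and the representation of masks as sets of (row, col) pixels are hypothetical.

```python
def mask_iou(mask_a, mask_b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = len(mask_a | mask_b)
    if union == 0:
        return 0.0
    return len(mask_a & mask_b) / union


def match_is_correct(pred_depth, gt_depth, pred_mask, gt_mask,
                     max_abs_rel=0.1, min_mask_iou=0.5):
    """Accept a prediction only if depth and reprojected mask both agree.

    Both thresholds are hypothetical values chosen for illustration.
    """
    if gt_depth <= 0:
        raise ValueError("gt_depth must be positive")
    abs_rel = abs(pred_depth - gt_depth) / gt_depth
    return abs_rel <= max_abs_rel and mask_iou(pred_mask, gt_mask) >= min_mask_iou
```

Under this sketch, a distant car whose depth error passes the relative threshold is still rejected if its reprojected mask is largely misaligned with the ground truth.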

Best,
Peng

ApolloScapeAuto, Sep 11 '18 06:09