TrackEval
How to do “Matching to Optimise HOTA”
It is easy to calculate the HOTA metrics once detections have been matched to ground truth (GT), but the matching rule itself is confusing. According to the paper, HOTA is calculated independently at each IoU threshold for 2D bbox tracking, and Eq. 15 runs the Hungarian algorithm on Amax plus a weighted IoU term. In this repo, however, the score is some kind of Amax multiplied by IoU. For both the paper and the code, it is hard to understand why the result is guaranteed to reach the maximum HOTA over all possible detection/GT matchings. Here is a tiny example at the 0.5 IoU threshold:
import numpy as np
data = {}
data['gt_ids'] = [np.array([0])] * 9 + [np.array([0, 1])]
data['tracker_ids'] = [np.array([0])] * 10
data['similarity_scores'] = [np.array([[1]])] * 9 + [np.array([0.49, 0.51]).reshape(2, 1)]
data['num_tracker_ids'] = 1
data['num_gt_ids'] = 2
In this example, the code discards the last-frame detection at the 0.5 IoU threshold, giving DetA 0.75, AssA 0.818, HOTA 0.783. But if the last-frame detection is simply matched to GT ID 1, the result is DetA 0.909, AssA 0.746, HOTA 0.8237.
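The two sets of numbers can be checked by hand from the definitions DetA = TP / (TP + FN + FP), AssA = mean over TPs of TPA / (TPA + FNA + FPA), and HOTA = sqrt(DetA * AssA) at a single threshold. Below is a minimal sketch of that arithmetic; `hota_at_alpha`, `discard_last` and `match_gt1` are names made up for this illustration and are not part of TrackEval.

```python
from collections import Counter
from math import sqrt


def hota_at_alpha(gt_ids, tracker_ids, matches):
    """Return (DetA, AssA, HOTA) for one threshold.

    gt_ids / tracker_ids: per-frame lists of IDs.
    matches: per-frame lists of (gt_id, tracker_id) pairs counted as TPs.
    """
    num_gt = sum(len(frame) for frame in gt_ids)
    num_tr = sum(len(frame) for frame in tracker_ids)
    tp = sum(len(frame) for frame in matches)
    if tp == 0:
        return 0.0, 0.0, 0.0
    fn, fp = num_gt - tp, num_tr - tp
    det_a = tp / (tp + fn + fp)

    gt_count = Counter(i for frame in gt_ids for i in frame)
    tr_count = Counter(i for frame in tracker_ids for i in frame)
    pair_count = Counter(p for frame in matches for p in frame)  # TPA per ID pair

    # Every TP of a given (gt, tracker) pair shares the same association score.
    ass_sum = 0.0
    for (g, t), tpa in pair_count.items():
        fna = gt_count[g] - tpa  # GT dets of this ID not matched to tracker t
        fpa = tr_count[t] - tpa  # tracker dets of this ID not matched to GT g
        ass_sum += tpa * (tpa / (tpa + fna + fpa))
    ass_a = ass_sum / tp
    return det_a, ass_a, sqrt(det_a * ass_a)


gt_ids = [[0]] * 9 + [[0, 1]]
tracker_ids = [[0]] * 10

discard_last = [[(0, 0)]] * 9 + [[]]        # last-frame detection dropped
match_gt1 = [[(0, 0)]] * 9 + [[(1, 0)]]     # last-frame detection matched to GT ID 1

for name, m in [("discard", discard_last), ("match gt 1", match_gt1)]:
    det_a, ass_a, hota = hota_at_alpha(gt_ids, tracker_ids, m)
    print(f"{name}: DetA {det_a:.3f} AssA {ass_a:.3f} HOTA {hota:.3f}")
# discard: DetA 0.750 AssA 0.818 HOTA 0.783
# match gt 1: DetA 0.909 AssA 0.746 HOTA 0.824
```

In the second matching the single (GT 1, tracker 0) TP has an association score of only 1/10, which pulls AssA down, but the gain in DetA outweighs it.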
To reproduce the second result, just do linear_sum_assignment(-similarity) in the matching step.
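For reference, here is a rough sketch of how the two matchings differ on the last frame. The "global alignment" weighting below is my own reading of the multiplicative score described above, written from memory rather than copied from the repo, so treat it and all variable names as assumptions. Only `numpy` and `scipy.optimize.linear_sum_assignment` are real library calls.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

gt_ids = [np.array([0])] * 9 + [np.array([0, 1])]
tracker_ids = [np.array([0])] * 10
similarity_scores = [np.array([[1.0]])] * 9 + [np.array([[0.49], [0.51]])]
num_gt_ids, num_tracker_ids = 2, 1
alpha = 0.5

# Accumulate per-ID counts and soft "potential match" counts over all frames
# (assumed form of the global weighting; not verified against the repo).
potential = np.zeros((num_gt_ids, num_tracker_ids))
gt_count = np.zeros((num_gt_ids, 1))
tr_count = np.zeros((1, num_tracker_ids))
for g, t, sim in zip(gt_ids, tracker_ids, similarity_scores):
    denom = sim.sum(0)[np.newaxis, :] + sim.sum(1)[:, np.newaxis] - sim
    potential[g[:, np.newaxis], t[np.newaxis, :]] += sim / denom
    gt_count[g] += 1
    tr_count[0, t] += 1

# Jaccard-style alignment between every (GT ID, tracker ID) pair.
global_alignment = potential / (gt_count + tr_count - potential)

g, t, sim_last = gt_ids[-1], tracker_ids[-1], similarity_scores[-1]
weighted = global_alignment[g[:, np.newaxis], t[np.newaxis, :]] * sim_last

rows_w, cols_w = linear_sum_assignment(-weighted)   # Amax-weighted matching
rows_s, cols_s = linear_sum_assignment(-sim_last)   # plain similarity matching

# A matched pair only counts as a TP if its similarity reaches the threshold.
kept_w = sim_last[rows_w, cols_w] >= alpha
kept_s = sim_last[rows_s, cols_s] >= alpha
print("weighted:", g[rows_w], "kept:", kept_w)
print("plain:   ", g[rows_s], "kept:", kept_s)
```

Under these assumptions the weighted score picks GT ID 0 (similarity 0.49, so the pair is dropped at the 0.5 threshold), while plain similarity picks GT ID 1 (similarity 0.51, so the pair is kept).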
I agree with this. How come there is such a huge discrepancy between the method used in the paper and the one in the repo? I would appreciate some more clarity on this, as it is quite a crucial part of the metric.
I agree. I'm currently trying to understand the paper by cross-referencing it with the implementation in this repo, and it's been very difficult to make sense of any of it, especially because I started with the assumption that the implementation would be equivalent to the description in the paper.
It would be very helpful to at least have comments in the source code noting where it differs from the paper.