tapnet

Question about the evaluation metrics of TapVid3D

ngoductuanlhp opened this issue

Hi team,

Thank you for your great work! I have a question about the evaluation code for the new TapVid3D benchmark. Specifically, I noticed that the code does not exclude predictions made before each trajectory's query_frame, because evaluation_weights is set to 1.0 for every frame. You can see this here: link to code.

In contrast, both the original TapVid benchmark and CoTracker remove these points when calculating the metrics. Could you clarify whether this is intentional or if I might be misunderstanding something in the evaluation process?
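For illustration, the two weighting schemes discussed in this thread can be sketched with NumPy. This is a minimal sketch, not the tapnet implementation: the function names are hypothetical, and the TAP-Vid-style mask assumes that frames before the query frame and the query frame itself are excluded (the query frame being trivial in 2D). The TAPVid-3D-style mask gives every frame a weight of 1.0, matching the behaviour described in the question.

```python
import numpy as np


def eval_weights(query_frames, num_frames, exclude_up_to_query):
    """Return per-point, per-frame weights of shape [num_points, num_frames].

    Hypothetical helper. exclude_up_to_query=True zeroes every frame up to
    and including each track's query frame (TAP-Vid "query first"-style);
    False gives weight 1.0 to every frame (TAPVid-3D-style, per this thread).
    """
    query_frames = np.asarray(query_frames)[:, None]  # [N, 1]
    frame_idx = np.arange(num_frames)[None, :]  # [1, T]
    if exclude_up_to_query:
        return (frame_idx > query_frames).astype(np.float32)
    return np.ones((query_frames.shape[0], num_frames), dtype=np.float32)


def weighted_mean(per_frame_metric, weights):
    """Average a [N, T] per-frame metric using [N, T] weights."""
    return float(np.sum(per_frame_metric * weights) / np.sum(weights))


# One track queried at frame 2; the frames before the query are badly tracked.
errors = np.array([[9.0, 9.0, 0.0, 1.0, 1.0]])
w_2d = eval_weights([2], 5, exclude_up_to_query=True)   # [[0, 0, 0, 1, 1]]
w_3d = eval_weights([2], 5, exclude_up_to_query=False)  # [[1, 1, 1, 1, 1]]
print(weighted_mean(errors, w_2d))  # 1.0
print(weighted_mean(errors, w_3d))  # 4.0
```

With every frame weighted, errors on frames before the query count toward the score, so a method must also produce sensible predictions there.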

Thanks again for your time and help!

ngoductuanlhp, Aug 09 '24 08:08

Yes, this is intentional. In the original TAP-Vid, the prediction at the query frame is trivial, since the model can simply output the query point. For most TAPVid-3D metrics, however, it is not trivial, because the model must also predict the query point's depth.

The only case where it is trivial is the "per_trajectory" scaling, which ensures that the query point is correctly scaled. However, we intended "per_trajectory" to evaluate the same set of points as the other metrics, so that it is strictly easier than they are. Therefore, we decided to keep the query points in all cases.
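To see why the query frame becomes trivial under per-trajectory scaling, consider a predicted 3D track (in camera coordinates) whose query point lies on the correct camera ray but at the wrong depth. The sketch below assumes, as a hypothetical simplification rather than the exact TAPVid-3D code, that the per-trajectory scale is the ratio of ground-truth to predicted depth at the query frame. Multiplying the whole track by that factor makes the scaled query point coincide with the ground truth, whatever depth the model predicted.

```python
import numpy as np


def scale_per_trajectory(pred_xyz, gt_xyz, query_frame):
    """Rescale one predicted track of shape [T, 3] in camera coordinates.

    Hypothetical sketch: the scale factor is chosen so that the predicted
    depth (z) at the query frame matches the ground-truth depth there.
    """
    scale = gt_xyz[query_frame, 2] / pred_xyz[query_frame, 2]
    return pred_xyz * scale


query_frame = 0
gt = np.array([[1.0, 2.0, 4.0], [1.5, 2.0, 5.0]])
# Correct 2D location at the query frame (same camera ray), but depth 2 instead of 4.
pred = np.array([[0.5, 1.0, 2.0], [1.0, 1.0, 3.0]])
scaled = scale_per_trajectory(pred, gt, query_frame)
print(scaled[query_frame])  # [1. 2. 4.]  -> identical to gt[query_frame]
```

With a single scale shared across all tracks, this cancellation generally cannot hold for every query point at once, so the query frame remains a non-trivial prediction in that setting.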

cdoersch, Aug 09 '24 19:08

Hi @cdoersch, thank you for your answer. So, to evaluate tracking performance on TapVid3D, we need to track in both the forward direction (from the query_frame to the last frame) and the backward direction (from the query_frame to the first frame of the video) to get full trajectories. I ask because some recent methods, such as CoTracker and SpatialTracker, which chain sliding windows to run inference on long videos, only track in the forward direction.

ngoductuanlhp, Aug 09 '24 20:08

Yes, this is the same as with the original TAP-Vid. For methods like CoTracker it should be straightforward to reverse the direction of the video to track backward in time.
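A minimal sketch of the reversal approach, assuming a hypothetical forward-only tracker `track_forward(frames, query_xy)` that treats frames[0] as the query frame and returns one prediction per frame (for TAPVid-3D, each prediction would also include depth). This is not CoTracker's or SpatialTracker's actual API; it only shows how a forward pass and a pass over the reversed video can be stitched into one full trajectory.

```python
from typing import Callable

import numpy as np


def track_both_directions(
    frames: np.ndarray,  # [T, H, W, 3]
    query_frame: int,
    query_xy: np.ndarray,  # [2], query location in frames[query_frame]
    track_forward: Callable[[np.ndarray, np.ndarray], np.ndarray],
) -> np.ndarray:
    """Return a full [T, D] trajectory from a hypothetical forward-only tracker."""
    # Forward pass covers frames query_frame, ..., T-1.
    forward = track_forward(frames[query_frame:], query_xy)
    # Backward pass: reverse frames 0..query_frame so the query frame comes first.
    # Copy to contiguous memory; some frameworks (e.g. torch.from_numpy) reject
    # arrays with negative strides.
    reversed_frames = np.ascontiguousarray(frames[: query_frame + 1][::-1])
    backward = track_forward(reversed_frames, query_xy)  # query_frame, ..., 0
    # Restore chronological order and drop the duplicated query frame,
    # keeping the forward pass's prediction there.
    return np.concatenate([backward[::-1][:-1], forward], axis=0)
```

Running the tracker twice roughly doubles inference cost; the query frame is predicted by both passes, and this sketch keeps the forward pass's prediction for it.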

cdoersch, Aug 12 '24 22:08

Thank you for your detailed explanation. The evaluation setting is now much clearer. I appreciate the effort you've put into creating this benchmark.

ngoductuanlhp, Aug 14 '24 22:08