Question about the evaluation metrics of TapVid3D
Hi team,
Thank you for your great work! I have a question about the evaluation code for the new TapVid3D benchmark. Specifically, I noticed that the code does not exclude predictions made before the query_frame of each trajectory, because the evaluation_weights are set to 1.0 for every frame. You can see this here: link to code.
In contrast, both the original TapVid benchmark and CoTracker remove these points when calculating the metrics. Could you clarify whether this is intentional or if I might be misunderstanding something in the evaluation process?
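To make the contrast concrete, here is a minimal numpy sketch of the two weighting schemes in question. The names `evaluation_weights` and `weighted_accuracy` are hypothetical and are not taken from the TapVid3D or CoTracker code; the sketch only assumes that per-point correctness is averaged using per-frame weights. It shows why the choice matters for a forward-only tracker: if the tracker is perfect from the query frame onward but has no valid output before it, its score depends entirely on whether those earlier frames are counted.

```python
import numpy as np


def evaluation_weights(query_frames: np.ndarray, num_frames: int,
                       exclude_before_query: bool) -> np.ndarray:
    """Return a (num_tracks, num_frames) array of per-point weights.

    exclude_before_query=True gives frames before each track's query frame
    weight 0; exclude_before_query=False weights every frame with 1.0.
    """
    frames = np.arange(num_frames)[None, :]  # shape (1, T)
    if exclude_before_query:
        return (frames >= query_frames[:, None]).astype(np.float32)
    return np.ones((query_frames.shape[0], num_frames), dtype=np.float32)


def weighted_accuracy(correct: np.ndarray, weights: np.ndarray) -> float:
    """Weighted mean of a (num_tracks, num_frames) 0/1 correctness array."""
    return float((correct * weights).sum() / weights.sum())


# Two tracks over 4 frames, queried at frames 0 and 2.
query_frames = np.array([0, 2])
masked = evaluation_weights(query_frames, 4, exclude_before_query=True)
all_ones = evaluation_weights(query_frames, 4, exclude_before_query=False)

# A forward-only tracker: correct from the query frame onward, wrong before it.
correct = masked.copy()
print(weighted_accuracy(correct, masked))    # -> 1.0
print(weighted_accuracy(correct, all_ones))  # -> 0.75
```

With all weights set to 1.0, the frames before the second track's query frame count against the tracker, so an evaluation of this kind implicitly requires full-length trajectories.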
Thanks again for your time and help!
Yes, this is intentional. In the original TAP-Vid, the prediction at the query frame is trivial, since the model can simply output the query point. However, for most TapVid3D metrics the prediction is not trivial, because the model must also output the query point's depth.
The only case where it is trivial is "per_trajectory" scaling, which ensures that the query point is correctly scaled. However, we intended "per_trajectory" to behave like the other metrics in terms of which points are evaluated, so that it is strictly easier than the other metrics. Therefore, we decided to keep the query points in all cases.
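The point about "per_trajectory" scaling can be illustrated with a small sketch. It rests on an assumption, not on the actual TapVid3D implementation: predictions are 3D points in camera coordinates, and per-trajectory scaling multiplies each predicted trajectory by a single scalar chosen so that its depth at the query frame matches the ground truth. The function `scale_per_trajectory` is hypothetical. Under that assumption, a prediction with the correct 2D location at the query frame (that is, on the correct camera ray) but the wrong depth becomes exact at the query frame after scaling, while the other frames still carry real error.

```python
import numpy as np


def scale_per_trajectory(pred_xyz: np.ndarray, gt_xyz: np.ndarray,
                         query_frames: np.ndarray) -> np.ndarray:
    """Rescale each predicted trajectory to match ground-truth query depth.

    pred_xyz, gt_xyz: (N, T, 3) camera-space points (x, y, z).
    query_frames: (N,) integer query frame per trajectory.
    Assumed scheme: one scale per trajectory, gt_z / pred_z at the query frame.
    """
    idx = np.arange(pred_xyz.shape[0])
    pred_z = pred_xyz[idx, query_frames, 2]
    gt_z = gt_xyz[idx, query_frames, 2]
    scale = gt_z / pred_z                   # shape (N,)
    return pred_xyz * scale[:, None, None]  # broadcast over frames and xyz


# One trajectory, two frames; query at frame 0.
gt = np.array([[[1.0, 2.0, 4.0], [1.0, 1.0, 5.0]]])
# Query-frame prediction is on the correct ray at half the depth.
pred = np.array([[[0.5, 1.0, 2.0], [0.4, 0.6, 3.0]]])
scaled = scale_per_trajectory(pred, gt, np.array([0]))
print(scaled[0, 0])  # query frame now equals gt[0, 0]
print(scaled[0, 1])  # frame 1 is still off from gt[0, 1]
```

Scaling a camera-space point moves it along its own camera ray, which is why the query point becomes exact under this scheme. This is the sense in which including the query point would make "per_trajectory" trivially easier at that frame.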
Hi @cdoersch, thank you for your answer. So, to evaluate tracking performance on TapVid3D, we need to track in both the forward direction (from the query_frame to the last frame) and the backward direction (from the query_frame to the first frame of the video) to get full trajectories, correct? I ask because some recent methods, such as CoTracker and SpatialTracker, which use chained sliding windows to run inference on long videos, only track in the forward direction.
Yes, this is the same as with the original TAP-Vid. For methods like CoTracker it should be straightforward to reverse the direction of the video to track backward in time.
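A minimal sketch of the reversal approach, assuming a hypothetical forward-only tracker `track_forward(video, query_frame, query_point)` that returns a (T, D) array of positions that are valid only at or after `query_frame`. This is not the actual CoTracker or SpatialTracker API. The video is tracked once forward and once time-reversed, and the two halves are stitched together at the query frame.

```python
from typing import Callable

import numpy as np

# Hypothetical forward-only tracker:
# (video (T, H, W, C), query_frame, query_point (D,)) -> positions (T, D),
# valid only for frames >= query_frame.
ForwardTracker = Callable[[np.ndarray, int, np.ndarray], np.ndarray]


def track_bidirectional(track_forward: ForwardTracker, video: np.ndarray,
                        query_frame: int, query_point: np.ndarray) -> np.ndarray:
    num_frames = video.shape[0]
    # Forward pass covers frames query_frame .. T-1.
    fwd = track_forward(video, query_frame, query_point)
    # Backward pass: reverse time so original frame t becomes T-1-t.
    # ascontiguousarray avoids negative strides, which some frameworks reject.
    rev_video = np.ascontiguousarray(video[::-1])
    rev_query = num_frames - 1 - query_frame
    bwd = track_forward(rev_video, rev_query, query_point)[::-1]  # original order
    # Frames before the query come from the backward pass, the rest from forward.
    out = np.array(fwd, copy=True)
    out[:query_frame] = bwd[:query_frame]
    return out
```

Taking the query frame itself from the forward pass is an arbitrary choice. Both passes start from the same query point, so they should agree there.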
Thank you for your detailed explanation. The evaluation setting is now much clearer. I appreciate the effort you've put into creating this benchmark.