RoboTAP's query point selection question
The RoboTAP article shows a good way to use points tracked by TAPIR. However, I can't find the part that explains how to make sure that points across demos refer to the same point on the object. How do we know that two different points in pixel space, in two different demos, correspond to the same point on the physical object? Additionally, since we need to track trajectories, we have to know which trajectory a point should follow, i.e. match each point to a trajectory. I wonder how this is achieved. It would be appreciated if someone could explain.
We don't. TAPIR identifies points based on appearance; it's trained on synthetic Kubric data, which has no duplicate objects, so the algorithm tends to assume that similar-looking objects are, in fact, the same. This is useful for RoboTAP in our current setting, although the more general problem of dealing with similar-looking objects within the same scene is unsolved. TAPIR is likely to produce false positives in these cases.
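To make the appearance-based matching concrete, here is a minimal numpy sketch (not TAPIR's actual code) of how correspondence across demos can fall out of descriptor similarity: each point is summarized by a feature vector, and cross-demo matching is just nearest-neighbour search. All names here are illustrative, not the library's API.

```python
import numpy as np

def match_points(descriptors_a: np.ndarray, descriptors_b: np.ndarray) -> np.ndarray:
    """For each descriptor from demo A, return the index of the most similar
    descriptor from demo B (cosine similarity). Shapes: (N, D) and (M, D)."""
    a = descriptors_a / np.linalg.norm(descriptors_a, axis=1, keepdims=True)
    b = descriptors_b / np.linalg.norm(descriptors_b, axis=1, keepdims=True)
    similarity = a @ b.T               # (N, M) cosine similarities
    return similarity.argmax(axis=1)   # best match in B for each point in A

demo_a = np.random.default_rng(0).normal(size=(5, 32))  # 5 points, 32-dim descriptors
demo_b = demo_a[[2, 0, 1, 4, 3]] + 0.01                  # same points, reordered + noise
print(match_points(demo_a, demo_b))                      # -> [1 2 0 4 3]
```

This also shows why duplicate objects are a problem: two physically distinct but similar-looking points produce near-identical descriptors, so the argmax can pick the wrong one.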
Do you mean that we assign each point a point ID, and for points in different demos we match the points that have the same ID? Is my understanding right?
Maybe my wording was ambiguous. My question is: how do we select a specific trajectory for each point tracked by online TAPIR at test time? In the figure below, the upper image (a demo) shows three trajectories, while the point in the lower image (test time) needs to choose only one trajectory to follow. I wonder how we select that trajectory.
I don't understand what you're asking. We have an entire section in the RoboTAP paper on how we select the points that we track at test time.
Each point is represented by a descriptor, which is the output of get_query_features() from the online model. These points are initially sampled randomly and then chosen based on their motion patterns in the demos. We can extract query features from any image and then track the same point on the same object in any new video.
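To illustrate the idea (this is a self-contained numpy sketch, not the actual TAPIR code): a point's descriptor is the feature vector sampled at its (y, x) location in a frame's feature grid, and re-locating it in a new video amounts to finding where that descriptor correlates most strongly. Real TAPIR uses learned features and bilinear sampling; here the grids are random stand-ins and the "new frame" is just a shifted copy, so everything beyond the name `get_query_features` is an assumption.

```python
import numpy as np

H, W, D = 64, 64, 64
rng = np.random.default_rng(0)

def get_query_features(feature_grid: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Sample one descriptor per integer (y, x) query point. Returns (N, D)."""
    return feature_grid[points[:, 0], points[:, 1]]

def locate(descriptors: np.ndarray, feature_grid: np.ndarray) -> np.ndarray:
    """Find the (y, x) position where each descriptor correlates most strongly
    in a new frame's feature grid (a cost-volume argmax)."""
    scores = np.einsum('nd,hwd->nhw', descriptors, feature_grid)  # (N, H, W)
    flat = scores.reshape(len(descriptors), -1).argmax(axis=1)
    return np.stack(np.unravel_index(flat, (H, W)), axis=1)       # (N, 2)

demo_grid = rng.normal(size=(H, W, D))                # stand-in demo features
test_grid = np.roll(demo_grid, (3, 5), axis=(0, 1))   # "new video": scene shifted
query_points = np.array([[10, 20], [40, 50]])         # (y, x) points in the demo

descriptors = get_query_features(demo_grid, query_points)
print(locate(descriptors, test_grid))                 # -> [[13 25] [43 55]]
```

Note that each output row is tied to the demo point its descriptor came from, which is how a test-time track inherits the demo trajectory it should follow; there is no separate trajectory-assignment step.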
Thanks, I understand.
I think the question has been answered here; closing due to inactivity.