Bad performance on streaming video
Hi all, I am trying to track a custom object for which I added our own .obj model. I am using the following mask (first picture; click it to view it correctly).
Under certain conditions, with the same mask and the same environment, the performance is good (have a look at the first gif).
However, when the object is missing at the start, or disappears while tracking is running, the pose estimate degrades badly (take a look at the second and third gifs).
Thanks in advance! (All gifs are accelerated, and the second and third are trimmed to meet the 10 MB size limit.)
Hi @JRvilanova, from my understanding, it is only logical that this does not work, since FoundationPose needs a matching object mask in order to do its pose estimation and the subsequent tracking.
I assume that you are using run_demo.py, which runs the pose estimation on the first frame with the corresponding segmentation mask. If, as in your case, the mask and the RGB/depth image do not match, this results in incorrect pose estimation and tracking, as can be seen in your second gif (I reproduced the same behavior by using a misaligned segmentation mask).
You could wait to run the pose estimation until your object is aligned with the mask, i.e. ignore all frames until the two match (but this makes little sense from a practical point of view). What you probably need is a method that creates the object segmentation masks automatically, e.g. the BOP-challenge methods for 2D segmentation of unseen objects, such as the Instance Segmentation Model of SAM-6D. The masks provided by these methods can then be used as input for the pose estimation with FoundationPose.
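As a simple guard for the "wait until the mask matches" idea, you can check that a mask actually contains a plausible object region before feeding it to the pose estimator. This is only a sketch; the function name and the pixel threshold are my own assumptions, not part of FoundationPose:

```python
import numpy as np

def mask_is_usable(mask: np.ndarray, min_pixels: int = 500) -> bool:
    """Heuristic gate: only run pose estimation once the mask has
    enough foreground pixels (threshold is an assumption; tune it
    for your object size and camera resolution)."""
    return int((mask > 0).sum()) >= min_pixels

# An empty mask (object not yet in view) is rejected,
# a mask with a sizable blob is accepted.
empty = np.zeros((480, 640), dtype=np.uint8)
blob = empty.copy()
blob[100:200, 100:200] = 255  # 100x100 = 10000 foreground pixels
print(mask_is_usable(empty), mask_is_usable(blob))  # False True
```

You would call this per frame and skip the registration step until it returns True.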
Regarding the disappearing objects, this would require re-running the pose estimation after the object reappears, or running the pose estimation continuously if the required segmentation masks are available (see the author's reply in #37).
I hope this helps and answers some of your questions. Maybe the author will have something to say as well. :grinning:
Kind regards!
What @savidini said is right.
If the object is not visible initially, it does not make sense to do pose estimation. You should first run some 2D detection/segmentation to check whether the object is there. For an object disappearing mid-sequence: if it disappears for too long, you'd need to rerun pose estimation; if the dropout is short, it may just track fine.
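The detect-then-track logic described above can be sketched as a small per-frame state machine. All names and the dropout threshold below are my own illustration, not the FoundationPose API; "register" stands for a full pose estimation and "track" for the cheap frame-to-frame update:

```python
from enum import Enum, auto

class State(Enum):
    DETECTING = auto()  # waiting for the object to appear
    TRACKING = auto()   # object found, tracking frame-to-frame

class TrackerGate:
    """Decide per frame whether to (re)run pose estimation, keep
    tracking, or do nothing. `max_lost` is how many consecutive frames
    the detection may be missing before we fall back to detection
    (an assumed threshold; tune it for your scene)."""

    def __init__(self, max_lost: int = 5):
        self.state = State.DETECTING
        self.lost = 0
        self.max_lost = max_lost

    def step(self, object_visible: bool) -> str:
        if self.state is State.DETECTING:
            if object_visible:
                self.state = State.TRACKING
                self.lost = 0
                return "register"  # full pose estimation on this frame
            return "skip"          # no object yet: do nothing
        # State.TRACKING
        if object_visible:
            self.lost = 0
            return "track"         # cheap frame-to-frame tracking
        self.lost += 1
        if self.lost > self.max_lost:
            self.state = State.DETECTING
            return "skip"          # lost too long: wait for re-detection
        return "track"             # short dropout: tracking may recover

# Example: object absent, appears, drops out briefly, then for too long.
gate = TrackerGate(max_lost=2)
visibility = [False, True, True, False, False, False, True]
print([gate.step(v) for v in visibility])
# ['skip', 'register', 'track', 'track', 'track', 'skip', 'register']
```

The `object_visible` flag would come from your 2D detector/segmenter, and "register"/"track" map onto the initial pose estimation and the tracking step in the demo loop.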
Also, I'm not sure whether the speed in the GIF is accelerated; the motion seems pretty fast. You can increase the iter number and try again: https://github.com/NVlabs/FoundationPose/blob/97095afa5f92be7cbe584fe3254d682943173947/run_demo.py#L21
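For reference, the linked run_demo.py exposes the refinement iteration counts as command-line arguments (flag names taken from that script; double-check your checkout in case they have changed), so raising them looks like:

```shell
# Raise the refinement iterations for both the initial pose estimate
# and the per-frame tracking update. Higher values are slower but can
# be more robust to fast motion.
python run_demo.py --est_refine_iter 10 --track_refine_iter 4
```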