
Cannot work on a rigidly rotating object.

Open haonanhe opened this issue 8 months ago • 7 comments

Hi, excellent work! VGGT works in most scenarios, but I found that it hardly works on a video of a rigidly moving handheld object. As shown in the picture below, the left side shows sampled input views (38 views in total; white pixels are the masked hand) and the right side shows the output. The output is not satisfying. Do you have any idea what the possible reason is?

Image
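
For readers reproducing the setup, the hand masking described above (hand pixels painted white) can be sketched as follows. This is a minimal illustration with NumPy only; the helper name `mask_to_white` and the synthetic frame are hypothetical, and the thread does not say how the original masks were produced.

```python
import numpy as np

def mask_to_white(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Return a copy of `image` (H, W, 3, uint8) with masked pixels set to white.

    `mask` is an (H, W) array; nonzero / True marks the hand region.
    Hypothetical helper, not part of VGGT.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must match the image height and width")
    out = image.copy()
    out[mask.astype(bool)] = 255  # sets all three channels of each masked pixel
    return out

# Tiny synthetic example: a black 4x4 frame with a 2x2 "hand" region.
frame = np.zeros((4, 4, 3), dtype=np.uint8)
hand_mask = np.zeros((4, 4), dtype=bool)
hand_mask[1:3, 1:3] = True
masked = mask_to_white(frame, hand_mask)
```

The input frame is left untouched, so masked and unmasked versions of the same sequence can be compared side by side.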

haonanhe avatar Apr 18 '25 09:04 haonanhe

Hi, I guess it is due to the cropping method, for example something related to this:

https://github.com/facebookresearch/vggt/pull/57

Would you mind sharing the whole image set so I can have a try?

jytime avatar Apr 21 '25 05:04 jytime

Sorry for the late response. Here is my image set. frames_snackhandhold.zip

haonanhe avatar Apr 21 '25 15:04 haonanhe

Hi @haonanhe, thanks for sharing the files. I've verified the issue after testing. It seems the masked hand is confusing the model. For instance, when I tested images rendered from a glb file in the Objaverse dataset, following a trajectory similar to yours, the model worked correctly. Do you have similar images available without the hands included?

Separately, the issue below is probably related to yours.

https://github.com/facebookresearch/vggt/issues/47

jytime avatar Apr 23 '25 04:04 jytime

I have tried a version that has no hand masks. It still fails. I wonder whether it's because of the large rotation angles of the object. You mentioned in the paper that VGGT cannot handle such a large rotation.

Image

Here is my data: frames_snackhandholdnoocclusion.zip
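
One way to check the large-rotation hypothesis is to measure how much the object rotates between sampled views. The sketch below assumes approximate per-frame object rotation matrices are available from some other source (an assumption; the thread provides no poses), and uses synthetic rotations about the y-axis as stand-ins. The names `rotation_angle_deg` and `rot_y` are hypothetical helpers.

```python
import numpy as np

def rotation_angle_deg(R_a: np.ndarray, R_b: np.ndarray) -> float:
    """Geodesic angle in degrees between two 3x3 rotation matrices."""
    R_rel = R_a.T @ R_b
    cos_theta = (np.trace(R_rel) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from rounding.
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

def rot_y(deg: float) -> np.ndarray:
    """Rotation about the y-axis by `deg` degrees (synthetic test data)."""
    t = np.radians(deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Assumed per-view object rotations; replace with real estimates if available.
rotations = [rot_y(d) for d in (0, 30, 90, 170)]

# Rotation between consecutive views, and between the first and last view.
steps = [rotation_angle_deg(a, b) for a, b in zip(rotations, rotations[1:])]
total = rotation_angle_deg(rotations[0], rotations[-1])
```

Large values in `steps` would point to sparse sampling between views, while a large `total` would point to the overall rotation range of the sequence.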

haonanhe avatar Apr 23 '25 11:04 haonanhe

Thanks! That's a bit weird, as the Objaverse glb render follows the same rotation trajectory. I will see if I can include similar objects in the training data for the next version.

jytime avatar Apr 23 '25 13:04 jytime

Would you mind sharing the Objaverse glb render? Just curious about what it's like. Is it rendered with environment light or something related to reality augmentation? Maybe there's a gap between in-the-wild objects and a synthetic dataset.

haonanhe avatar Apr 23 '25 13:04 haonanhe

Hey, sorry, I am not sure if I can share it here before checking its licence. Let me check if I can find other alternatives.

jytime avatar Apr 23 '25 23:04 jytime