FoundationPose

When the resize parameter is not 1, the prediction results are inaccurate

Open tetanus-sy opened this issue 4 months ago • 5 comments

I modified run_linemod.py slightly and predicted my own dataset. My initial resolution was 2400×1800. Initially, I scaled the entire dataset to 640×480. Using resize=1 worked well, with angular deviations within 2° and translation errors within 2mm. The results are as follows:

Image

However, when I tried to predict directly on the 2400×1800 image, I ran out of video memory, so I set resize to 0.33. The predicted results were significantly different. While the BBox was correct, the rotation and translation errors were significant, with a rotation angle difference of 150° and a translation error of 641mm. The results are as follows:

Image

How can I solve this problem? Thank you.

tetanus-sy avatar Aug 20 '25 03:08 tetanus-sy

Hi, did you have to retrain the model? If not, how did you get it running? I don't have depth images like the ones in mustard0.zip, and I am not sure how to run it for my custom object.

Manu752 avatar Aug 27 '25 14:08 Manu752

> Hi, did you have to retrain the model? If not, how did you get it running? I don't have depth images like the ones in mustard0.zip, and I am not sure how to run it for my custom object.

No, I did not retrain the model. I used its public weights directly for inference. I wrote a similar inference program based on run_linemod.py with slight modifications and ran it with Docker on WSL. The dataset was created using ObjectDatasetTools. In fact, inference only needs the object model data plus the RGB, mask, and depth data.

tetanus-sy avatar Sep 01 '25 01:09 tetanus-sy

Hi Tetanus, I want to use FoundationPose on my high-resolution images as well. I hope to improve the already OK accuracy. However, I find it hard to find out what to modify to get it going. Would it be possible to write a little about what exactly you modified? Which files, etc.? Or did you find any good explanations of it online?

BunteStadt avatar Sep 15 '25 09:09 BunteStadt

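> Hello Tetanus, I also want to use FoundationPose on my high-resolution images. I hope to improve the already good accuracy, but I find it hard to figure out what needs to be modified. Could you briefly describe which files you changed, and so on? Or did you find any good explanations online?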

Hi BunteStadt, I haven't modified the network itself; I've only written an inference program based on its inference code. I'm currently facing the same problem and need to modify the network to improve its accuracy on my task. My initial approach is to freeze the backbone and modify the refinement network and pose selection. I've created a small dataset and will later train the refinement network on it to see if this improves accuracy. If you don't mind, you can add me on Instagram: tetanus-sy for follow-up discussions on this issue.

tetanus-sy avatar Sep 16 '25 07:09 tetanus-sy

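There is a bug in datareader.py: https://github.com/NVlabs/FoundationPose/blob/main/datareader.py#L198

Change this: K[:2,:2] *= self.resize

to this: K[:2] *= self.resize

The original line scales only the focal lengths (the top-left 2×2 block of K) and leaves the principal point (cx, cy) at its full-resolution value, so the intrinsics no longer match the resized image.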

AnnaMikestikova avatar Sep 30 '25 15:09 AnnaMikestikova
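
To illustrate why the one-line change above matters, here is a minimal NumPy sketch that compares the two ways of scaling a pinhole intrinsic matrix. The matrix values, the 3D point, and the helper names (scale_intrinsics_buggy, scale_intrinsics_fixed, project) are hypothetical and chosen for illustration only; they are not taken from the FoundationPose code. The sketch also assumes the simple convention that pixel coordinates scale linearly with the resize factor.

```python
import numpy as np


def scale_intrinsics_buggy(K, resize):
    """Scale only the top-left 2x2 block: fx, skew, fy. cx and cy stay unchanged."""
    K = K.copy()
    K[:2, :2] *= resize
    return K


def scale_intrinsics_fixed(K, resize):
    """Scale the first two rows: fx, skew, cx and fy, cy all follow the image size."""
    K = K.copy()
    K[:2] *= resize
    return K


def project(K, point_cam):
    """Project a 3D point in camera coordinates to pixel coordinates."""
    uvw = K @ point_cam
    return uvw[:2] / uvw[2]


# Hypothetical intrinsics for a 2400x1800 image (assumed values, not from the issue).
K_full = np.array([
    [3000.0, 0.0, 1200.0],
    [0.0, 3000.0, 900.0],
    [0.0, 0.0, 1.0],
])
resize = 0.5
point = np.array([0.1, -0.05, 1.0])  # metres, in the camera frame

uv_full = project(K_full, point)
uv_fixed = project(scale_intrinsics_fixed(K_full, resize), point)
uv_buggy = project(scale_intrinsics_buggy(K_full, resize), point)

print("full-resolution pixel:  ", uv_full)
print("expected after resize:  ", uv_full * resize)
print("fixed intrinsics give:  ", uv_fixed)
print("buggy intrinsics give:  ", uv_buggy)
```

With the fixed scaling, the projected pixel is the full-resolution pixel multiplied by the resize factor, which is where the point actually lands in the downscaled image. With the buggy scaling, the principal point still refers to the full-resolution image, so projections are offset from the resized RGB, depth, and mask. That kind of mismatch between intrinsics and images is consistent with the large pose errors reported at resize=0.33, while resize=1 is unaffected because both versions leave K unchanged.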