Intrinsics estimation on ultrawide FOV
Hi,
Excellent work, thanks for the detailed paper and prompt model release!
I think that the method systematically underestimates the field of view (equivalently, overestimates the focal length) on ultrawide images. For example, here is an iPhone 13 Pro ultrawide image:
MoGe estimates a focal length of 1767.9873 pixels, or about 97 degrees horizontal FOV (hFOV), but the ground-truth hFOV for this device is about 108 degrees. In my testing, this bias is fairly consistent on ultrawide images.
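For reference, the relation between focal length in pixels and hFOV can be sketched as below. This is a minimal sketch assuming an ideal pinhole camera with the principal point at the image center. The image width of 4032 pixels is an assumption (it is not stated above), and the helper names are hypothetical.

```python
import math


def focal_to_hfov_deg(focal_px: float, width_px: float) -> float:
    """Horizontal FOV in degrees for a pinhole camera with a centered principal point."""
    return math.degrees(2.0 * math.atan(width_px / (2.0 * focal_px)))


def hfov_deg_to_focal(hfov_deg: float, width_px: float) -> float:
    """Focal length in pixels that yields the given horizontal FOV."""
    return width_px / (2.0 * math.tan(math.radians(hfov_deg) / 2.0))


# ASSUMPTION: full-resolution image width; not given in the post.
WIDTH_PX = 4032

# hFOV implied by the estimated focal length quoted above.
est_hfov = focal_to_hfov_deg(1767.9873, WIDTH_PX)

# Focal length implied by a 108 degree ground-truth hFOV.
gt_focal = hfov_deg_to_focal(108.0, WIDTH_PX)

print(f"estimated hFOV: {est_hfov:.1f} deg")
print(f"focal length for 108 deg hFOV: {gt_focal:.1f} px")
```

Under that width assumption, the quoted focal length maps to roughly 97 to 98 degrees, and a 108 degree hFOV corresponds to a noticeably shorter focal length. A FOV that is too narrow is the same error as a focal length that is too long.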
I don't know of a good public dataset of such images to demonstrate this. However, I can reproduce a related behavior by taking a public dataset with known intrinsics (the Cambridge Landmarks dataset) and cropping the images vertically. As the crop gets more extreme, MoGe's estimate becomes increasingly biased toward larger values:
(I attached a Jupyter notebook you can use to reproduce these plots: cambridge intrinsics.ipynb.zip).
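To make the expected ground truth explicit, here is a minimal sketch of how a centered vertical crop changes pinhole intrinsics. It is not taken from the attached notebook. The `Intrinsics` class, the helper names, and the example numbers are hypothetical placeholders, not actual Cambridge Landmarks calibration values.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    fx: float    # focal length in pixels (x)
    fy: float    # focal length in pixels (y)
    cx: float    # principal point x
    cy: float    # principal point y
    width: int
    height: int


def center_crop_vertical(k: Intrinsics, keep_fraction: float) -> Intrinsics:
    """Keep the central `keep_fraction` of the rows; columns are untouched."""
    new_height = max(1, round(k.height * keep_fraction))
    top = (k.height - new_height) // 2
    # Cropping does not rescale pixels, so fx and fy are unchanged.
    # Only the principal point shifts by the number of removed top rows.
    return Intrinsics(k.fx, k.fy, k.cx, k.cy - top, k.width, new_height)


def fov_deg(focal_px: float, size_px: float) -> float:
    """FOV in degrees along one axis, assuming a centered principal point."""
    return math.degrees(2.0 * math.atan(size_px / (2.0 * focal_px)))


# Placeholder intrinsics for illustration only.
full = Intrinsics(fx=1670.0, fy=1670.0, cx=960.0, cy=540.0, width=1920, height=1080)
cropped = center_crop_vertical(full, keep_fraction=0.5)

print("hFOV full / cropped:", fov_deg(full.fx, full.width), fov_deg(cropped.fx, cropped.width))
print("vFOV full / cropped:", fov_deg(full.fy, full.height), fov_deg(cropped.fy, cropped.height))
```

Since the true focal length in pixels and the hFOV are unchanged by a vertical crop, any drift in the estimate as the crop narrows is estimation error and not a change in the ground truth.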
BTW, although it's not shown here, taking a Cambridge image and cropping the per-pixel point cloud, rather than the image itself, does not reproduce this behavior. So I don't think the problem is in the LM solver step.
Do you have any comments or ideas for improving the estimation accuracy here? Is it just that these unusual crops/FOVs are well outside the training distribution?
Hi! Apologies for the late reply, and thank you for providing this great experiment! Currently, MoGe is trained on data with a limited range of FOV, typically between 30 and 100 degrees. You can crop an ultra-wide image by an appropriate ratio so that the FOV of the cropped input image falls within the recommended range (45–90 degrees, where the performance is the best).
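The cropping workaround can be sketched as follows. This is a minimal sketch, assuming a pinhole camera with a centered principal point and an approximate known FOV for the device (for example, from its specifications). The function names and the 4032x3024 image size are hypothetical, and this is not part of the MoGe API.

```python
import math


def crop_fraction_for_target_fov(fov_deg: float, target_fov_deg: float) -> float:
    """Fraction of the image side to keep so a center crop spans `target_fov_deg`."""
    if not (0.0 < target_fov_deg <= fov_deg < 180.0):
        raise ValueError("require 0 < target_fov_deg <= fov_deg < 180")
    return math.tan(math.radians(target_fov_deg) / 2.0) / math.tan(math.radians(fov_deg) / 2.0)


def center_crop_box(width: int, height: int, fraction: float) -> tuple:
    """(left, top, right, bottom) of a centered crop that keeps the aspect ratio."""
    new_w = max(1, round(width * fraction))
    new_h = max(1, round(height * fraction))
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return left, top, left + new_w, top + new_h


def full_fov_from_crop_focal(focal_px: float, full_size_px: float) -> float:
    """Cropping keeps the focal length in pixels, so map it back to the full image."""
    return math.degrees(2.0 * math.atan(full_size_px / (2.0 * focal_px)))


# ASSUMPTION: a 4032x3024 image with roughly 108 degrees hFOV, cropped to 90 degrees.
fraction = crop_fraction_for_target_fov(108.0, 90.0)
box = center_crop_box(4032, 3024, fraction)
crop_width = box[2] - box[0]

# If the model returned the ideal focal length for the 90 degree crop...
ideal_crop_focal = crop_width / (2.0 * math.tan(math.radians(90.0) / 2.0))
# ...the full-image hFOV would be recovered like this.
recovered_hfov = full_fov_from_crop_focal(ideal_crop_focal, 4032)

print("keep fraction:", fraction)
print("crop box:", box)
print("recovered full hFOV:", recovered_hfov)
```

The key point is that a center crop leaves the focal length in pixels unchanged, so a focal length estimated on the cropped image can be combined with the full image width to get the full-image FOV.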
We will soon release a new model with an extended FOV range and improved image augmentation during training. I tested it on the image above and it predicts 104 degrees (for reference, the mean error of MoGe's FOV estimation is about 3 degrees). Thanks for your patience!