Stereo
Hi,
Thanks for the truly amazing work.
I was wondering if you have any plans to support stereo images in the future, so that depth estimation could become even more precise by leveraging the second view.
I have a dataset of stereo images, and even though running MoGe on just one image of each pair already gives me pretty useful results, I want to try to improve them further by utilising the other image.
Thanks
Hi, thank you for your interest! Your idea sounds solid and straightforward! We could consider extending the self-attention mechanism or adding cross-attention layers in the ViT to enable multi-image inputs and adopt similar end-to-end training. However, collecting stereo data and reformulating the model would require significant effort. I believe it would take us another paper to develop a well-grounded solution. We will consider it in future research. Thanks again for your suggestion!
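To make the cross-attention idea above a bit more concrete, here is a minimal NumPy sketch of a single-head cross-attention step in which the tokens of one view attend to the tokens of the other. It is not taken from MoGe: the function names, the shapes, and the random projection matrices (`w_q`, `w_k`, `w_v`) are all hypothetical and only illustrate the mechanism. A real extension would live inside the ViT blocks and would need to be trained end to end on stereo data.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(left_tokens, right_tokens, w_q, w_k, w_v):
    """Single-head cross-attention (illustrative, hypothetical helper).

    left_tokens:  (N_left, dim) patch tokens from the left image
    right_tokens: (N_right, dim) patch tokens from the right image
    w_q, w_k, w_v: (dim, dim) projection matrices (random here, learned in practice)
    """
    q = left_tokens @ w_q    # queries come from the left view
    k = right_tokens @ w_k   # keys come from the right view
    v = right_tokens @ w_v   # values come from the right view
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each left token's weights sum to 1
    # Residual connection: left tokens enriched with right-view information.
    return left_tokens + weights @ v, weights


# Toy inputs standing in for ViT patch tokens of a stereo pair (assumed sizes).
rng = np.random.default_rng(0)
num_tokens, dim = 16, 32
left = rng.standard_normal((num_tokens, dim))
right = rng.standard_normal((num_tokens, dim))
w_q, w_k, w_v = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))

fused, weights = cross_attention(left, right, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

The alternative mentioned above, extending self-attention, would roughly correspond to concatenating the tokens of both views into one sequence and letting the existing attention layers operate over it, at the cost of a longer sequence per forward pass.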