What is the relationship between MVSplat and UniMatch?

dodododddo opened this issue 1 year ago • 1 comment

Hello! Excellent work! I am trying to understand the details of depth estimation in MVSplat from the code. MVSplat supports cases with more than three views, but its depth estimation module relies on the UniMatch framework (it even uses UniMatch's pre-trained weights for initialization). However, UniMatch seems to support only the two-view case (at least based on its official implementation). How does MVSplat enable UniMatch to support more views? Could you provide code for a UniMatch depth module that supports multiple views? If I have misunderstood the relationship between MVSplat and UniMatch or how they work, please kindly correct me. Thank you!

Dec 17 '24 16:12 dodododddo

Hi @dodododddo, thanks for your appreciation.

Our multi-view Transformer is adopted from UniMatch. To extend it from two views to multiple views, we take the attention keys and values (K, V) from the other N-1 views rather than from just one view. This changes the token length of K, V from H*W*1 to H*W*(N-1), which does not affect any trainable parameters and allows us to use the two-view UniMatch pre-trained weights. For more details, you can compare the implementations of two-view attention and multi-view attention.
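
To illustrate the point about token length, here is a minimal, dependency-free sketch of single-head cross-attention. It is not the code from either repository: the function names (`cross_attention`, `gather_source_tokens`, `multi_view_cross_attention`), the toy sizes, and the random weights are all hypothetical, and the real models use a more elaborate attention layer. The sketch only shows that the projection matrices have shape C x C regardless of how many source tokens are concatenated, so the same weights serve both the two-view and the N-view case.

```python
import math
import random


def matmul(a, b):
    # Plain list-of-lists matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    # Numerically stable softmax over one row of attention scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query_tokens, source_tokens, w_q, w_k, w_v):
    # query_tokens: (H*W) x C, source_tokens: L x C, weights: C x C.
    # L can be any length; the weights never depend on it.
    q = matmul(query_tokens, w_q)
    k = matmul(source_tokens, w_k)
    v = matmul(source_tokens, w_v)
    scale = 1.0 / math.sqrt(len(w_q[0]))
    scores = [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]
    attn = [softmax(r) for r in scores]
    return matmul(attn, v)


def gather_source_tokens(views, ref_index):
    # Concatenate the tokens of the other N-1 views: length H*W*(N-1).
    return [tok for i, view in enumerate(views) if i != ref_index for tok in view]


def multi_view_cross_attention(views, ref_index, w_q, w_k, w_v):
    # Queries come from the reference view, K and V from all other views.
    source = gather_source_tokens(views, ref_index)
    return cross_attention(views[ref_index], source, w_q, w_k, w_v)


# Toy sizes (assumptions for illustration): N views, H*W tokens, C channels.
rng = random.Random(0)
n_views, hw, c = 3, 4, 8
views = [[[rng.gauss(0, 1) for _ in range(c)] for _ in range(hw)] for _ in range(n_views)]
w_q, w_k, w_v = ([[rng.gauss(0, 1) for _ in range(c)] for _ in range(c)] for _ in range(3))

out_multi = multi_view_cross_attention(views, 0, w_q, w_k, w_v)
out_two = cross_attention(views[0], views[1], w_q, w_k, w_v)
```

With N = 2 the multi-view function reduces to the two-view one, and for any N the output keeps the shape (H*W) x C. Only the number of keys and values that each query attends to grows.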

Dec 19 '24 02:12 donydchen