InstantMesh Camera Augmentation in the released training code

I noticed that the paper mentions: "Considering that the multi-view images generated by Zero123++ may be inconsistent with their pre-defined camera poses, we also add random noise to the camera parameters before feeding them into the ViT image encoder." When I checked the code here: https://github.com/TencentARC/InstantMesh/blob/34c193cc96eebd46deb7c48a76613753ad777122/src/data/objaverse.py#L195 it draws a random angle in the range (0, 2*pi) and applies a rotation about the z axis. That range seems too large to me for something described as "random noise". I'm not sure whether this range is appropriate, so could you confirm it?
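To make the question concrete, here is a minimal plain-Python sketch of a z-axis rotation augmentation of this kind. It is not the repository's code: it assumes poses are 4x4 camera-to-world matrices and that one shared rotation is left-multiplied onto every view of a sample. The names `rotation_z`, `matmul4`, `rotate_poses` and the `max_angle` parameter are hypothetical and only for illustration.

```python
import math
import random

def rotation_z(theta):
    """4x4 homogeneous rotation by `theta` radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def matmul4(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rotate_poses(poses, max_angle=2 * math.pi, rng=random):
    """Rotate all poses of one sample by the same random angle.

    Assumption: `poses` is a list of 4x4 camera-to-world matrices.
    `max_angle` is a hypothetical knob for trying a narrower range.
    """
    theta = rng.uniform(0.0, max_angle)
    rot = rotation_z(theta)
    return [matmul4(rot, pose) for pose in poses], theta
```

Under these assumptions, the rotation changes every camera's azimuth by the same amount while leaving its height and its distance from the z axis unchanged, so the relative poses between views are preserved. Lowering `max_angle` would be one way to test whether the full (0, 2*pi) range is what hurts convergence.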

May 19 '24 00:05 JINNMnm

Agreed. With this augmentation, the instantnerf model is very hard to get to converge...

Aug 14 '24 08:08 HaFred

I have the same question. Maybe the random noise the paper refers to is the one in InstantMesh/src/model.py: cameras = cameras + torch.rand_like(cameras) * 0.04 - 0.02. If so, should we set camera_rotation to false at the start of training? Have you trained the model successfully?
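For scale: since torch.rand_like samples uniformly from [0, 1), the quoted line adds independent noise in roughly [-0.02, 0.02) to every camera parameter. A plain-Python stand-in (no PyTorch needed) is sketched below. The function name `jitter_cameras` and the flat list-of-lists layout are assumptions for illustration, not the repository's API.

```python
import random

def jitter_cameras(cameras, scale=0.04, rng=random):
    """Add uniform noise in about [-scale/2, scale/2) to each value.

    Stand-in for `cameras + torch.rand_like(cameras) * 0.04 - 0.02`.
    Assumption: `cameras` is a list of flat per-view parameter lists.
    """
    return [[value + rng.random() * scale - scale / 2 for value in cam]
            for cam in cameras]
```

Compared with a z-axis rotation of up to 2*pi, this perturbation is tiny, which is why it looks like a better match for the "random noise" described in the paper.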

Oct 20 '24 02:10 pupiljia