taichi_3d_gaussian_splatting
about coordinate system & camera poses
Hi there, first of all thanks for the work, it's great. I am looking to extend some rendering features, for example rendering a video clip along a given camera trajectory. But so far my rendered frames look weird, and the transformation to camera is most likely wrong.
What I do currently:
- The data I am using is in instant-NGP format & coordinate system
- I first retrieve the 4x4 transformation matrix from Transform.json
- next, I apply a flip_x transformation to it (I noticed you do this in your prepare_InstantNGP_with_mesh.py script, so I tried the same approach)
- then I do a TRS decomposition of the transformation matrix
- finally, I pass R (converted to a quaternion) as q_pointcloud_camera and T as t_pointcloud_camera; see the sketch after this list
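In code, the steps above look roughly like this (a sketch rather than my exact script; the scipy quaternion conversion is my choice, and whether flip_x pre- or post-multiplies is exactly the kind of thing I am unsure about):

```python
import json

import numpy as np
from scipy.spatial.transform import Rotation

# Load one camera pose from the instant-NGP style json.
with open("Transform.json") as f:
    meta = json.load(f)
T = np.array(meta["frames"][0]["transform_matrix"], dtype=np.float64)  # 4x4

# flip_x, mirroring what prepare_InstantNGP_with_mesh.py appears to do
# (pre-multiplying here; this choice may well be the bug).
flip_x = np.diag([-1.0, 1.0, 1.0, 1.0])
T = flip_x @ T

# TRS decomposition: translation, per-axis scale, then rotation.
t = T[:3, 3]
scale = np.linalg.norm(T[:3, :3], axis=0)  # column norms = per-axis scales
R = T[:3, :3] / scale                      # pure rotation, assuming no shear

# Quaternion for q_pointcloud_camera (scipy uses (x, y, z, w) order).
q = Rotation.from_matrix(R).as_quat()
```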
Do you see any step I am missing or doing wrong? If not, I suspect a coordinate mismatch, for example the coordinate handedness or the camera axes being defined differently. Please let me know if you have any ideas or comments.
Thanks!
Hi @Leix8! Thanks for your feedback!
- Can you explain more about the "TRS decomposition" you did? Actually, I have no idea what it is... But basically this function is used for converting between a transformation matrix and a (quaternion, translation) pair.
- You can refer to this document for the coordinate system: the camera coordinate system has its x-axis pointing right, y-axis pointing down, and z-axis pointing forward. The image coordinate system is the standard PyTorch image coordinate system, with the origin at the top-left corner, x-axis pointing right, and y-axis pointing down. T_pointcloud_camera is the transformation matrix from the camera coordinate system to the pointcloud coordinate system, q_pointcloud_camera is the corresponding quaternion, and t_pointcloud_camera the corresponding translation; see the sketch after this list.
- the latest code switches from the transformation matrix to a (quaternion, translation) pair, in preparation for a potential camera-pose optimization feature.
- the flip_x transformation is used to switch between different camera coordinate systems. The prepare_InstantNGP_with_mesh.py script is mainly tested with BlenderNerf, and it's possible, though unlikely, that their implementation has a bug and their NGP format differs from the real NGP format... Also, rendering a video clip would be a very nice feature; if you make it work, PRs are welcome! Thanks!
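For concreteness, a minimal sketch of the convention above. The pose-to-(quaternion, translation) split mirrors what se3_to_quaternion_and_translation_torch does; the diag(1, -1, -1, 1) flip between a Blender/OpenGL-style camera (x right, y up, z backward) and this camera convention (x right, y down, z forward) is my assumption about what such a conversion step must do, not a quote of the script:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def pose_to_q_t(T_pointcloud_camera: np.ndarray):
    """Split a 4x4 camera-to-pointcloud pose into (quaternion, translation)."""
    q = Rotation.from_matrix(T_pointcloud_camera[:3, :3]).as_quat()  # (x, y, z, w)
    t = T_pointcloud_camera[:3, 3]
    return q, t

# A point given in camera coordinates (x right, y down, z forward) maps
# into the pointcloud/world frame as p_pointcloud = R @ p_camera + t:
T = np.eye(4)  # placeholder pose
p_camera = np.array([0.0, 0.0, 1.0])  # one unit in front of the camera
p_pointcloud = T[:3, :3] @ p_camera + T[:3, 3]

# Converting a Blender/OpenGL-style camera pose (x right, y up, z backward)
# to this convention flips the camera's local y and z axes, i.e. a
# right-multiplication (assumption, see lead-in):
gl_to_cv = np.diag([1.0, -1.0, -1.0, 1.0])
# T_pointcloud_camera = T_pointcloud_camera_gl @ gl_to_cv
```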
Hi, thanks for the feedback.
- The TRS decomposition refers to extracting the Translation, Rotation, and Scaling vectors/matrices from the 4x4 transformation matrix, pretty much the same as your se3_to_quaternion_and_translation_torch() function; the only difference is that scaling is computed separately for the x/y/z axes, in case they are not identical. But this should not cause a problem in our current case.
- I carefully checked the coordinate system but still cannot get it right. The visualized camera poses look like below:
- explanation of the figure: the orange cones are in 3d_gaussian_splatting coordinates, the blue cones are in my previous instant-NGP coordinates; both come from the same scene after the same COLMAP process, and the frame indices are aligned.
- Ideally the orange/blue cones should align well in pairs; or, if there is a coordinate conversion, I should be able to solve for it from the orange/blue pairs, and the resulting transformation should be consistent, so that when I apply it back the orange/blue cones align again. Unfortunately, the actual transformation matrix varies among cone pairs, meaning the transformation is non-linear, which should not be possible in theory (see the consistency check at the end of this comment).
- Anyway, I will investigate this issue further. If it cannot be figured out, I will try a different approach: give up on the ready-to-use instant-NGP format trajectory file and instead add trajectory-generation functions that build a rendering trajectory from your train.json and render from that (see the interpolation sketch at the end of this comment).
- I will open a PR when either approach is working and complete. Cheers!
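For reference, this is roughly how I check the consistency of the transformation across cone pairs (a sketch; poses_a / poses_b stand for the paired 4x4 camera poses from the two coordinate systems):

```python
import numpy as np

def relative_transforms(poses_a, poses_b):
    """For each aligned pair, solve A_i = M_i @ B_i for M_i. If the two
    sets differ only by one global change of world frame, every M_i
    should come out (nearly) identical."""
    return [A @ np.linalg.inv(B) for A, B in zip(poses_a, poses_b)]

def max_deviation(Ms):
    """Largest elementwise difference between any M_i and the first one."""
    return max(np.abs(M - Ms[0]).max() for M in Ms[1:])

# Caveat: if the two conventions also differ by a per-camera axis flip C
# applied on the right (A_i = M @ B_i @ C), then A_i @ inv(B_i) varies
# across pairs even though the relation is still rigid, which would look
# exactly like the inconsistency described above.
```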
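And the trajectory-generation fallback could start from something as simple as interpolating between two training poses (a sketch; scipy's Slerp handles the rotation part, translations are interpolated linearly):

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_poses(T_start, T_end, n_frames):
    """Generate n_frames 4x4 poses between two camera poses, slerping
    the rotation and lerping the translation."""
    key_rots = Rotation.from_matrix(np.stack([T_start[:3, :3], T_end[:3, :3]]))
    slerp = Slerp([0.0, 1.0], key_rots)
    times = np.linspace(0.0, 1.0, n_frames)
    rot_mats = slerp(times).as_matrix()
    poses = []
    for i, s in enumerate(times):
        T = np.eye(4)
        T[:3, :3] = rot_mats[i]
        T[:3, 3] = (1.0 - s) * T_start[:3, 3] + s * T_end[:3, 3]
        poses.append(T)
    return poses
```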