nerf
Coordinate system of translation
Hello and thanks for sharing this nice work! I am still a bit confused by the coordinate systems and the Readme. While the Readme says that

> In `run_nerf.py` and all other code, we use the same pose coordinate system as in OpenGL

it seems that the LLFF code used by `image2poses.py` only transforms the rotation matrix from [right, down, forward] (COLMAP) to [down, right, backward], and NeRF later converts to [right, up, backward] (OpenGL).
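To make the two convention changes concrete, here is a small sketch (the matrices and names below are my own illustration, not taken from the repository): each change amounts to right-multiplying the camera-to-world rotation, whose columns are the camera axes, by a permutation/sign-flip matrix.

```python
import numpy as np

# COLMAP -> LLFF: [right, down, forward] -> [down, right, backward],
# i.e. swap the first two columns and negate the third.
P_colmap_to_llff = np.array([
    [0, 1,  0],
    [1, 0,  0],
    [0, 0, -1],
], dtype=float)

# LLFF -> NeRF/OpenGL: [down, right, backward] -> [right, up, backward],
# i.e. swap the first two columns back and negate the new second one (up = -down).
P_llff_to_opengl = np.array([
    [0, -1, 0],
    [1,  0, 0],
    [0,  0, 1],
], dtype=float)

# Composed, COLMAP -> OpenGL just flips the y and z axes: diag(1, -1, -1).
P_total = P_colmap_to_llff @ P_llff_to_opengl
```

Applying `R @ P_total` to a COLMAP camera-to-world rotation `R` would relabel its axis columns directly to the OpenGL convention, which is what the intermediate [d,r,b] step achieves in two hops.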
- If I create `poses_bounds.npy` with my own poses as suggested in the Readme (OpenGL format), then this line would still apply the conversion to OpenGL format. That doesn't seem right to me, or am I misunderstanding something?
- What about the translation? I can't find any conversions in this regard, so it seems that the translation part in `poses_bounds.npy` should still be in COLMAP format. Is that correct?
- Why is the intermediate step ([d,r,b]) used in the first place, instead of converting the rotation directly to the OpenGL format?
Thanks in advance!
@dnlwbr did you find anything related to your above questions?
Unfortunately, it's been some time since I dealt with this, and the whole thing is somewhat ambiguous. If I remember correctly, however, these were my findings:
- I think the documentation is incorrect, or at least imprecise, here. NeRF uses the OpenGL format, but there is a conversion at the beginning. I think you have to create `poses_bounds.npy` in [d,r,b] format, because this line still performs the transformation to the OpenGL format. If the poses were already in OpenGL format from the beginning, as the Readme recommends, the mentioned line would break everything. At least my results at the time seemed to confirm this assumption.
- I think I was confused here because the LLFF readme that the NeRF readme refers to at this point only mentions rotation. Nevertheless, I think the coordinate systems always refer to both translation and rotation; everything else would be strange. Still, the explicit mention of the rotation matrix is somewhat misleading, in my opinion, if the complete pose (translation + rotation) is meant.
- I'm still wondering about that.
However, take my statements with a grain of salt, since I am far from an expert.
Thanks @dnlwbr! This is helpful. I'll update here if I figure out anything more.
@dnlwbr I figured out what is happening w.r.t. your point 2 (the convention for translation isn't changed). You were right: the conventions of both rotation and translation are changed. For the benefit of others who may stumble upon this issue, I'll note down what I've understood so far.
- It's a neat trick they've used. When converting a rotation matrix `R` from one convention to another, we find the corresponding permutation matrix `P` and compute the new rotation matrix as `P' R P`. Similarly, the new translation is `P' t`. But when we compute relative poses, the right multiplication by `P` is unnecessary (since it cancels out), so we can multiply only on the left by `P'`. Here, they first take the inverse of the camera poses and then multiply the rotation matrix on the right by `P`. Then they compute the relative pose and take the inverse of this relative pose. When a pose is inverted, the rotation matrix is transposed, so `R' P` becomes `P' R`, which is what we want; and the translation becomes `-(R' P)' (-R' t)`, which is equal to `P' t`. Thus both rotation and translation are converted to the final convention.
- I think they simply wanted to reuse the LLFF code that gets the COLMAP poses, to avoid duplicating it here, and then convert to the OpenGL convention.
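The inversion trick can be checked numerically. A small sketch (all names here are mine) that applies the steps as described, invert the pose, right-multiply the rotation by `P`, invert again, and confirms that both rotation and translation come out converted:

```python
import numpy as np

rng = np.random.default_rng(0)

def invert_pose(R, t):
    """Inverse of the rigid transform x -> R x + t."""
    return R.T, -R.T @ t

# Permutation/sign-flip matrix for the convention change (orthogonal, so P^-1 = P').
P = np.array([[0, 1, 0],
              [1, 0, 0],
              [0, 0, -1]], dtype=float)

# A random camera-to-world pose (R, t) in the source convention.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = q if np.linalg.det(q) > 0 else -q  # ensure a proper rotation (det = +1)
t = rng.normal(size=3)

# The trick: invert the pose, right-multiply the rotation by P, invert again.
R_w2c, t_w2c = invert_pose(R, t)
R_new, t_new = invert_pose(R_w2c @ P, t_w2c)

# Inverting transposes R' P into P' R, and the translation comes out as
# -(R' P)' (-R' t) = P' t, so both parts end up in the new convention.
assert np.allclose(R_new, P.T @ R)
assert np.allclose(t_new, P.T @ t)
```

Note that `R_new` is `P' R` rather than `P' R P`, which is exactly the point above: the trailing `P` is dropped because it cancels when relative poses are formed.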