I just added the four test scenes from Figure 9 (airplants, pond, fern, t-rex) to the google drive supplement, you can find them here now: https://drive.google.com/open?id=1Xzn-bRYhNE5P9N7wnwLDXmo37x7m3RsO
Here's an explanation of the poses_bounds.npy file format. This file stores a numpy array of size Nx17 (where N is the number of input images). You can see how that is loaded in the three lines here. Each row of length 17 gets reshaped into a 3x5 pose matrix and 2 depth values that bound the closest and farthest scene content from that point of view.
The pose matrix is a 3x4 camera-to-world affine transform concatenated with a 3x1 column [image height, image width, focal length] along axis=1.
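A minimal self-contained sketch of this layout (the array values here are placeholders, not real COLMAP output) — writing a synthetic `poses_bounds.npy` and then unpacking it the way the description above lays out:

```python
import numpy as np

# Build a synthetic poses_bounds array for 2 images (placeholder values;
# normally this comes from COLMAP-derived poses).
N = 2
poses_bounds = np.zeros((N, 17))
poses_bounds[:, :15] = np.eye(3, 5).ravel()  # flattened 3x5 pose matrix
poses_bounds[:, 15:] = [1.0, 100.0]          # near / far depth bounds
np.save('poses_bounds.npy', poses_bounds)

# Unpacking mirrors the loading code referenced above:
arr = np.load('poses_bounds.npy')        # shape (N, 17)
poses = arr[:, :-2].reshape([-1, 3, 5])  # 3x5 pose matrices
bounds = arr[:, -2:]                     # closest / farthest depths

c2w = poses[:, :, :4]  # 3x4 camera-to-world transform
hwf = poses[:, :, 4]   # [height, width, focal] column
```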
The rotation (first 3x3 block in the camera-to-world transform) is stored in a somewhat unusual order, which is why there are the transposes. From the point of view of the camera, the three axes are
[ down, right, backwards ]
which some people might consider to be [-y,x,z].
So the steps to reproduce this should be (if you have a set of 3x4 poses for your images, plus focal lengths and close/far depth bounds):
- Make sure your poses are in camera-to-world format, not world-to-camera.
- Make sure your rotation matrices have the columns in the same order I use (downward, right, backwards).
- Concatenate each pose with the [height, width, focal] vector to get a 3x5 matrix.
- Flatten each of those into 15 elements and concatenate the close/far depths.
- Stack the 17-d vectors into an Nx17 matrix and use np.save to store it as poses_bounds.npy.
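The steps above can be sketched as follows (a hedged sketch, not the repo's own script — it assumes you already have your 3x4 camera-to-world poses in the [down, right, backwards] column order, plus per-image [height, width, focal] and near/far depths; the function name is made up for illustration):

```python
import numpy as np

def save_poses_bounds(c2w_mats, hwf_list, depth_bounds, path='poses_bounds.npy'):
    """c2w_mats: (N, 3, 4) camera-to-world poses, columns [down, right, backwards].
    hwf_list: (N, 3) rows of [height, width, focal].
    depth_bounds: (N, 2) rows of [near, far]."""
    rows = []
    for c2w, hwf, nf in zip(c2w_mats, hwf_list, depth_bounds):
        # Append [height, width, focal] as a 5th column -> 3x5 matrix.
        pose = np.concatenate([np.asarray(c2w),
                               np.asarray(hwf).reshape(3, 1)], axis=1)
        # Flatten to 15 values and append the two depth bounds -> 17 values.
        rows.append(np.concatenate([pose.ravel(), np.asarray(nf)]))
    # Stack into Nx17 and save.
    arr = np.stack(rows)
    np.save(path, arr)
    return arr
```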
Hopefully that helps explain my pose processing after colmap. Let me know if you have any more questions.
Originally posted by @bmild in https://github.com/Fyusion/LLFF/issues/10#issuecomment-514406658
I still have a question about Rt. I use the OpenCV camera coordinate convention ([right, down, forward]) for my source poses, so to meet the [down, right, backwards] convention
(which some people might consider [-y, x, z]) I have to swap x and y and negate z. But what about the translation T (3x1)?
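For what it's worth, here is a sketch of that conversion (an illustration, not an official answer): for a camera-to-world transform, the translation is the camera center expressed in world coordinates, so it does not depend on how the camera's own axes are labeled — only the rotation columns need to be permuted and negated. The function name is hypothetical:

```python
import numpy as np

def opencv_to_llff(c2w_cv):
    """Convert a 3x4 camera-to-world pose from OpenCV camera axes
    ([right, down, forward] = [x, y, z]) to the [down, right, backwards]
    column order, i.e. new columns = [y, x, -z].
    The translation is the camera center in world coordinates, so it is
    unchanged by relabeling the camera axes."""
    R, t = c2w_cv[:, :3], c2w_cv[:, 3:4]
    R_new = np.stack([R[:, 1], R[:, 0], -R[:, 2]], axis=1)
    return np.concatenate([R_new, t], axis=1)
```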
Below is the OpenCV camera coordinate system.
Looking forward to your reply~