Any tips for training on my own dataset / fixing scale from COLMAP?
Hi there,
First of all, thanks for releasing the code for this impressive method!
I have been trying to train this on my own dataset (it's front-facing).
After a bit of experimentation, I can get OK-ish results. To do this, I extracted the poses with COLMAP, then used imgs2poses.py from the LLFF repo to obtain the poses_bounds.npy for the 16 cameras in my dataset. I then created an .mp4 video for each camera to get the dataset into the same format as the neural_3d scenes in your repo.
From there, the only change I needed to make to get training to work was to expand the grid_bounds in neural_3d_z_plane.yaml from:
# Grid bounds
aabb: [[-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]]
to:
# Grid bounds
aabb: [[-5.0, -5.0, -5.0], [5.0, 5.0, 5.0]]
Then I get OK results.
So it seems like COLMAP estimated my scene to have a somewhat larger scale than in the neural_3d scenes.
I am wondering whether, based on this, you have any ideas as to how I can improve performance to be closer to the level of neural_3d? I have the feeling the code is optimised for a (-1, 1) aabb box, but digging into the code, COLMAP, etc., it's not obvious how I should rescale my scene, or what I should change in HyperReel to make it better suit my data.
Just thought I would share this result, and yeah if you have any tips/ideas I'd really appreciate it! I'll also share any progress I manage to make on my own.
Hello, I'm not experienced with this project (yet :-)), so I have no opinion on how much improvement a rescale would give. But you can easily rescale the poses and bounds "by hand" in Python:
```python
import numpy as np

scale = 0.2  # e.g. map [-5., 5.] down to [-1., 1.]

poses_bounds = np.load('poses_bounds.npy')

# The last two entries of each row are the near/far depth bounds.
poses_bounds[:, -2:] *= scale

# The first 15 entries of each row are a flattened 3x5 pose matrix;
# the fourth column holds the camera position, which also needs rescaling.
poses = poses_bounds[:, :15].reshape(-1, 3, 5)
poses[:, :, 3] *= scale
poses_bounds[:, :15] = poses.reshape(-1, 15)

np.save('poses_bounds_rescaled.npy', poses_bounds)
```
Hope this helps.
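If you're unsure what scale factor to pick, you could first estimate the current extent of the scene from the same file. This is a rough sketch, assuming the standard LLFF N x 17 layout; the helper name is made up, not from any repo:

```python
import numpy as np

# Hypothetical helper (not part of any repo): estimate a symmetric AABB
# half-width from an LLFF-style poses_bounds.npy array of shape (N, 17),
# where each row is a flattened 3x5 pose matrix followed by near/far bounds.
def estimate_extent(poses_bounds: np.ndarray) -> float:
    poses = poses_bounds[:, :15].reshape(-1, 3, 5)
    cam_positions = poses[:, :, 3]   # camera centers sit in the fourth column
    far = poses_bounds[:, -1]        # far depth bound per camera
    # Crude upper bound on scene extent: farthest camera coordinate plus the
    # largest far bound.
    return float(np.abs(cam_positions).max() + far.max())
```

Dividing your target half-width (e.g. 1.0) by this estimate then gives a starting point for the scale factor above.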
@ookey thanks a lot! That is indeed very helpful! Will report back if this leads to improvements.
@nlml Everything you're doing sounds pretty reasonable!
Quick question: what dataset class / config are you using? Is NDC (normalized device coordinates) enabled? If not, and if your dataset is forward-facing, I would recommend enabling NDC (as is done here). Provided that your dataset is forward facing, I'd also suggest that you make use of a slightly smaller initial AABB (perhaps [-2, 2] in all dimensions).
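For reference, the forward-facing NDC warp from the original NeRF paper (Appendix C) looks like the sketch below; HyperReel's actual implementation may differ in conventions such as axis signs or how focal/resolution are passed in:

```python
import numpy as np

# Standard NDC warp for forward-facing scenes (original NeRF, Appendix C).
# Assumes cameras look down the -z axis, as in the LLFF/NeRF convention.
def ndc_rays(origins, dirs, focal, width, height, near=1.0):
    # Shift each ray origin onto the near plane at z = -near.
    t = -(near + origins[..., 2]) / dirs[..., 2]
    origins = origins + t[..., None] * dirs

    sx = -focal / (0.5 * width)
    sy = -focal / (0.5 * height)

    o0 = sx * origins[..., 0] / origins[..., 2]
    o1 = sy * origins[..., 1] / origins[..., 2]
    o2 = 1.0 + 2.0 * near / origins[..., 2]

    d0 = sx * (dirs[..., 0] / dirs[..., 2] - origins[..., 0] / origins[..., 2])
    d1 = sy * (dirs[..., 1] / dirs[..., 2] - origins[..., 1] / origins[..., 2])
    d2 = -2.0 * near / origins[..., 2]

    return np.stack([o0, o1, o2], axis=-1), np.stack([d0, d1, d2], axis=-1)
```

The useful property for your case: after this warp, the entire visible frustum from the near plane to infinity fits inside the unit cube, which is why NDC configs pair naturally with a small AABB.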
@breuckelen Thanks for getting back to me!
Basically, the best results I've gotten so far are from just running run_one_n3d.sh, which uses NDC. The only changes I made were the image size and expanding the aabb to aabb: [[-5.0, -5.0, -5.0], [5.0, 5.0, 5.0]].
I tried downscaling per @ookey's suggestion yesterday, but an aabb of -5 to 5 still seems to give the best validation PSNR / visual results (which aren't bad; PSNR is around 27, I believe, but I'm sure it can improve).
I was thinking about this yesterday though, and maybe the forward-facing paradigm isn't the right one for my dataset. I basically have a human subject with a completely white background. Although all the images of the subject are basically within about 120 degrees of frontal rotation (i.e., the back of the head is never completely visible), I suppose you could think about the scene as an object-centric scene with 360 degree cameras going around the object, and no background, except in my case we are missing the back camera angles.
Perhaps, given this info about my dataset, would you suggest a different config maybe? Like for instance, I'm not sure if scene contraction really makes sense given there are no far-away background objects or scene.
It's hard for me to say for sure what the best approach is (whether or not to use configs tailored to forward-facing scenes). But if you'd like to poke around, the donerf model configs might be a good place to start (e.g. donerf_sphere.yaml, donerf_cylinder.yaml, etc.). The donerf models all use space contraction by default, but this should be straightforward to disable -- simply comment out this config entry.
Honestly, though, it sounds as though the forward facing configs, such as the N3D config, should work, provided that the human subject occupies a limited field of view within the central cameras of your rig.
One other thing that might help: if you haven't already you can set the white_bg variable to 1 in the model config. This makes it so that the default background color is white, and any rays that don't "pick up" any content within the volume are automatically colored white.
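For intuition, white-background compositing in volume rendering usually looks like the sketch below; the variable names are illustrative, not HyperReel's actual code:

```python
import numpy as np

# Illustrative background compositing: "rgb" is the color accumulated along
# each ray, "acc" the accumulated opacity in [0, 1]. Rays that pick up little
# content (low acc) inherit the background color.
def composite_background(rgb: np.ndarray, acc: np.ndarray,
                         white_bg: bool = True) -> np.ndarray:
    bg = 1.0 if white_bg else 0.0
    return rgb + (1.0 - acc[..., None]) * bg
```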
Also I'm more than happy to take a look at any results if you'd like to share them, which might make it simpler to diagnose any issues / suggest potential fixes.
Thanks again for all your comments/advice.
I tried white_bg=1, but it actually made things worse.
I've managed to improve my best validation PSNR from about 28 to about 30 through the following changes:
- Setting `spherical_poses` to `True` gave a small but noticeable improvement.
- Setting `correct_poses` to `False` gave a tiny improvement.
- Switching the regularizer config from `tv_4000` to `tv_4000_llff` gave the biggest improvement.
This last change lined up with my theory that what I needed was more regularisation; in my experiments with DNeRF, this also turned out to be important for this dataset.
Off the top of your head, would you have any suggestions as to which parameters I could change to increase the regularisation strength or decrease the complexity of the model/representation? I think this might help me. In parallel I will re-read the paper/code in more detail to try to figure this out for myself.
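For what it's worth, the TV regularizer being tuned here is typically a penalty on differences between neighboring grid values; a minimal illustrative version (not HyperReel's exact implementation, where the weight and the set of penalized tensors come from the config):

```python
import numpy as np

# Illustrative total-variation (TV) penalty on a 3D feature grid: mean squared
# difference between neighboring voxels along each axis. Increasing the weight
# on a term like this increases smoothness / regularisation.
def tv_loss(grid: np.ndarray) -> float:
    dx = np.diff(grid, axis=0)
    dy = np.diff(grid, axis=1)
    dz = np.diff(grid, axis=2)
    return float((dx ** 2).mean() + (dy ** 2).mean() + (dz ** 2).mean())
```

So the two obvious knobs are the weight on this term and the grid resolution itself (a coarser grid is effectively a lower-capacity, smoother representation).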
I have to check with the dataset owner re sharing images of the current results I'm getting, but I will if I can. Thanks again for your help!