Results on Tanks & Temples Dataset
Hi, thanks for the wonderful work again.
I first ran tests on the mip-NeRF 360 dataset, and the results matched those in the paper and README exactly. I then tested the Deep Blending and Tanks & Temples datasets used in 3D-GS.
The results for the DB (Deep Blending) dataset are shown in the table below and represent state-of-the-art rendering quality:
| | drjohnson | playroom |
|---|---|---|
| PSNR | 30.150 | 30.946 |
| SSIM | 0.90905 | 0.90808 |
| LPIPS (VGG) | 0.21985 | 0.20522 |
https://github.com/jonbarron/camp_zipnerf/assets/63096187/37780da4-50ee-4d33-b9d0-dc8a38bdb996
https://github.com/jonbarron/camp_zipnerf/assets/63096187/1f83c7eb-07fa-4df3-837c-cfaef05a8870
However, when running the Tanks & Temples dataset, ZipNeRF failed to converge, producing highly fragmented results. Since the Tanks & Temples data is in the same format as DB, and the DB results were excellent, my data-loading process should be correct. I am unsure whether this is a robustness issue with ZipNeRF itself or whether the Tanks & Temples dataset needs a different configuration to converge.
I executed the following command:
DATA_DIR=/my/path/to/the/dataset
CHECKPOINT_DIR=./logs/zipnerf/tanks_db
SCENE=truck # train, drjohnson, and playroom are run the same way
CUDA_VISIBLE_DEVICES=0 python -m train \
--gin_configs=configs/zipnerf/360.gin \
--gin_bindings="Config.data_dir = '${DATA_DIR}/${SCENE}'" \
--gin_bindings="Config.checkpoint_dir = '${CHECKPOINT_DIR}/${SCENE}'" \
--gin_bindings="Config.factor = 1"
Looking forward to your reply!
https://github.com/jonbarron/camp_zipnerf/assets/63096187/0c091146-1225-4cd2-aa95-3815dd6e010e
https://github.com/jonbarron/camp_zipnerf/assets/63096187/74f87a66-54aa-4347-95f3-ffae3e3c40af
I have tested the truck dataset and it runs correctly. The result is excellent. Please re-run COLMAP on that dataset.
I'm not sure what's going on here, but in general I've found the Tanks & Temples dataset to be really dicey. It has a lot of per-image exposure and white balance variation which makes training and evaluation really hard (the mip-NeRF 360 Appendix has a whole section about how much of an issue this is). You can probably get much better results using the 360_aglo128.gin config, which should help deal with the per-image variation. That said, judging from the videos you shared here it seems like your camera poses might be wrong? Not sure.
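For anyone who wants to try that suggestion, here is a minimal sketch of the retry, assuming the `configs/zipnerf/360_aglo128.gin` config exists under the name mentioned above; aside from the config path (and a fresh checkpoint directory), it matches the command earlier in the thread:

```bash
# Sketch: retry the truck scene with the per-image appearance config suggested
# above. Only the config path and checkpoint directory differ from the earlier
# command; the config filename is taken from the comment above.
DATA_DIR=/my/path/to/the/dataset
CHECKPOINT_DIR=./logs/zipnerf/tanks_db_aglo
SCENE=truck
CUDA_VISIBLE_DEVICES=0 python -m train \
  --gin_configs=configs/zipnerf/360_aglo128.gin \
  --gin_bindings="Config.data_dir = '${DATA_DIR}/${SCENE}'" \
  --gin_bindings="Config.checkpoint_dir = '${CHECKPOINT_DIR}/${SCENE}'" \
  --gin_bindings="Config.factor = 1"
```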
ZipNeRF with the truck dataset (the dataset can be obtained from the Inria 3D-GS sample datasets):
- Input dataset: 251 images (resolution = 979x546)
- Steps: 30k
- PSNR at 30k: 28.74
- Training time: ~2 hours 30 minutes
- Rendering time (480 frames): ~1 hour
- Training GPU usage: 47 GB with 16384 batch size
- Rendering GPU usage: 36 GB
- GPU type: RTX A6000 48GB VRAM (rented from Runpod)
ZipNeRF is SOTA in quality, but it has a downside: I don't know how to change the camera trajectory for rendering a video. Maybe we can import a Nerfstudio camera path for rendering?
FYI, I can't upload the video to this thread since it is larger than 10 MB, so I uploaded some frames instead.
Wow, wonderful results on truck! Did you use the camera poses provided here? I think the camera poses for the DB and Tanks & Temples datasets are relatively accurate; I can get good results directly from them using 3D-GS. What command did you run to get these results?
Thanks for your reply~
Hey, nice looking truck! Yeah there's no easy support for changing the render path in this codebase. Integrating into Nerfstudio's GUI would be cool.
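On the Nerfstudio camera-path idea: below is a rough sketch of pulling the poses out of a Nerfstudio `camera_path.json` export, assuming the usual format where each entry of the `camera_path` list carries a flattened 4x4 `camera_to_world` matrix. Turning those poses into a render path this codebase will actually consume would still need custom code, since there is no built-in import.

```bash
# Sketch: dump each frame's flattened 4x4 camera_to_world matrix (16 numbers
# per line) from a Nerfstudio camera_path.json. The JSON layout is an
# assumption based on Nerfstudio's standard viewer export; feeding the result
# into this repo's renderer is not supported out of the box.
jq -r '.camera_path[] | .camera_to_world | map(tostring) | join(" ")' \
  camera_path.json > render_poses.txt
wc -l render_poses.txt  # one line per frame of the exported path
```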
Sorry for the late response.
./configs/zipnerf/360.gin
# But to allow this code to run on a single just-okay GPU, we've divided the
# batch size and learning rates by 8x and multiplied the number of steps by 8x.
Config.max_steps = 30000
Config.batch_size = 16384
Config.lr_delay_steps = 20000
Config.lr_init = 0.0025
Config.lr_final = 0.00025
python3 -m train \
--gin_configs=configs/zipnerf/360-fac1-batchsize025.gin \
--gin_configs=configs/camp/camera_optim.gin \
--gin_bindings="Config.data_dir = '{DATA_DIR}/{SCENE}'" \
--gin_bindings="Config.checkpoint_dir = '{CHECKPOINT_DIR}/{SCENE}'"
python3 -m render \
--gin_configs=configs/zipnerf/360-fac1-batchsize025.gin \
--gin_configs=configs/camp/camera_optim.gin \
--gin_bindings="Config.data_dir = '{DATA_DIR}/{SCENE}'" \
--gin_bindings="Config.checkpoint_dir = '{CHECKPOINT_DIR}/{SCENE}'" \
--gin_bindings="Config.render_dir = '{CHECKPOINT_DIR}/RENDER/Truck_PLY'" \
--gin_bindings="Config.render_path = True" \
--gin_bindings="Config.render_path_frames = 480" \
--gin_bindings="Config.render_video_fps = 30"
Thanks so much to Barron @jonbarron and Ichsan @ichsan2895 for their timely, patient, and detailed responses.
Following Ichsan's experimental setup, I still obtained blurry results. My conclusion is that the inaccurate camera poses in the official 3D-GS tandt dataset cause ZipNeRF to collapse. After re-running COLMAP following Ichsan's guidance, we can obtain good results on tandt with ZipNeRF.
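For anyone who lands here later, here is a rough sketch of what re-running COLMAP from scratch on the truck images looks like with the standard COLMAP CLI; all paths are placeholders, and the exact options Ichsan used are not specified in this thread.

```bash
# Sketch: recompute camera poses for the truck images with a standard COLMAP
# pipeline, then undistort the images. Paths are placeholders; the options
# Ichsan actually used are not specified in this thread.
SCENE_DIR=/my/path/to/the/dataset/truck
colmap feature_extractor \
  --database_path "${SCENE_DIR}/database.db" \
  --image_path "${SCENE_DIR}/images" \
  --ImageReader.single_camera 1
colmap exhaustive_matcher --database_path "${SCENE_DIR}/database.db"
mkdir -p "${SCENE_DIR}/sparse"
colmap mapper \
  --database_path "${SCENE_DIR}/database.db" \
  --image_path "${SCENE_DIR}/images" \
  --output_path "${SCENE_DIR}/sparse"
colmap image_undistorter \
  --image_path "${SCENE_DIR}/images" \
  --input_path "${SCENE_DIR}/sparse/0" \
  --output_path "${SCENE_DIR}/undistorted" \
  --output_type COLMAP
```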