                        R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras [ICCV 2023]
Project Page | Paper | Data

Abstract
Dense 3D reconstruction and ego-motion estimation are key challenges in autonomous driving and robotics. Compared to the complex, multi-modal systems deployed today, multi-camera systems provide a simpler, low-cost alternative. However, camera-based 3D reconstruction of complex dynamic scenes has proven extremely difficult, as existing solutions often produce incomplete or incoherent results. We propose R3D3, a multi-camera system for dense 3D reconstruction and ego-motion estimation. Our approach iterates between geometric estimation that exploits spatial-temporal information from multiple cameras, and monocular depth refinement. We integrate multi-camera feature correlation and dense bundle adjustment operators that yield robust geometric depth and pose estimates. To improve reconstruction where geometric depth is unreliable, e.g. for moving objects or low-textured regions, we introduce learnable scene priors via a depth refinement network. We show that this design enables a dense, consistent 3D reconstruction of challenging, dynamic outdoor environments. Consequently, we achieve state-of-the-art dense depth prediction on the DDAD and nuScenes benchmarks.
Getting Started
- Clone the repo using the --recurse-submodules flag
git clone --recurse-submodules https://github.com/AronDiSc/r3d3.git
cd r3d3
- Create a new anaconda environment using the provided .yaml file
conda env create --file environment.yaml
conda activate r3d3
- Compile the extensions (takes about 10 minutes)
python setup.py install
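If the build succeeds, the compiled extensions should be importable from the active environment. Below is a minimal, hypothetical sanity check; the module names lietorch and droid_backends follow the Droid-SLAM convention this repository builds on and may differ here, so adjust them if needed.

```python
# Hypothetical post-install check: confirm that PyTorch sees a GPU and that the
# compiled extensions import. The module names (lietorch, droid_backends) are
# assumptions based on the Droid-SLAM codebase and may differ in this repo.
import importlib

import torch

print("CUDA available:", torch.cuda.is_available())

for name in ("lietorch", "droid_backends"):
    try:
        importlib.import_module(name)
        print(f"{name}: OK")
    except ImportError as err:
        print(f"{name}: missing ({err})")
```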
Datasets
The datasets should be placed at data/datasets/<dataset>
DDAD
Download the DDAD dataset and place it at data/datasets/DDAD. We use the masks provided by SurroundDepth; place them at data/datasets/DDAD/<scene>/occl_mask/<cam>/mask.png. The DDAD directory structure should look as follows:
R3D3
    ├ data
        ├ datasets
            ├ DDAD
                ├ <scene>
                    ├ calibration
                        └ ....json
                    ├ point_cloud
                        └ <cam>
                            └ ....npz
                    ├ occl_mask
                        └ <cam>
                            └ ....png
                    ├ rgb
                        └ <cam>
                            └ ....png
                    
                    └ scene_....json
                └ ...
            └ ...
        └ ...
    └ ...
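Before generating training data or evaluating, it can help to confirm the layout matches the tree above. The following is a small, hypothetical check script (not part of the repository):

```python
# Hypothetical layout check for data/datasets/DDAD: verify that each scene has
# calibration files, rgb frames per camera, and the SurroundDepth occlusion masks.
from pathlib import Path

root = Path("data/datasets/DDAD")
for scene in sorted(p for p in root.iterdir() if p.is_dir()):
    has_calib = any((scene / "calibration").glob("*.json"))
    n_rgb = len(list((scene / "rgb").glob("*/*.png")))
    n_mask = len(list((scene / "occl_mask").glob("*/mask.png")))
    print(f"{scene.name}: calibration={has_calib} rgb={n_rgb} masks={n_mask}")
```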
nuScenes
Download the nuScenes dataset and place it at data/datasets/nuScenes. We use the provided self-occlusion masks; place them at data/datasets/nuScenes/mask/<cam>.png. The nuScenes directory structure should look as follows:
R3D3
    ├ data
        ├ datasets
            ├ nuScenes
                ├ mask
                    ├ CAM_....png
                ├ samples
                    ├ CAM_...
                        └ ....jpg
                    └ LIDAR_TOP
                        └ ....pcd.bin
                ├ sweeps
                    ├ CAM_...
                        └ ....jpg
                ├  v1.0-trainval
                    └ ...
                └ ...
            └ ...
        └ ...
    └ ...
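If the nuScenes devkit is installed (pip install nuscenes-devkit; it is not necessarily a dependency of this repository), a short sketch to confirm the dataset is readable from the expected location:

```python
# Sketch: load the v1.0-trainval split with the official nuScenes devkit to make
# sure data/datasets/nuScenes is complete and indexed correctly.
from nuscenes.nuscenes import NuScenes

nusc = NuScenes(version="v1.0-trainval", dataroot="data/datasets/nuScenes", verbose=True)
print("scenes:", len(nusc.scene), "samples:", len(nusc.sample))
```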
Models
VKITTI2 Finetuned Feature-Matching
Download the weights for the feature- and context-encoders as well as the GRU from here: r3d3_finetuned.ckpt. Place it at:
R3D3
    ├ data
        ├ models
            ├ r3d3
                └ r3d3_finetuned.ckpt
            └ ...
        └ ...
    └ ...
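To verify the download, the checkpoint can be inspected with plain PyTorch. This is only a sketch; whether the weights are stored under a state_dict key is an assumption about the checkpoint format.

```python
# Hypothetical checkpoint inspection: list a few parameter names so you can
# confirm that feature/context encoder and GRU weights are present.
import torch

ckpt = torch.load("data/models/r3d3/r3d3_finetuned.ckpt", map_location="cpu")
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
for name in list(state)[:10]:
    print(name)
```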
Completion Network
We provide completion network weights for the DDAD and nuScenes datasets.
| Dataset | Abs Rel | Sq Rel | RMSE | delta < 1.25 | Download | 
|---|---|---|---|---|---|
| DDAD | 0.162 | 3.019 | 11.408 | 0.811 | completion_ddad.ckpt | 
| nuScenes | 0.253 | 4.759 | 7.150 | 0.729 | completion_nuscenes.ckpt | 
Place them at:
R3D3
    ├ data
        ├ models
            ├ completion
                ├ completion_ddad.ckpt
                └ completion_nuscenes.ckpt
            └ ...
        └ ...
    └ ...
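The metrics in the table above are the standard monocular depth error measures. For reference, a minimal sketch of how they are commonly computed (illustrative only, not the repository's evaluation code; the valid-depth range used for masking is an assumption):

```python
# Illustrative computation of Abs Rel, Sq Rel, RMSE, and delta < 1.25 over
# valid ground-truth pixels. The depth range used for masking is an assumption.
import numpy as np

def depth_metrics(pred: np.ndarray, gt: np.ndarray, min_depth: float = 1e-3, max_depth: float = 200.0):
    mask = (gt > min_depth) & (gt < max_depth)
    pred, gt = pred[mask], gt[mask]
    abs_rel = float(np.mean(np.abs(pred - gt) / gt))
    sq_rel = float(np.mean((pred - gt) ** 2 / gt))
    rmse = float(np.sqrt(np.mean((pred - gt) ** 2)))
    a1 = float(np.mean(np.maximum(pred / gt, gt / pred) < 1.25))
    return abs_rel, sq_rel, rmse, a1
```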
Training
Droid-SLAM Finetuning
We finetune the provided droid.pth checkpoint on VKITTI2 using the Droid-SLAM codebase.
Completion Network
1. Generate Training Data
# DDAD
python evaluate.py \
    --config configs/evaluation/dataset_generation/dataset_generation_ddad.yaml \
    --r3d3_weights=data/models/r3d3/r3d3_finetuned.ckpt \
    --r3d3_image_size 384 640 \
    --r3d3_n_warmup=5 \
    --r3d3_optm_window=5 \
    --r3d3_corr_impl=lowmem \
    --r3d3_graph_type=droid_slam \
    --training_data_path=./data/datasets/DDAD 
# nuScenes
python evaluate.py \
    --config configs/evaluation/dataset_generation/dataset_generation_nuscenes.yaml \
    --r3d3_weights=data/models/r3d3/r3d3_finetuned.ckpt \
    --r3d3_image_size 448 768 \
    --r3d3_n_warmup=5 \
    --r3d3_optm_window=5 \
    --r3d3_corr_impl=lowmem \
    --r3d3_graph_type=droid_slam \
    --training_data_path=./data/datasets/nuScenes 
2. Completion Network Training
# DDAD
python train.py configs/training/depth_completion/r3d3_completion_ddad_stage_1.yaml
python train.py configs/evaluation/depth_completion/r3d3_completion_ddad_inf_depth.yaml --arch.model.checkpoint=<path to stage 1 model>.ckpt
python train.py configs/training/depth_completion/r3d3_completion_ddad_stage_2.yaml --arch.model.checkpoint=<path to stage 1 model>.ckpt
# nuScenes
python train.py configs/training/depth_completion/r3d3_completion_nuscenes_stage_1.yaml
python train.py configs/evaluation/depth_completion/r3d3_completion_nuscenes_inf_depth.yaml --arch.model.checkpoint=<path to stage 1 model>.ckpt
python train.py configs/training/depth_completion/r3d3_completion_nuscenes_stage_2.yaml --arch.model.checkpoint=<path to stage 1 model>.ckpt
Evaluation
# DDAD
python evaluate.py \
    --config configs/evaluation/r3d3/r3d3_evaluation_ddad.yaml \
    --r3d3_weights data/models/r3d3/r3d3_finetuned.ckpt \
    --r3d3_image_size 384 640 \
    --r3d3_init_motion_only \
    --r3d3_n_edges_max=84 
# nuScenes
python evaluate.py \
    --config configs/evaluation/r3d3/r3d3_evaluation_nuscenes.yaml \
    --r3d3_weights data/models/r3d3/r3d3_finetuned.ckpt \
    --r3d3_image_size 448 768 \
    --r3d3_init_motion_only \
    --r3d3_dt_inter=0 \
    --r3d3_n_edges_max=72 
Citation
If you find the code helpful in your research or work, please cite the following paper.
@inproceedings{r3d3,
  title={R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras},
  author={Schmied, Aron and Fischer, Tobias and Danelljan, Martin and Pollefeys, Marc and Yu, Fisher},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  year={2023}
}
Acknowledgements
- This repository is based on Droid-SLAM.
- The implementation of the completion network is based on Monodepth2.
- The vidar framework is used for training, evaluation, and logging of results.