
🏑Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

Rui Li1 · Tobias Fischer1 · Mattia Segu1 · Marc Pollefeys1
Luc Van Gool1 · Federico Tombari2,3

1ETH Zürich · 2Google · 3Technical University of Munich

CVPR 2024

Paper PDF Project Page Hugging Face

This work presents Know-Your-Neighbors (KYN), a single-view 3D reconstruction method that disambiguates occluded scene geometry by utilizing Vision-Language semantics and spatial reasoning.

teaser

🔗 Environment Setup

# python virtual environment
python -m venv kyn
source kyn/bin/activate
pip install -r requirements.txt
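Before downloading any data, you can optionally sanity-check the installation. The snippet below only assumes that requirements.txt installs PyTorch (the training command later in this README relies on torchrun):

# quick environment check (assumes PyTorch is installed via requirements.txt)
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available: ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU count:      ", torch.cuda.device_count())
    print("GPU name:       ", torch.cuda.get_device_name(0))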

🚀 Quick Start

Download our pre-trained model and the LSeg model, and put them in ./checkpoints. Then run the demo:

python scripts/demo.py --img media/example/0000.png --model_path checkpoints/kyn.pt --save_path /your/save/path

Here, --img specifies the input image path, --model_path the model checkpoint path, and --save_path the directory where the resulting depth map, BEV map, and 3D voxel grid are stored.
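To run the demo on a whole folder of images, a small wrapper like the hypothetical one below can be used. It only relies on the command-line flags documented above; the folder names are placeholders to adapt to your setup:

# batch_demo.py (hypothetical helper): runs scripts/demo.py once per image
import subprocess
from pathlib import Path

img_dir = Path("media/example")       # folder containing input .png images
out_root = Path("/your/save/path")    # adjust to your setup

for img in sorted(img_dir.glob("*.png")):
    save_dir = out_root / img.stem
    save_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["python", "scripts/demo.py",
         "--img", str(img),
         "--model_path", "checkpoints/kyn.pt",
         "--save_path", str(save_dir)],
        check=True,
    )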

📁 Dataset Setup

We use the KITTI-360 dataset and process it as follows:

  1. Register at https://www.cvlibs.net/datasets/kitti-360/index.php and download perspective images, fisheye images, raw Velodyne scans, calibrations, and vehicle poses. The required KITTI-360 official scripts & data are:
    download_2d_fisheye.zip
    download_2d_perspective.zip
    download_3d_velodyne.zip
    calibration.zip
    data_poses.zip
    
  2. Preprocess with the Python script below. It rectifies the fisheye views, resizes all images, and stores them in separate folders:
    python datasets/kitti_360/preprocess_kitti_360.py --data_path ./KITTI-360 --save_path ./KITTI-360
    
  3. The final folder structure should look like:
KITTI-360
   ├── calibration
   ├── data_poses
   ├── data_2d_raw
   │   ├── 2013_05_28_drive_0003_sync
   │   │   ├── image_00
   │   │   │    ├── data_192x640
   │   │   │    └── data_rect
   │   │   ├── image_01
   │   │   ├── image_02
   │   │   │    ├── data_192x640_0x-15
   │   │   │    └── data_rgb
   │   │   └── image_03
   │   └── ...
   └── data_3d_raw
           ├── 2013_05_28_drive_0003_sync
           └── ...
    
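If a download or the preprocessing step went wrong, a quick layout check can save a failed run later. The hypothetical script below only verifies the folders shown in the tree above (it does not check every drive sequence):

# check_kitti360_layout.py (hypothetical): verify the folder tree shown above
from pathlib import Path

root = Path("./KITTI-360")
expected = [
    "calibration",
    "data_poses",
    "data_2d_raw/2013_05_28_drive_0003_sync/image_00/data_192x640",
    "data_2d_raw/2013_05_28_drive_0003_sync/image_00/data_rect",
    "data_2d_raw/2013_05_28_drive_0003_sync/image_02/data_192x640_0x-15",
    "data_2d_raw/2013_05_28_drive_0003_sync/image_02/data_rgb",
    "data_3d_raw/2013_05_28_drive_0003_sync",
]
missing = [p for p in expected if not (root / p).is_dir()]
print("Layout OK" if not missing else f"Missing: {missing}")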

📊 Evaluation

Quantitative Evaluation

  1. The data directory is set to ./KITTI-360 by default.
Download and unzip the pre-computed GT occupancy maps into ./KITTI-360. You can also compute and store your own GT occupancy maps by setting read_gt_occ_path: '' and specifying save_gt_occ_map_path in configs/eval_kyn.yaml.
  3. Download and unzip the object labels to ./KITTI-360.
Download our pre-trained model and the LSeg model, and put them in ./checkpoints.
  5. Run the following command for evaluation:
    python eval.py -cn eval_kyn
    

Voxel Visualization

Run the following command to generate 3D voxel models on the KITTI-360 test set:

python scripts/gen_kitti360_voxel.py -cn gen_voxel

💻 Training

Download the LSeg model and put it into ./checkpoints. Then run:

torchrun --nproc_per_node=<num_of_gpus> train.py -cn train_kyn

where <num_of_gpus> denotes the number of available GPUs. Models will be saved in ./result by default.
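If you are unsure what to pass as <num_of_gpus>, the number of GPUs visible on the current machine can be queried with PyTorch (a one-off check, not part of the training code):

# prints the GPU count to use for --nproc_per_node
import torch
print(torch.cuda.device_count())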

📰 Citation

Please cite our paper if you use the code in this repository:

@inproceedings{li2024know,
      title={Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning}, 
      author={Li, Rui and Fischer, Tobias and Segu, Mattia and Pollefeys, Marc and Van Gool, Luc and Tombari, Federico},
      booktitle={CVPR},
      year={2024}
}