
🏑Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

Rui Li1 · Tobias Fischer1 · Mattia Segu1 · Marc Pollefeys1
Luc Van Gool1 · Federico Tombari2,3

1ETH Zürich · 2Google · 3Technical University of Munich

CVPR 2024

Paper PDF Project Page Hugging Face

This work presents Know-Your-Neighbors (KYN), a single-view 3D reconstruction method that disambiguates occluded scene geometry by utilizing Vision-Language semantics and spatial reasoning.

teaser

🔗 Environment Setup

# python virtual environment
python -m venv kyn
source kyn/bin/activate
pip install -r requirements.txt
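Before downloading any data, you can optionally sanity-check the installation. The snippet below only assumes that requirements.txt installs PyTorch (the training command later in this README relies on torchrun):

# quick environment check (assumes PyTorch is installed via requirements.txt)
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available: ", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU count:      ", torch.cuda.device_count())
    print("GPU name:       ", torch.cuda.get_device_name(0))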

🚀 Quick Start

Download our pre-trained model and the LSeg model, and put them in ./checkpoints. Then run the demo:

python scripts/demo.py --img media/example/0000.png --model_path checkpoints/kyn.pt --save_path /your/save/path

Here, --img specifies the input image path, --model_path the model checkpoint path, and --save_path the directory where the resulting depth map, BEV map, and 3D voxel grid are stored.
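To run the demo on a whole folder of images, a small wrapper like the hypothetical one below can be used. It only relies on the command-line flags documented above; the folder names are placeholders to adapt to your setup:

# batch_demo.py (hypothetical helper): runs scripts/demo.py once per image
import subprocess
from pathlib import Path

img_dir = Path("media/example")       # folder containing input .png images
out_root = Path("/your/save/path")    # adjust to your setup

for img in sorted(img_dir.glob("*.png")):
    save_dir = out_root / img.stem
    save_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["python", "scripts/demo.py",
         "--img", str(img),
         "--model_path", "checkpoints/kyn.pt",
         "--save_path", str(save_dir)],
        check=True,
    )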

📁 Dataset Setup

We use the KITTI-360 dataset and process it as follows:

  1. Register at https://www.cvlibs.net/datasets/kitti-360/index.php and download perspective images, fisheye images, raw Velodyne scans, calibrations, and vehicle poses. The required KITTI-360 official scripts & data are:
    download_2d_fisheye.zip
    download_2d_perspective.zip
    download_3d_velodyne.zip
    calibration.zip
    data_poses.zip
    
  2. Preprocess with the Python script below. It rectifies the fisheye views, resizes all images, and stores them in separate folders:
    python datasets/kitti_360/preprocess_kitti_360.py --data_path ./KITTI-360 --save_path ./KITTI-360
    
  3. The final folder structure should look like:
KITTI-360
   ├── calibration
   ├── data_poses
   ├── data_2d_raw
   │   ├── 2013_05_28_drive_0003_sync
   │   │   ├── image_00
   │   │   │    ├── data_192x640
   │   │   │    └── data_rect
   │   │   ├── image_01
   │   │   ├── image_02
   │   │   │    ├── data_192x640_0x-15
   │   │   │    └── data_rgb
   │   │   └── image_03
   │   └── ...
   └── data_3d_raw
           ├── 2013_05_28_drive_0003_sync
           └── ...
    
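If a download or the preprocessing step went wrong, a quick layout check can save a failed run later. The hypothetical script below only verifies the folders shown in the tree above (it does not check every drive sequence):

# check_kitti360_layout.py (hypothetical): verify the folder tree shown above
from pathlib import Path

root = Path("./KITTI-360")
expected = [
    "calibration",
    "data_poses",
    "data_2d_raw/2013_05_28_drive_0003_sync/image_00/data_192x640",
    "data_2d_raw/2013_05_28_drive_0003_sync/image_00/data_rect",
    "data_2d_raw/2013_05_28_drive_0003_sync/image_02/data_192x640_0x-15",
    "data_2d_raw/2013_05_28_drive_0003_sync/image_02/data_rgb",
    "data_3d_raw/2013_05_28_drive_0003_sync",
]
missing = [p for p in expected if not (root / p).is_dir()]
print("Layout OK" if not missing else f"Missing: {missing}")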

📊 Evaluation

Quantitative Evaluation

  1. The data directory is set to ./KITTI-360 by default.
Download and unzip the pre-computed GT occupancy maps into ./KITTI-360. You can also compute and store your own GT occupancy maps by setting read_gt_occ_path: '' and specifying save_gt_occ_map_path in configs/eval_kyn.yaml.
  3. Download and unzip the object labels to ./KITTI-360.
Download our pre-trained model and the LSeg model, and put them in ./checkpoints.
  5. Run the following command for evaluation:
    python eval.py -cn eval_kyn
    

Voxel Visualization

Run the following command to generate 3D voxel models on the KITTI-360 test set:

python scripts/gen_kitti360_voxel.py -cn gen_voxel

💻 Training

Download the LSeg model and put it into ./checkpoints. Then run:

torchrun --nproc_per_node=<num_of_gpus> train.py -cn train_kyn

where <num_of_gpus> denotes the number of available GPUs. Models will be saved in ./result by default.
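If you are unsure what to pass as <num_of_gpus>, the number of GPUs visible on the current machine can be queried with PyTorch (a one-off check, not part of the training code):

# prints the GPU count to use for --nproc_per_node
import torch
print(torch.cuda.device_count())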

📰 Citation

Please cite our paper if you use the code in this repository:

@inproceedings{li2024know,
      title={Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning}, 
      author={Li, Rui and Fischer, Tobias and Segu, Mattia and Pollefeys, Marc and Van Gool, Luc and Tombari, Federico},
      booktitle={CVPR},
      year={2024}
}