Know-Your-Neighbors
Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
Rui Li¹ · Tobias Fischer¹ · Mattia Segu¹ · Marc Pollefeys¹
Luc Van Gool¹ · Federico Tombari²,³
¹ETH Zürich · ²Google · ³Technical University of Munich
CVPR 2024
This work presents Know-Your-Neighbors (KYN), a single-view 3D reconstruction method that disambiguates occluded scene geometry by utilizing Vision-Language semantics and spatial reasoning.
Environment Setup
# python virtual environment
python -m venv kyn
source kyn/bin/activate
pip install -r requirements.txt
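
To confirm the environment is ready before moving on, a quick sanity check can help. A minimal sketch, assuming PyTorch is among the packages in requirements.txt (the torchrun training command below implies it):

# sanity check: verify that PyTorch imports and a GPU is visible
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))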
Quick Start
Download our pre-trained model and the LSeg model, and put them into ./checkpoints. Then run the demo:
python scripts/demo.py --img media/example/0000.png --model_path checkpoints/kyn.pt --save_path /your/save/path
Here, --img specifies the input image path, --model_path is the model checkpoint path, and --save_path stores the resulting depth map, BEV map, and 3D voxel grids.
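
If the demo fails, a first debugging step is to confirm that the checkpoint file itself loads. A minimal sketch, assuming checkpoints/kyn.pt is a standard PyTorch checkpoint (its internal key layout is not documented here):

import torch

# load on CPU so no GPU is needed just to inspect the file
ckpt = torch.load("checkpoints/kyn.pt", map_location="cpu")

# checkpoints are typically dicts; print the top-level keys to see what is inside
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))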
Dataset Setup
We use the KITTI-360 dataset and process it as follows:
- Register at https://www.cvlibs.net/datasets/kitti-360/index.php and download perspective images, fisheye images, raw Velodyne scans, calibrations, and vehicle poses. The required KITTI-360 official scripts & data are:
download_2d_fisheye.zip download_2d_perspective.zip download_3d_velodyne.zip calibration.zip data_poses.zip
- Preprocess with the Python script below. It rectifies the fisheye views, resizes all images, and stores them in separate folders:
python datasets/kitti_360/preprocess_kitti_360.py --data_path ./KITTI-360 --save_path ./KITTI-360
- The final folder structure should look like:
KITTI-360
├── calibration
├── data_poses
├── data_2d_raw
│   ├── 2013_05_28_drive_0003_sync
│   │   ├── image_00
│   │   │   ├── data_192x640
│   │   │   └── data_rect
│   │   ├── image_01
│   │   ├── image_02
│   │   │   ├── data_192x640_0x-15
│   │   │   └── data_rgb
│   │   └── image_03
│   └── ...
└── data_3d_raw
    ├── 2013_05_28_drive_0003_sync
    └── ...
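
To confirm preprocessing produced this layout, a standard-library sketch that checks the directories from the tree above (the drive name is the example shown; adjust it to the drives you downloaded):

import os

root = "./KITTI-360"
drive = "2013_05_28_drive_0003_sync"  # example drive from the tree above
expected = [
    "calibration",
    "data_poses",
    os.path.join("data_2d_raw", drive, "image_00", "data_192x640"),
    os.path.join("data_2d_raw", drive, "image_00", "data_rect"),
    os.path.join("data_2d_raw", drive, "image_02", "data_192x640_0x-15"),
    os.path.join("data_2d_raw", drive, "image_02", "data_rgb"),
    os.path.join("data_3d_raw", drive),
]
for rel in expected:
    path = os.path.join(root, rel)
    print("OK" if os.path.isdir(path) else "MISSING", path)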
Evaluation
Quantitative Evaluation
- The data directory is set to ./KITTI-360 by default.
- Download and unzip the pre-computed GT occupancy maps into ./KITTI-360. You can also compute and store your own GT occupancy maps by setting read_gt_occ_path: '' and specifying save_gt_occ_map_path in configs/eval_kyn.yaml (see the sketch after the evaluation command below).
- Download and unzip the object labels to ./KITTI-360.
- Download our pre-trained model and the LSeg model, and put them into ./checkpoints.
- Run the following command for evaluation:
python eval.py -cn eval_kyn
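
The two GT-occupancy keys above can also be changed programmatically rather than by hand. A sketch under two assumptions: the -cn flag suggests a Hydra/OmegaConf setup, and the keys are taken to sit at the top level of configs/eval_kyn.yaml as written above:

from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/eval_kyn.yaml")
cfg.read_gt_occ_path = ""                        # '' means: recompute GT occupancy
cfg.save_gt_occ_map_path = "./KITTI-360/gt_occ"  # hypothetical save location
OmegaConf.save(cfg, "configs/eval_kyn.yaml")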
Voxel Visualization
Run the following command to generate 3D voxel models on the KITTI-360 test set:
python scripts/gen_kitti360_voxel.py -cn gen_voxel
Training
Download the LSeg model and put it into ./checkpoints. Then run:
torchrun --nproc_per_node=<num_of_gpus> train.py -cn train_kyn
where <num_of_gpus> denotes the number of available GPUs. Models are saved in ./result by default.
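
To locate the newest model under ./result, e.g. to pass it to the demo via --model_path, a standard-library sketch (the .pt extension is an assumption based on the released checkpoints/kyn.pt):

import glob
import os

# assumed: checkpoints are written as .pt files somewhere under ./result
ckpts = glob.glob("./result/**/*.pt", recursive=True)
if ckpts:
    print("latest checkpoint:", max(ckpts, key=os.path.getmtime))
else:
    print("no checkpoints found under ./result")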
Citation
Please cite our paper if you use the code in this repository:
@inproceedings{li2024know,
title={Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning},
author={Li, Rui and Fischer, Tobias and Segu, Mattia and Pollefeys, Marc and Van Gool, Luc and Tombari, Federico},
booktitle={CVPR},
year={2024}
}