image-box-overlap
[ECCV 2020] Training neural networks to predict visual overlap of images, through interpretable non-metric box embeddings
Predicting Visual Overlap of Images Through Interpretable Non-Metric Box Embeddings
Anita Rau, Guillermo Garcia-Hernando, Danail Stoyanov, Gabriel J. Brostow and Daniyar Turmukhambetov – ECCV 2020 (Spotlight presentation)
To what extent are two images picturing the same 3D surfaces? Even when this is a known scene, the answer typically requires an expensive search across scale space, with matching and geometric verification of large sets of local features. This expense is further multiplied when a query image is evaluated against a gallery, e.g. in visual relocalization. While we don’t obviate the need for geometric verification, we propose an interpretable image-embedding that cuts the search in scale space to essentially a lookup.
Neural networks can be trained to predict vector representations of images, such that the relative camera position between a pair of images is approximated by a distance in vector space. Several variants of such relations exist, but unfortunately they are not interpretable.
We propose to capture camera-position relations through normalized surface overlap (NSO). The NSO measure is not symmetric, but it is interpretable.
We propose to represent images as boxes, not vectors. Two boxes can intersect, and boxes can have different volumes. The ratio of intersection over volume can be used to approximate normalized surface overlap, so the box representation allows us to model non-symmetric (non-metric) relations between pairs of images. As a result, box embeddings let us quickly identify, for example, which test image is a close-up version of another.
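To make the idea concrete, here is a minimal sketch of the asymmetric box overlap, assuming each box is parameterized by per-dimension min/max corners (the parameterization and function names here are ours, for exposition, and may differ from the implementation in src/):

```python
import torch

def box_volume(lo: torch.Tensor, hi: torch.Tensor) -> torch.Tensor:
    # Volume of an axis-aligned box; degenerate boxes clamp to zero.
    return torch.clamp(hi - lo, min=0).prod(dim=-1)

def box_overlap(lo_a, hi_a, lo_b, hi_b):
    """Intersection volume over A's volume: approximates NSO(A -> B)."""
    inter_lo = torch.maximum(lo_a, lo_b)
    inter_hi = torch.minimum(hi_a, hi_b)
    return box_volume(inter_lo, inter_hi) / box_volume(lo_a, hi_a)
```

Note that box_overlap(A, B) and box_overlap(B, A) generally differ, which is exactly what lets boxes model "close-up of" relations that a symmetric vector distance cannot.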
Next we plot the predicted NSO relationship between a test query image and a set of test images. We say "enclosure" for the NSO of query pixels visible in the retrieved image, and "concentration" for the NSO of retrieved-image pixels visible in the query image.
Finally, the predicted normalized surface overlap can be used to derive the relative scale factor between a pair of images.
Subsequently, local features need only be detected at that scale. We validate our scene-specific model by showing how this embedding yields competitive image-matching results, while being simpler, faster, and also interpretable by humans.
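For intuition only, here is one hedged way such a scale factor could be computed from the two overlap values: visible surface area scales roughly quadratically with linear image scale, so the square root of the ratio of the overlaps gives a linear scale estimate. The exact estimator used in the paper and in the notebook below may differ.

```python
import math

def relative_scale(enclosure: float, concentration: float) -> float:
    """Rough linear scale of the retrieved image relative to the query.

    enclosure:     NSO of query pixels visible in the retrieved image.
    concentration: NSO of retrieved-image pixels visible in the query.
    """
    if enclosure <= 0 or concentration <= 0:
        raise ValueError("both overlaps must be positive")
    return math.sqrt(concentration / enclosure)
```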
⚙ Setup
This codebase is a minimal implementation of the training data generation and the training function using PyTorch Lightning. A minimal working Anaconda environment is provided with the codebase: environment.yml. You can install and activate a new conda environment from this file with:
conda env create -f environment.yml -n boxes
conda activate boxes
💾 MegaDepth data and splits
To run the provided scripts the MegaDepth dataset needs to be downloaded.
Once downloaded, update the fields path_sfm and path_depth with the correct paths on your machine in (each of) the dataset files in
data/dataset_jsons/megadepth/<scene name>.
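If you prefer to script the update, something along these lines could patch all scene files at once (the local paths are placeholders, and the .json extension is an assumption):

```python
import glob
import json

for path in glob.glob("data/dataset_jsons/megadepth/*.json"):
    with open(path) as f:
        cfg = json.load(f)
    cfg["path_sfm"] = "/your/path/to/megadepth/sfm"      # placeholder path
    cfg["path_depth"] = "/your/path/to/megadepth/depth"  # placeholder path
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
```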
We provide training, validation, and test splits for the image overlap prediction task on four scenes: Big Ben, Notre Dame,
Venice, and Florence. You can find them in the folders
data/overlap_data/megadepth/<scene name>/. Each file (train.txt, val.txt, and test.txt) contains the filenames of
pairs of images and their computed ground-truth overlap.
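Should you want to load these splits in your own code, a reader like the sketch below would work; the exact column layout (two filenames followed by overlap values) is an assumption, so inspect a file before relying on it:

```python
def load_pairs(split_path):
    """Read (name_a, name_b, overlaps) tuples from a split file."""
    pairs = []
    with open(split_path) as f:
        for line in f:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            name_a, name_b = fields[0], fields[1]
            overlaps = tuple(float(x) for x in fields[2:])
            pairs.append((name_a, name_b, overlaps))
    return pairs
```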
If you wish to generate this data yourself, check the next section.
🌍 Generating normalized surface overlap datasets
Code to generate normalized surface overlaps between pairs of MegaDepth images can be found in the Python package
src/datasets/dataset_generator. The package has two main components: i) compute_normals.py and ii) compute_overlap.py.
i. compute_normals.py computes surface normals from the available depth images. The list of depth images available
for each scene can be found in data/overlap_data/megadepth/<scene name>/images_with_depth. Don't forget to update the
json paths as described above.
ii. compute_overlap.py computes the normalized surface overlap between image pairs given the surface normals from
the previous step.
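For intuition, the core of the overlap computation amounts to re-projecting one image's pixels into the other. The sketch below shows just that geometric core under simplifying assumptions (pinhole intrinsics, known relative pose, and no occlusion or normal-based filtering, which the actual scripts handle); all names are illustrative:

```python
import numpy as np

def nso_a_in_b(depth_a, K_a, K_b, R, t, shape_b):
    """Fraction of A's depth-valid pixels that land inside image B."""
    h, w = depth_a.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(h * w)])
    valid = depth_a.ravel() > 0

    # Back-project A's pixels to 3D, then move them into B's frame.
    pts_a = (np.linalg.inv(K_a) @ pix) * depth_a.ravel()
    pts_b = R @ pts_a + t.reshape(3, 1)

    # Project into B; keep points in front of the camera and inside
    # B's image bounds.
    proj = K_b @ pts_b
    z = proj[2]
    with np.errstate(divide="ignore", invalid="ignore"):
        x, y = proj[0] / z, proj[1] / z
    in_view = (z > 0) & (x >= 0) & (x < shape_b[1]) & (y >= 0) & (y < shape_b[0])

    return (valid & in_view).sum() / max(valid.sum(), 1)
```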
For convenience, we provide an example bash script in generate_dataset.sh. NOTE: normal data is stored uncompressed
at about 50 MB per image on average, so the storage requirement can escalate quickly.
⏳ Training
To train a model run:
python -m src.train \
--name my_box_model \
--dataset_json data/dataset_jsons/megadepth/bigben.json \
--box_ndim 32 \
--batch_size 32 \
--model resnet50 \
--num_gpus 1 \
--backend dp
where box_ndim is the dimensionality of the embedding space. backend is the PyTorch Lightning distributed backend, which is flexible (we have only tested this implementation with dp and ddp) and can be used with different values of num_gpus.
We also provide a training bash script, train.sh. By default, tensorboard logs and models are saved in a folder with the same name as the experiment, /<name>.
📊 MegaDepth evaluation
To evaluate a model on surface overlap prediction and reproduce the results in the paper (Table 1), run:
python -m src.test \
--model_scene bigben \
--model resnet50 \
--dataset_json data/dataset_jsons/megadepth/bigben.json
or, alternatively, run the provided bash script test.sh.
🖼️ Estimating the relative scale between two images
For an interactive example that uses our models to predict the relative scale of two images, run the Jupyter
notebook relative_scale_example.ipynb.
📦 Trained models on MegaDepth
| Scene | Input size and model | File size | Link |
|---|---|---|---|
| Big Ben | 256 x 456 ResNet50 | 95 MB | Download 🔗 |
| Notre Dame | 256 x 456 ResNet50 | 95 MB | Download 🔗 |
| Venice | 256 x 456 ResNet50 | 95 MB | Download 🔗 |
| Florence | 256 x 456 ResNet50 | 95 MB | Download 🔗 |
✏️ 📄 Citation
If you find our work useful or interesting, please consider citing our paper:
@inproceedings{rau-2020-image-box-overlap,
title = {Predicting Visual Overlap of Images Through Interpretable Non-Metric Box Embeddings},
author = {Anita Rau and
Guillermo Garcia-Hernando and
Danail Stoyanov and
Gabriel J. Brostow and
Daniyar Turmukhambetov
},
booktitle = {European Conference on Computer Vision ({ECCV})},
year = {2020}
}
👩‍⚖️ License
Copyright © Niantic, Inc. 2020. Patent Pending. All rights reserved. Please see the license file for terms.