# Monocular Visual-Inertial Depth Estimation
This repository contains code and models for our paper:
Monocular Visual-Inertial Depth Estimation
Diana Wofk, René Ranftl, Matthias Müller, Vladlen Koltun
For a quick overview of the work, you can watch the short talk and teaser on YouTube.
## Introduction

We present a visual-inertial depth estimation pipeline that integrates monocular depth estimation and visual-inertial odometry (VIO) to produce dense depth estimates with metric scale. Our approach consists of three stages:

1. Input processing: RGB and IMU data feed into monocular depth estimation alongside visual-inertial odometry.
2. Global scale and shift alignment: monocular depth estimates are fitted to sparse depth from VIO in a least-squares manner.
3. Learning-based dense scale alignment: globally-aligned depth is locally realigned using a dense scale map regressed by the ScaleMapLearner (SML).

The images at the bottom of the diagram above illustrate a VOID sample being processed through our pipeline. From left to right: the input RGB, ground-truth depth, sparse depth from VIO, globally-aligned depth, scale map scaffolding, dense scale map regressed by SML, and the final depth output.
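As a concrete illustration of stage 2, global alignment reduces to a closed-form least-squares fit of one scale and one shift between the monocular prediction and the sparse VIO depth. The sketch below is illustrative only and is not the repository's implementation; in particular, whether the fit is performed in depth or inverse-depth space depends on the depth predictor's output convention.

```python
import numpy as np

def global_scale_shift(prediction, sparse_depth):
    """Least-squares fit of a global scale s and shift t so that
    s * prediction + t matches the sparse metric depth where it is valid.

    prediction:   HxW monocular depth estimate (defined only up to scale/shift)
    sparse_depth: HxW metric depth from VIO, 0 where no sample exists
    """
    mask = sparse_depth > 0
    x = prediction[mask]
    y = sparse_depth[mask]
    # Solve min_{s,t} || s*x + t - y ||^2 via linear least squares.
    A = np.stack([x, np.ones_like(x)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s, t

# globally_aligned = s * prediction + t   (input to the SML stage)
```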

## Setup

- Setup dependencies:

  ```bash
  conda env create -f environment.yaml
  conda activate vi-depth
  ```

- Pick one or more ScaleMapLearner (SML) models and download the corresponding weights to the `weights` folder.

  | Depth Predictor  | SML on VOID 150 | SML on VOID 500 | SML on VOID 1500 |
  | ---------------- | --------------- | --------------- | ---------------- |
  | DPT-BEiT-Large   | model           | model           | model            |
  | DPT-SwinV2-Large | model           | model           | model            |
  | DPT-Large        | model           | model           | model            |
  | DPT-Hybrid       | model*          | model           | model            |
  | DPT-SwinV2-Tiny  | model           | model           | model            |
  | DPT-LeViT        | model           | model           | model            |
  | MiDaS-small      | model           | model           | model            |

  *Also available with pretraining on TartanAir: model
## Inference

- Place inputs into the `input` folder. An input image and a corresponding sparse metric depth map are expected:

  ```
  input
  ├── image                  # RGB image
  │   ├── <timestamp>.png
  │   └── ...
  └── sparse_depth           # sparse metric depth map
      ├── <timestamp>.png    # as 16b PNG
      └── ...
  ```

  The `load_sparse_depth` function in `run.py` may need to be modified depending on the format in which sparse depth is stored. By default, the depth storage method used in the VOID dataset is assumed; see the loader sketch after this list.
- Run the `run.py` script as follows:

  ```bash
  DEPTH_PREDICTOR="dpt_beit_large_512"
  NSAMPLES=150
  SML_MODEL_PATH="weights/sml_model.dpredictor.${DEPTH_PREDICTOR}.nsamples.${NSAMPLES}.ckpt"

  python run.py -dp $DEPTH_PREDICTOR -ns $NSAMPLES -sm $SML_MODEL_PATH --save-output
  ```
- The `--save-output` flag enables saving outputs to the `output` folder. By default, the following outputs will be saved per sample:

  ```
  output
  ├── ga_depth               # metric depth map after global alignment
  │   ├── <timestamp>.pfm    # as PFM
  │   ├── <timestamp>.png    # as 16b PNG
  │   └── ...
  └── sml_depth              # metric depth map output by SML
      ├── <timestamp>.pfm    # as PFM
      ├── <timestamp>.png    # as 16b PNG
      └── ...
  ```
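If your sparse depth is stored differently, `load_sparse_depth` can be swapped out. The sketch below shows a minimal VOID-style loader for 16-bit PNGs; the division by 256 to recover meters is an assumption based on the VOID storage convention and should be adjusted to match how your depth maps are encoded.

```python
import numpy as np
from PIL import Image

def load_sparse_depth(path):
    """Load a sparse metric depth map stored as a 16-bit PNG.

    Assumes the VOID-style convention of dividing stored values by 256
    to obtain depth in meters; pixels without a depth sample remain 0.
    """
    depth_png = np.array(Image.open(path), dtype=np.float32)
    sparse_depth = depth_png / 256.0  # stored units -> meters (assumed scale)
    return sparse_depth
```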
## Evaluation
Models provided in this repo were trained on the VOID dataset.
- Download the VOID dataset following the instructions in the VOID dataset repo.
- To evaluate on VOID test sets, run the `evaluate.py` script as follows:

  ```bash
  DATASET_PATH="/path/to/void_release/"
  DEPTH_PREDICTOR="dpt_beit_large_512"
  NSAMPLES=150
  SML_MODEL_PATH="weights/sml_model.dpredictor.${DEPTH_PREDICTOR}.nsamples.${NSAMPLES}.ckpt"

  python evaluate.py -ds $DATASET_PATH -dp $DEPTH_PREDICTOR -ns $NSAMPLES -sm $SML_MODEL_PATH
  ```

  Results for the example shown above:

  ```
  Averaging metrics for globally-aligned depth over 800 samples
  Averaging metrics for SML-aligned depth over 800 samples
  +---------+----------+----------+
  | metric  | GA Only  |  GA+SML  |
  +---------+----------+----------+
  | RMSE    |  191.36  |  142.85  |
  | MAE     |  115.84  |   76.95  |
  | AbsRel  |   0.069  |   0.046  |
  | iRMSE   |   72.70  |   57.13  |
  | iMAE    |   49.32  |   34.25  |
  | iAbsRel |   0.071  |   0.048  |
  +---------+----------+----------+
  ```

  To evaluate on VOID test sets at different densities (void_150, void_500, void_1500), change the `NSAMPLES` argument above accordingly.
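For reference, the table reports standard depth-completion metrics in both depth and inverse-depth space. The sketch below shows how such metrics are commonly computed; it is not the exact code in `evaluate.py`, and the units (RMSE/MAE in mm, iRMSE/iMAE in 1/km) are an assumption based on common VOID benchmark conventions.

```python
import numpy as np

def depth_metrics(pred_m, gt_m):
    """Common depth metrics over valid ground-truth pixels.

    pred_m, gt_m: metric depth maps in meters; gt_m == 0 marks invalid pixels.
    Units assumed here: RMSE/MAE in mm, iRMSE/iMAE in 1/km, AbsRel/iAbsRel unitless.
    """
    mask = gt_m > 0
    pred, gt = pred_m[mask], gt_m[mask]

    err_mm = (pred - gt) * 1000.0                 # meters -> millimeters
    ierr_km = (1.0 / pred - 1.0 / gt) * 1000.0    # 1/m -> 1/km

    return {
        "RMSE":    np.sqrt(np.mean(err_mm ** 2)),
        "MAE":     np.mean(np.abs(err_mm)),
        "AbsRel":  np.mean(np.abs(pred - gt) / gt),
        "iRMSE":   np.sqrt(np.mean(ierr_km ** 2)),
        "iMAE":    np.mean(np.abs(ierr_km)),
        "iAbsRel": np.mean(np.abs(1.0 / pred - 1.0 / gt) * gt),
    }
```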
## Citation
If you reference our work, please consider citing the following:
```bibtex
@inproceedings{wofk2023videpth,
    author    = {{Wofk, Diana and Ranftl, Ren\'{e} and M{\"u}ller, Matthias and Koltun, Vladlen}},
    title     = {{Monocular Visual-Inertial Depth Estimation}},
    booktitle = {{IEEE International Conference on Robotics and Automation (ICRA)}},
    year      = {{2023}}
}
```
## Acknowledgements
Our work builds on and uses code from MiDaS, timm, and PyTorch Lightning. We'd like to thank the authors for making these libraries and frameworks available.