RM-Depth
RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes, CVPR 2022
This repository (https://github.com/twhui/RM-Depth) is the official project page for my paper, RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes, published in CVPR 2022. The up-to-date version of the paper is available on arXiv. The supplementary material is available here.
Overview

Major contributions: (1) Recurrent modulation units (RMU) are proposed to adaptively and iteratively combine encoder and decoder features. (2) Residual upsampling is proposed for fast and efficient resizing of feature maps while still producing sharp depth maps. (3) A warping-based network is proposed to estimate a motion field of moving objects without using semantic priors. The motion field is further regularized by an outlier-aware training loss.
Although the depth model uses only a single image at test time and has just 2.97M parameters, it achieves state-of-the-art results on the KITTI and Cityscapes benchmarks (AbsRel = 0.107 and 0.090, respectively). It also runs at 40 FPS (image size: 640 x 192) on an NVIDIA 1080 GPU.
Recurrent Modulation Unit (RMU)

Depth estimation networks commonly fuse feature maps across the encoder and decoder. In RM-Depth, the depth decoder consists of RMUs. The fusion is iteratively refined by adaptively modulating the encoder features using the hidden state of the RMU. This in turn improves the performance of single-image depth inference.
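As a rough illustration of the idea above, the following toy sketch shows a hidden state repeatedly gating encoder features and then being updated by the gated result. It is not the RMU from the paper: the function name `recurrent_modulation`, the scalar weights `w_gate`, `w_in` and `w_rec`, and the sigmoid/tanh choices are all assumptions made for illustration, standing in for learned convolutions.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def recurrent_modulation(encoder_feat, hidden, steps=3,
                         w_gate=1.0, w_in=1.0, w_rec=0.5):
    """Toy per-element recurrent fusion (hypothetical, not the paper's RMU).

    encoder_feat: list of floats, fixed across iterations.
    hidden:       list of floats of the same length, the recurrent state.
    The scalar weights are placeholders for learned layers.
    """
    if len(encoder_feat) != len(hidden):
        raise ValueError("encoder_feat and hidden must have the same length")
    for _ in range(steps):
        # The hidden state decides how strongly each encoder feature passes.
        gates = [sigmoid(w_gate * h) for h in hidden]
        modulated = [g * e for g, e in zip(gates, encoder_feat)]
        # The modulated features then refine the hidden state.
        hidden = [math.tanh(w_rec * h + w_in * m)
                  for h, m in zip(hidden, modulated)]
    return hidden


if __name__ == "__main__":
    refined = recurrent_modulation([0.8, -0.3, 0.0], [0.0, 0.0, 0.0])
    print(refined)
```

The point of the sketch is only the loop structure: the same encoder features are reused at every step, while the gate applied to them changes as the hidden state evolves.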
Residual upsampling
Conventionally, feature maps are upsampled using a single set of filters. In this work, multiple sets of filters are used, with each set trained to upsample a subset of the spectral components. This effectively improves upsampling along edges.
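To make the "multiple sets of filters" idea concrete, here is a toy 1-D sketch in which a 2x upsampler sums the outputs of several filter sets: a smooth branch (linear interpolation) plus a zero-sum detail branch that steepens edges. The helper names `conv1d_same` and `upsample2x` and the hand-picked kernels are assumptions for illustration only; in the paper the filters are learned and operate on 2-D feature maps.

```python
def conv1d_same(signal, kernel):
    """Correlate signal with an odd-length kernel, replicating edge samples."""
    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - half, 0), n - 1)  # clamp index at the borders
            acc += w * signal[j]
        out.append(acc)
    return out


def upsample2x(signal, filter_sets):
    """Upsample by 2 as a sum of branches (hypothetical toy layer).

    Each filter set is a pair (even_kernel, odd_kernel) that produces the
    output samples at even and odd positions. Branch outputs are added, so
    every branch after the first acts as a residual correction.
    """
    out = [0.0] * (2 * len(signal))
    for even_kernel, odd_kernel in filter_sets:
        even = conv1d_same(signal, even_kernel)
        odd = conv1d_same(signal, odd_kernel)
        for i in range(len(signal)):
            out[2 * i] += even[i]
            out[2 * i + 1] += odd[i]
    return out


# Smooth branch: copy the sample, then average it with its right neighbour.
SMOOTH = ([0.0, 1.0, 0.0], [0.0, 0.5, 0.5])
# Detail branch: zero-sum kernel that reacts only where the signal changes.
DETAIL = ([-0.1, 0.2, -0.1], [0.0, 0.0, 0.0])

if __name__ == "__main__":
    step = [0.0, 0.0, 1.0, 1.0]
    print(upsample2x(step, [SMOOTH]))
    print(upsample2x(step, [SMOOTH, DETAIL]))
```

Because the detail kernel sums to zero, flat regions are left unchanged and only samples next to an edge are adjusted.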
Motion Network

Depth Prediction Results
| Method | Semantic Prior | KITTI Testing Set (Eigen split) | Cityscapes Testing Set | Model Size (M) |
|---|---|---|---|---|
| Monodepth2 (ICCV19) | | 0.115 | - | 14.84 |
| PackNet (CVPR20) | | 0.111 | - | 128.29 |
| Lee et al. (ICCV21) | • | 0.114 | 0.116 | 22.77 |
| Lee et al. (AAAI21) | • | 0.112 | 0.111 | 14.84 |
| RM-Depth (CVPR22), updated results | | 0.107 (trained on K) (predictions), 0.105 (trained on CS+K) (predictions) | 0.090 (predictions) | 2.97 |
| RM-Depth (CVPR22), 1024 x 320 | | 0.106 (predictions) | 0.088 (predictions) | 2.97 |
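The KITTI and Cityscapes figures quoted for RM-Depth are AbsRel values, that is, the mean of |predicted depth - ground-truth depth| / ground-truth depth over valid pixels. A minimal sketch of that metric is given below for readers who want to score their own predictions. The function name `abs_rel`, the depth range limits and the optional median scaling are assumptions based on common practice in unsupervised monocular depth evaluation; they are not settings confirmed by this page.

```python
from statistics import median


def abs_rel(pred, gt, min_depth=1e-3, max_depth=80.0, median_scale=True):
    """Mean absolute relative error over valid ground-truth depths.

    pred, gt: flat sequences of depths in the same order.
    Pixels whose ground truth lies outside (min_depth, max_depth) are
    ignored (assumed convention). With median_scale=True the prediction is
    rescaled so that its median matches the ground-truth median, which is
    a common way to handle the unknown scale of monocular predictions.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if min_depth < g < max_depth]
    if not pairs:
        raise ValueError("no valid ground-truth depths")
    scale = 1.0
    if median_scale:
        scale = median(g for _, g in pairs) / median(p for p, _ in pairs)
    return sum(abs(p * scale - g) / g for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    predicted = [1.0, 2.0, 4.0, 3.0]
    ground_truth = [2.0, 4.0, 8.0, 0.0]  # 0.0 marks a pixel without ground truth
    print(abs_rel(predicted, ground_truth))
    print(abs_rel(predicted, ground_truth, median_scale=False))
```

Lower is better; a prediction that is correct up to a global scale factor scores zero when median scaling is enabled.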
Code Package
Please contact Dr. T.-W. Hui (e-mail provided on the first page of the paper) for academic research or commercial collaborations.
License and Citation
This software and associated documentation files (the "Software"), and the research paper (RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes) including but not limited to the figures, and tables (the "Paper") are provided for academic research purposes only and without any warranty. Any commercial use requires my consent. When using any parts of the Software or the Paper in your work, please cite the following paper:
@InProceedings{hui22rmdepth,
author = {Tak-Wai Hui},
title = {RM-Depth: Unsupervised Learning of Recurrent Monocular Depth in Dynamic Scenes},
booktitle = {Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
pages = {1675--1684},
year = {2022}
}