TransMVSNet
(CVPR 2022) TransMVSNet: Global Context-aware Multi-view Stereo Network with Transformers.
Paper | Project Page | Arxiv | Models
Introduction
In this paper, we present TransMVSNet, based on our exploration of feature matching in multi-view stereo (MVS). We treat MVS as what it fundamentally is, a feature matching task, and therefore propose a powerful Feature Matching Transformer (FMT) that leverages intra- (self-) and inter- (cross-) attention to aggregate long-range context information within and across images. To help the FMT adapt, we use an Adaptive Receptive Field (ARF) module to ensure a smooth transition in the scope of features, and we bridge different stages with a feature pathway that passes transformed features and gradients across scales. In addition, we apply pair-wise feature correlation to measure similarity between features and adopt an ambiguity-reducing focal loss to strengthen the supervision. To the best of our knowledge, TransMVSNet is the first attempt to apply Transformers to the task of MVS. As a result, our method achieves state-of-the-art performance on the DTU dataset, the Tanks and Temples benchmark, and the BlendedMVS dataset.
Installation
Our code is tested with Python==3.6/3.7/3.8, PyTorch==1.6.0/1.7.0/1.9.0, CUDA==10.2 on Ubuntu-18.04 with NVIDIA GeForce RTX 2080Ti. Similar or newer versions should also work.
To use TransMVSNet, clone this repo:
git clone https://github.com/MegviiRobot/TransMVSNet.git
cd TransMVSNet
We highly recommend using Anaconda to manage the Python environment:
conda create -n transmvsnet python=3.6
conda activate transmvsnet
pip install -r requirements.txt
We also recommend using apex, which you can install from the official repo.
Data preparation
In TransMVSNet, we mainly use DTU, BlendedMVS and Tanks and Temples to train and evaluate our models. You can prepare the corresponding data by following the instructions below.
DTU
For the DTU training set, you can download the preprocessed DTU training data and Depths_raw (both from Original MVSNet), and unzip them to construct a dataset folder like:
dtu_training
├── Cameras
├── Depths
├── Depths_raw
└── Rectified
For the DTU testing set, you can download the preprocessed DTU testing data (from Original MVSNet) and unzip it as the test data folder, which should contain one cams folder, one images folder and one pair.txt file.
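Before running the test scripts, it can help to confirm that the unzipped folder has the layout described above. The following is a minimal sketch of such a check; the helper name `missing_dtu_test_entries` is hypothetical and not part of this repo.

```python
from pathlib import Path


def missing_dtu_test_entries(folder):
    """Return the expected entries (cams/, images/, pair.txt) missing from `folder`.

    Hypothetical helper for illustration; not part of the TransMVSNet code.
    """
    folder = Path(folder)
    missing = []
    # Two sub-folders are expected ...
    for name in ("cams", "images"):
        if not (folder / name).is_dir():
            missing.append(name)
    # ... plus one pair.txt file.
    if not (folder / "pair.txt").is_file():
        missing.append("pair.txt")
    return missing
```

An empty list means the folder contains all three expected entries; otherwise the returned names tell you what is still missing.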
BlendedMVS
We use the low-res set of the BlendedMVS dataset for both training and testing. You can download the low-res set from the original BlendedMVS and unzip it to form a dataset folder like the one below:
BlendedMVS
├── 5a0271884e62597cdee0d0eb
│   ├── blended_images
│   ├── cams
│   └── rendered_depth_maps
├── 59338e76772c3e6384afbb15
├── 59f363a8b45be22330016cad
├── ...
├── all_list.txt
├── training_list.txt
└── validation_list.txt
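To catch an incomplete download early, a short script can confirm that every listed scene has the three sub-folders shown in the tree. This is only a sketch: the function name `incomplete_scenes` is hypothetical, and it assumes that each list file holds one scene ID per line.

```python
from pathlib import Path

# Sub-folders shown for each scene in the tree above.
SCENE_SUBDIRS = ("blended_images", "cams", "rendered_depth_maps")


def incomplete_scenes(root, list_name="training_list.txt"):
    """Map each listed scene to the sub-folders it lacks (complete scenes omitted).

    Assumption: the list file holds one scene ID per line.
    """
    root = Path(root)
    problems = {}
    for line in (root / list_name).read_text().splitlines():
        scene = line.strip()
        if not scene:
            continue  # skip blank lines
        missing = [s for s in SCENE_SUBDIRS if not (root / scene / s).is_dir()]
        if missing:
            problems[scene] = missing
    return problems
```

An empty dictionary means every scene in the list has all three sub-folders.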
Tanks and Temples
Download our preprocessed Tanks and Temples dataset and unzip it to form the dataset folder like below:
tankandtemples
├── advanced
│   ├── Auditorium
│   ├── Ballroom
│   ├── ...
│   └── Temple
└── intermediate
    ├── Family
    ├── Francis
    ├── ...
    └── Train
Training
Training on DTU
Set the configuration in scripts/train.sh:
- Set MVS_TRAINING as the path of the DTU training set.
- Set LOG_DIR as the directory in which to save the checkpoints.
- Change NGPUS to suit your device.
- By default we use torch.distributed.launch, and we recommend it.
To train your own model, just run:
bash scripts/train.sh
You can conveniently modify more hyper-parameters in scripts/train.sh according to the argparse in train.py, such as summary_freq, save_freq, and so on.
Finetune on BlendedMVS
For a fair comparison with other SOTA methods on the Tanks and Temples benchmark, we finetune our model on the BlendedMVS dataset after training on the DTU dataset.
Set the configuration in scripts/train_bld_fintune.sh:
- Set MVS_TRAINING as the path of the BlendedMVS dataset.
- Set LOG_DIR as the directory in which to save the checkpoints and training log.
- Set CKPT as the path of the .ckpt file to load, which was trained on the DTU dataset.
To finetune your own model, just run:
bash scripts/train_bld_fintune.sh
Testing
For easy testing, you can download our pre-trained models and put them in the checkpoints folder, or use your own models and follow the instructions below.
Testing on DTU
Set the configuration in scripts/test_dtu.sh:
- Set TESTPATH as the path of the DTU testing set.
- Set TESTLIST as the path of the test list (.txt file).
- Set CKPT_FILE as the path of the model weights.
- Set OUTDIR as the path to save results.
Run:
bash scripts/test_dtu.sh
The script uses the normal fusion method to fuse the point cloud results, but you can also set it to use the gipuma fusion method instead. Instructions for installing and compiling gipuma can be found here.
For quantitative evaluation on the DTU dataset, download SampleSet and Points. Unzip them and place the Points folder in SampleSet/MVS Data/. The structure looks like:
SampleSet
├── MVS Data
    └── Points
In DTU-MATLAB/BaseEvalMain_web.m, set dataPath as the path to SampleSet/MVS Data/, plyPath as the directory that stores the reconstructed point clouds, and resultsPath as the directory to store the evaluation results. Then run DTU-MATLAB/BaseEvalMain_web.m in MATLAB.
We have also uploaded our final point cloud results here. You can download them and evaluate them using the MATLAB scripts; the results look like:
Acc. (mm) | Comp. (mm) | Overall (mm) |
---|---|---|
0.321 | 0.289 | 0.305 |
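The Overall value in the table is consistent with the common DTU convention of averaging accuracy and completeness. The snippet below is only a quick arithmetic check under that assumption; the MATLAB scripts remain the reference for the actual evaluation.

```python
# Values taken from the table above (in mm).
acc = 0.321
comp = 0.289

# Assumption: Overall is the arithmetic mean of Acc. and Comp.
overall = (acc + comp) / 2
print(f"Overall: {overall:.3f} mm")  # prints "Overall: 0.305 mm"
```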
Testing on Tanks and Temples
We recommend using the finetuned models to test on the Tanks and Temples benchmark.
Similarly, set the configuration in scripts/test_tnt.sh:
- Set TESTPATH as the path of the intermediate set or the advanced set.
- Set TESTLIST as the path of the test list (.txt file).
- Set CKPT_FILE as the path of the model weights.
- Set OUTDIR as the path to save results.
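If you need to assemble a test list yourself, the sketch below writes the names of the scene sub-folders of a set to a .txt file. It assumes that the list contains one scene name per line, which is not stated here, so check the list files shipped with the repo before relying on it. The function name `write_scene_list` is hypothetical.

```python
from pathlib import Path


def write_scene_list(split_dir, list_path):
    """Write the scene folder names under `split_dir` to `list_path`, one per line.

    Assumption: the test list format is one scene name per line.
    """
    # Only directories count as scenes; stray files are ignored.
    scenes = sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())
    Path(list_path).write_text("\n".join(scenes) + "\n")
    return scenes
```

The function returns the scene names it wrote, so you can compare them against the scenes you expect before pointing TESTLIST at the file.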
To generate point cloud results, just run:
bash scripts/test_tnt.sh
Note that:
- The parameters of point cloud fusion have not been studied thoroughly, and the performance may improve if more appropriate thresholds are chosen for each scene.
- The dynamic fusion code is borrowed from AA-RMVSNet.
For quantitative evaluation, you can upload your point clouds to the Tanks and Temples benchmark.
Citation
@inproceedings{ding2022transmvsnet,
title={Transmvsnet: Global context-aware multi-view stereo network with transformers},
author={Ding, Yikang and Yuan, Wentao and Zhu, Qingtian and Zhang, Haotian and Liu, Xiangyue and Wang, Yuanjiang and Liu, Xiao},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={8585--8594},
year={2022}
}
Acknowledgments
We borrow some code from CasMVSNet, LoFTR and AA-RMVSNet. We thank the authors for releasing the source code.