MVD-Fusion
MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation
Hanzhe Hu*,
Zhizhuo Zhou*,
Varun Jampani,
Shubham Tulsiani
*Equal contribution
CVPR 2024 | GitHub | arXiv | Project page
Given an input RGB image, MVD-Fusion generates multi-view RGB-D images using a depth-guided attention mechanism for enforcing multi-view consistency. We visualize the input RGB image (left) and three synthesized novel views (with generated depth in inset).
Shoutouts and Credits
This project is built on top of open-source code. We thank the open-source research community and credit our use of parts of Stable Diffusion, kiuikit, Zero-1-to-3, and SyncDreamer.
Code
Colab demo!
Our code release contains:
- Code for inference
- Code for training (Coming Soon!)
- Pretrained weights on Objaverse
For bugs and issues, please open an issue on GitHub and we will try to address it promptly.
Environment Setup
Please follow the environment setup guide in ENVIRONMENT.md.
Dataset
We provide two evaluation datasets: Google Scanned Objects (GSO) and the SyncDreamer in-the-wild dataset.
- (optional) Download the GSO evaluation set here and extract it to demo_datasets/gso_eval.
- (optional) Download the in-the-wild evaluation set here and extract it to demo_datasets/wild_eval.
Pretrained Weights
MVD-Fusion requires Zero-1-to-3 weights, CLIP ViT weights, and finetuned MVD-Fusion weights.
- Find MVD-Fusion weights here and download them to weights/. A full set of weights will have weights/clip_vit_14.ckpt, weights/mvdfusion_sep23.pt, and weights/zero123_105000_cc.ckpt.
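Before running inference, it may help to sanity-check that the files above are in place. The following is a minimal, optional Python snippet (not part of the release) that checks for the three weight files and the optional evaluation datasets, using the paths given in this README:

from pathlib import Path

# Weight files listed above; all three are needed for inference.
required_weights = [
    'weights/clip_vit_14.ckpt',
    'weights/mvdfusion_sep23.pt',
    'weights/zero123_105000_cc.ckpt',
]
# Optional evaluation sets from the Dataset section.
optional_dirs = ['demo_datasets/gso_eval', 'demo_datasets/wild_eval']

missing = [p for p in required_weights if not Path(p).is_file()]
if missing:
    raise FileNotFoundError(f'Missing weight files: {missing}')
for d in optional_dirs:
    if not Path(d).is_dir():
        print(f'Note: optional evaluation set not found at {d}')
print('All required weights found.')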
Evaluation
Examples
To run evaluation on the GSO test set, assuming the dataset and model weights are downloaded according to the instructions above, run demo.py:
$ python demo.py -c configs/mvd_gso.yaml
Flags
-g, --gpus number of gpus to use (default: 1)
-p, --port last digit of DDP port (default: 1)
-c, --config yaml config file
Output
Output artifacts will be saved to demo/ by default.
Training
- Zero123 weights are required for training (for initialization). Please download them and extract them to weights/zero123_105000.ckpt.
Sample training code is provided in train.py. Please follow the evaluation tutorial above to set up the environment and pretrained weights. It is recommended to directly modify configs/mvd_train.yaml to specify the experiment directory and set the training hyperparameters. We show training flags below. We recommend a minimum of 4 GPUs for training.
$ python train.py -c configs/mvd_train.yaml -g 4
Flags
-g, --gpus number of gpus to use (default: 1)
-p, --port last digit of DDP port (default: 1)
-b, --backend distributed data parallel backend (default: nccl)
Using Custom Datasets
To train on a custom dataset, one needs to write a custom dataloader. We describe the required outputs for the __getitem__
function, which should be a dictionary containing:
{
'images': (B, 3, H, W) image tensor,
'R': (B, 3, 3) PyTorch3D rotation,
'T': (B, 3) PyTorch3D translation,
'f': (B, 2) PyTorch3D focal_length in NDC space,
'c': (B, 2) PyTorch3D principal_point in NDC space,
}
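As a loose illustration (not the project's own dataloader), here is a minimal PyTorch Dataset sketch whose __getitem__ returns a dictionary with these keys and shapes. The camera values are dummy placeholders; in practice R, T, f, and c should follow PyTorch3D camera conventions, with focal_length and principal_point expressed in NDC space.

import torch
from torch.utils.data import Dataset

class CustomMVDataset(Dataset):
    # Hypothetical example; replace the dummy tensors with real images and cameras.
    def __init__(self, num_items=100, num_views=8, image_size=256):
        self.num_items = num_items
        self.num_views = num_views    # B: number of views returned per item
        self.image_size = image_size

    def __len__(self):
        return self.num_items

    def __getitem__(self, idx):
        B, H = self.num_views, self.image_size
        return {
            'images': torch.zeros(B, 3, H, H),          # (B, 3, H, W) RGB images
            'R': torch.eye(3).expand(B, 3, 3).clone(),  # (B, 3, 3) PyTorch3D rotations (identity placeholders)
            'T': torch.zeros(B, 3),                     # (B, 3) PyTorch3D translations
            'f': torch.full((B, 2), 2.0),               # (B, 2) focal length in NDC space (dummy value)
            'c': torch.zeros(B, 2),                     # (B, 2) principal point in NDC space
        }

Wrapping such a dataset in a standard torch.utils.data.DataLoader will stack each key into a (batch, B, ...) tensor.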
Citation
If you find this work useful, please consider citing:
@inproceedings{hu2024mvdfusion,
title={MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation},
author={Hanzhe Hu and Zhizhuo Zhou and Varun Jampani and Shubham Tulsiani},
booktitle={CVPR},
year={2024}
}
Acknowledgements
We thank Bharath Raj, Jason Y. Zhang, Yufei (Judy) Ye, Yanbo Xu, and Zifan Shi for helpful discussions and feedback. This work is supported in part by NSF GRFP Grant Nos. DGE1745016 and DGE2140739.