ViewFusion: Learning Composable Diffusion Models for Novel View Synthesis
This is the official implementation of "ViewFusion: Learning Composable Diffusion Models for Novel View Synthesis". If you use this work, please cite:
@misc{spiegl2024viewfusion,
title={ViewFusion: Learning Composable Diffusion Models for Novel View Synthesis},
author={Bernard Spiegl and Andrea Perin and Stéphane Deny and Alexander Ilin},
year={2024},
eprint={2402.02906},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Setup
Environment
You can install and activate the conda environment by simply running:
conda env create -f environment.yml
conda activate view-fusion
For ARM-based macOS, run:
conda env create -f environment_osx.yml
conda activate view-fusion
Dataset
The version of the NMR ShapeNet dataset we use is hosted by Niemeyer et al. and is downloadable here.
Please note that our current setup is optimized for use in a cluster computing environment and requires sharding.
To shard the dataset, place NMR_Dataset.zip in data/nmr/ and run python data/dataset_prep.py. The default sharding splits the dataset into four shards. To enable parallelization, the number of shards has to be divisible by the number of GPUs you use.
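The divisibility requirement presumably lets each GPU process read an equal, disjoint subset of shards. A minimal sketch of one possible shard-to-rank assignment (illustrative only; the function name and logic below are assumptions, not the repository's actual code):

```python
# Illustrative sketch (not part of the repo): mapping evenly sized shards to GPU ranks.
# Assumes num_shards % world_size == 0, which is why the shard count must be
# divisible by the number of GPUs.
def shards_for_rank(num_shards: int, rank: int, world_size: int) -> list[int]:
    if num_shards % world_size != 0:
        raise ValueError("Number of shards must be divisible by the number of GPUs.")
    per_rank = num_shards // world_size
    return list(range(rank * per_rank, (rank + 1) * per_rank))

# Example: 4 shards on 4 GPUs -> each rank reads exactly one shard.
print(shards_for_rank(num_shards=4, rank=2, world_size=4))  # [2]
```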
Experiments - Work In Progress!
Configurations for various experiments are located in configs/.
Training
To launch training on a single GPU, run:
python main.py -c configs/small-v100.yaml -g -t --wandb
For a distributed setup, run:
torchrun --nnodes=$NUM_NODES --nproc_per_node=$NUM_GPUS main.py -c configs/small-v100-4.yaml -g -t --wandb
where $NUM_NODES and $NUM_GPUS can, for instance, be replaced by 1 and 4, respectively, corresponding to a single-node setup with four V100 GPUs.
(If you are using Slurm, additional example scripts are available in slurm/.)
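For context, torchrun starts one process per GPU and passes each process its placement via the RANK, LOCAL_RANK, and WORLD_SIZE environment variables. A minimal sketch of the kind of initialization such a launch expects (illustrative only, not the repository's main.py):

```python
# Illustrative sketch of distributed initialization under torchrun
# (not the repo's actual main.py).
import os
import torch
import torch.distributed as dist

def init_distributed():
    # torchrun sets these environment variables for every spawned process.
    rank = int(os.environ.get("RANK", 0))
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    world_size = int(os.environ.get("WORLD_SIZE", 1))

    if world_size > 1:
        dist.init_process_group(backend="nccl", rank=rank, world_size=world_size)
        torch.cuda.set_device(local_rank)
    return rank, local_rank, world_size
```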
Inference
Coming soon.
Eval
Coming soon.
Using Only the Model
If you want to implement separate data pipelines or training procedures, all the architecture details are available in model/.
At training time, the model receives:
- y_0, which is the target (ground truth), of shape (B C H W),
- y_cond, which contains all the input views, of shape (B N C H W), where N denotes the total number of views (24 in our case),
- view_count, of shape (B,), which contains the number of views used as conditioning for each sample in the batch,
- angle, also of shape (B,), indicating the target angle for each sample.
At inference time, y_0 is omitted, with everything else remaining the same as at training time (see the sketch below).
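As a quick reference, here is a minimal sketch of a dummy batch with the shapes described above. The image resolution, the angle encoding, and the model call at the end are assumptions for illustration, not the repository's actual interface:

```python
# Illustrative sketch of the expected input shapes (hypothetical model call).
import torch

B, N, C, H, W = 8, 24, 3, 64, 64  # batch, total views, channels, height, width (resolution assumed)

y_0 = torch.randn(B, C, H, W)                # target (ground truth) view
y_cond = torch.randn(B, N, C, H, W)          # all input views
view_count = torch.randint(1, N + 1, (B,))   # number of conditioning views per sample
angle = torch.randint(0, N, (B,))            # target angle per sample (exact encoding is an assumption)

# Training: all four inputs. Inference: y_0 is omitted.
# out = model(y_0=y_0, y_cond=y_cond, view_count=view_count, angle=angle)  # hypothetical signature
```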
See the paper for full implementation details.
Resource Requirements
NB: Training configurations require a significant amount of VRAM.
The model referenced in the paper was trained using the configs/multi-view-composable-variable-small-v100-4.yaml configuration for 710k steps (approx. 6.5 days) on 4x V100 GPUs, each with 32 GB of VRAM.
Pretrained model weights will be made available soon.