Full-Glow: Fully conditional Glow for more realistic image generation
This repo contains the implementation of Full-Glow: Fully conditional Glow for more realistic image generation: https://arxiv.org/abs/2012.05846. A short presentation of the work can be found here.
Full-Glow extends previous Glow-based models for conditional image generation by applying conditioning to all Glow operations through appropriate conditioning networks. It was applied to the Cityscapes dataset (label → photo) to synthesize street-scene images.
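The idea of conditioning a flow operation can be illustrated with a toy conditional affine coupling layer. The sketch below is a minimal NumPy version with a hypothetical single-linear-layer conditioning network (the actual model uses learned convolutional conditioning networks on feature maps); it shows how the condition enters the scale and shift while the transform stays exactly invertible:

```python
import numpy as np

def cond_affine_coupling(x, cond, w, forward=True):
    """Toy conditional affine coupling: half of x is transformed with a
    scale/shift predicted from the other half AND the condition.
    `w` is the weight of a hypothetical one-layer conditioning network."""
    d = x.shape[-1] // 2
    x_a, x_b = x[..., :d], x[..., d:]
    # conditioning network: a single linear map on [x_a, cond]
    h = np.concatenate([x_a, cond], axis=-1) @ w        # (..., 2d)
    log_s, t = np.tanh(h[..., :d]), h[..., d:]          # tanh keeps scales stable
    if forward:
        y_b = x_b * np.exp(log_s) + t
    else:
        y_b = (x_b - t) * np.exp(-log_s)
    return np.concatenate([x_a, y_b], axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
cond = rng.normal(size=(4, 8))       # e.g. features extracted from the label map
w = rng.normal(size=(12, 8)) * 0.1   # (4 + 8) inputs -> 2 * 4 outputs
y = cond_affine_coupling(x, cond, w, forward=True)
x_rec = cond_affine_coupling(y, cond, w, forward=False)
assert np.allclose(x, x_rec)         # exactly invertible given the same condition
```

Because `x_a` and `cond` pass through unchanged, the inverse can recompute the same scale and shift, which is what makes conditioning every operation compatible with exact likelihood computation.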
Quantitative results
Full-Glow was evaluated quantitatively against previous Glow-based models (C-Glow and DUAL-Glow) and the GAN-based model pix2pix using the PSPNet classifier. With each trained model, we ran inference on the Cityscapes validation set 3 times and calculated the PSP scores.
| Model | Conditional BPD ↓ | Mean pixel acc. ↑ | Mean class acc. ↑ | Mean class IoU ↑ |
|---|---|---|---|---|
| C-Glow v.1 | 2.568 | 35.02 ± 0.56 | 12.15 ± 0.05 | 7.33 ± 0.09 |
| C-Glow v.2 | 2.363 | 52.33 ± 0.46 | 17.37 ± 0.21 | 12.31 ± 0.24 |
| DUAL-Glow | 2.585 | 71.44 ± 0.03 | 23.91 ± 0.19 | 18.96 ± 0.17 |
| pix2pix | --- | 60.56 ± 0.11 | 22.64 ± 0.21 | 16.42 ± 0.06 |
| Full-Glow | 2.345 | 73.50 ± 0.13 | 29.13 ± 0.39 | 23.86 ± 0.30 |
| Ground truth | --- | 95.97 | 84.31 | 77.30 |
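The ± values in the table are the spread over the 3 inference runs. A minimal sketch of that aggregation, with made-up per-run values for illustration:

```python
from statistics import mean, stdev

# hypothetical mean pixel accuracies from 3 inference passes of one model
runs = [73.36, 73.51, 73.63]
mu, sd = mean(runs), stdev(runs)
print(f"{mu:.2f} ± {sd:.2f}")  # → 73.50 ± 0.14
```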
Visual samples at 512×1024 resolution (please zoom in to see more details)
Condition Synthesized
Visual examples of content transfer (please zoom to see more details)
Images from left to right: desired content, desired structure, content applied to structure, ground truth for structure
Samples generated on the maps dataset
Top row: condition; bottom row: synthesized
Training
To train a model on, e.g., Cityscapes, run:
```shell
python3 main.py --model improved_so_large_longer --img_size 512 1024 --dataset cityscapes --direction label2photo --n_block 4 --n_flow 8 8 8 8 --do_lu --reg_factor 0.0001 --grad_checkpoint
```
Arguments
- `--model` specifies the name of the model (it should contain 'improved' to enable training Full-Glow)
- `--dataset` determines which dataset to use; data loaders for the Cityscapes, MNIST, and maps datasets are already implemented here
- `--do_lu` enables the use of LU decomposition, which has a noticeable effect on training time
- `--reg_factor` sets the regularization factor applied to the right-hand side of the objective function
- `--grad_checkpoint` enables gradient checkpointing, which is needed here for training on larger images
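To illustrate why LU decomposition affects training time: when the invertible 1×1 convolution weight is parameterized in LU form (as introduced in the original Glow paper), the log-determinant term of the flow objective reduces to a sum over a diagonal instead of a full determinant computation at every step. A minimal NumPy sketch (channel count and initialization are illustrative, not the values used in this project):

```python
import numpy as np

rng = np.random.default_rng(0)
c = 4                                   # channel count (illustrative)

# LU parameterization of an invertible 1x1 convolution weight:
# W = P @ L @ (U + diag(s)), with P a fixed permutation,
# L unit-lower-triangular, U strictly upper-triangular, s a free diagonal.
P = np.eye(c)[rng.permutation(c)]
L = np.tril(rng.normal(size=(c, c)) * 0.1, k=-1) + np.eye(c)
U = np.triu(rng.normal(size=(c, c)) * 0.1, k=1)
s = rng.normal(size=c) + 1.0            # kept away from zero
W = P @ L @ (U + np.diag(s))

# The log-determinant needed by the objective is just sum(log|s|),
# avoiding an O(c^3) slogdet of W on every forward pass.
fast_logdet = np.sum(np.log(np.abs(s)))
_, slow_logdet = np.linalg.slogdet(W)
assert np.allclose(fast_logdet, slow_logdet)
```

Since `det(P) = ±1` and `det(L) = 1`, only the diagonal `s` contributes to `|det W|`, which is why the cheap sum matches the full computation.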
Description of packages in the project
- `data_handler` contains the implementation of data loaders for the different datasets
- `evaluation` contains code for evaluating the models
- `experiments` has code for experiments such as content transfer and sampling
- `helper` contains helper functions for dealing with files, directories, saving/loading checkpoints, etc.
- `models` contains the implementations of Full-Glow, DUAL-Glow, and C-Glow
- `trainer` has the implementation of the training loop and loss function
Checkpoints
Checkpoints for all the Cityscapes models trained in this project (including C-Glow and DUAL-Glow) can be found here: https://kth.box.com/s/h3r9jt5pq8itrnkp0t2qy11pui7u6dmc
Notes
- My implementation of the baseline Glow borrows heavily from Kim Seonghyeon's helpful implementation: https://github.com/rosinality/glow-pytorch
Citation
If you use our code or build on our method, please cite our paper:
```bibtex
@inproceedings{sorkhei2021full,
  author={Sorkhei, Moein and Henter, Gustav Eje and Kjellstr{\"o}m, Hedvig},
  title={Full-{G}low: {F}ully conditional {G}low for more realistic image generation},
  booktitle={Proceedings of the DAGM German Conference on Pattern Recognition (GCPR)},
  volume={43},
  month={Oct.},
  year={2021}
}
```