Make-A-Scene - PyTorch
Pytorch implementation (unofficial) of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Figure 1 from the paper
Note: this is a work in progress.
We are at the training stage! Progress can be followed in the Discord channel on the LAION Discord: https://discord.gg/DghvZDKu. Data preprocessing is finished, as is training of VQ-SEG. We are currently training VQ-IMG. Training checkpoints will be released soon, along with demos. The transformer implementation is in progress, and its training will hopefully start as soon as VQ-IMG finishes.
Demo
VQIMG: https://colab.research.google.com/drive/1SPyQ-epTsAOAu8BEohUokN4-b5RM_TnE?usp=sharing
Paper Description
Make-A-Scene modifies the VQGAN framework. It makes heavy use of semantic segmentation maps as extra conditioning, which gives more control over the generation process. Moreover, it also conditions on text. The main improvements are the following:
- Segmentation conditioning: a separate VQ-VAE (VQ-SEG) is trained, with its loss modified to a weighted binary cross-entropy (Section 3.4)
- VQGAN training (VQ-IMG) is extended with a face loss and an object loss (Sections 3.3 & 3.5)
- Classifier-free guidance for the autoregressive transformer (Section 3.7)
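The weighted binary cross-entropy used for VQ-SEG can be sketched as below. This is a minimal illustration, not the paper's released code: the function name, tensor shapes, and the idea of a per-class weight vector (e.g. up-weighting face-related classes) are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def weighted_seg_bce(logits, target, class_weights):
    """Per-class weighted binary cross-entropy for segmentation channels.

    logits, target: (B, C, H, W) tensors, target with binary class maps.
    class_weights: (C,) tensor emphasising salient classes (illustrative
    values; the actual weighting scheme follows Section 3.4 of the paper).
    """
    # Broadcast the per-class weights over batch and spatial dimensions.
    w = class_weights.view(1, -1, 1, 1)
    return F.binary_cross_entropy_with_logits(logits, target, weight=w)
```

With uniform weights this reduces to plain binary cross-entropy; raising a class's weight scales its contribution to the loss proportionally.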
Training Pipeline
Figure 6 from the paper
What needs to be done?
Refer to the different folders to see details.
- [X] VQ-SEG
- [X] VQ-IMG
- [ ] Transformer
- [X] Data Aggregation
Citation
@misc{gafni2022makeascene,
  doi       = {10.48550/ARXIV.2203.13131},
  url       = {https://arxiv.org/abs/2203.13131},
  author    = {Gafni, Oran and Polyak, Adam and Ashual, Oron and Sheynin, Shelly and Parikh, Devi and Taigman, Yaniv},
  title     = {Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors},
  publisher = {arXiv},
  year      = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}