# ProFusion
Code for Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach
ProFusion is a framework for customizing pre-trained large-scale text-to-image generation models; our examples use Stable Diffusion 2.
With ProFusion, you can generate an unlimited number of creative images of a novel or unique concept from a single test image, on a single GPU (~20 GB of memory is needed when fine-tuning with batch size 1).
## Example
- Install dependencies (we revised the original diffusers):

  ```bash
  cd ./diffusers
  pip install -e .
  cd ..
  pip install accelerate==0.16.0 torchvision "transformers>=4.25.1" datasets ftfy tensorboard Jinja2 regex tqdm joblib
  ```

- Initialize Accelerate:

  ```bash
  accelerate config
  ```

- Download a model pre-trained on FFHQ.

- Customize the model with a test image; an example is shown in the notebook test.ipynb.
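Before customizing, it can help to sanity-check that the pinned dependencies are importable at the required versions. The snippet below is a minimal illustrative sketch, not part of this repo; `check_requirement` and its naive version parsing are assumptions.

```python
from importlib.metadata import version, PackageNotFoundError

def parse(v):
    """Turn a version string like '4.25.1' into a comparable tuple of ints.

    Non-numeric suffixes are ignored; this is a rough check, not full PEP 440.
    """
    parts = []
    for piece in v.split("."):
        digits = "".join(ch for ch in piece if ch.isdigit())
        parts.append(int(digits) if digits else 0)
    return tuple(parts)

def check_requirement(name, minimum):
    """Return True if `name` is installed at or above `minimum`, else False."""
    try:
        return parse(version(name)) >= parse(minimum)
    except PackageNotFoundError:
        return False

# Hypothetical usage mirroring the pins above:
for pkg, req in [("accelerate", "0.16.0"), ("transformers", "4.25.1")]:
    ok = check_requirement(pkg, req)
    print(f"{pkg}>={req}: {'OK' if ok else 'missing or too old'}")
```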
## Train Your Own Encoder
If you want to train a PromptNet encoder for other domains, or on your own dataset:
- First, prepare an image-only dataset.
  - In our example, we use FFHQ. Our pre-processed FFHQ can be found at google drive link.
- Then, run:

  ```bash
  accelerate launch --mixed_precision="fp16" train.py \
    --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-base" \
    --train_data_dir=./images_512 \
    --max_train_steps=80000 \
    --learning_rate=2e-05 \
    --output_dir="./promptnet" \
    --train_batch_size=8 \
    --promptnet_l2_reg=0.000 \
    --gradient_checkpointing
  ```
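The command above expects a flat folder of 512×512 images at `./images_512`. If your own images come at other resolutions, a small pre-processing pass like the following can produce that layout. This is an illustrative sketch only: the center-crop policy and the `prepare_images` helper are assumptions, not the repo's official FFHQ pre-processing.

```python
from pathlib import Path
from PIL import Image

def prepare_images(src_dir, dst_dir, size=512):
    """Center-crop each image to a square, resize to `size`, save as PNG.

    Returns the number of images written.
    """
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(src_dir).iterdir()):
        if path.suffix.lower() not in {".png", ".jpg", ".jpeg", ".webp"}:
            continue
        with Image.open(path) as im:
            im = im.convert("RGB")
            w, h = im.size
            side = min(w, h)
            left, top = (w - side) // 2, (h - side) // 2
            im = im.crop((left, top, left + side, top + side))
            im = im.resize((size, size), Image.LANCZOS)
            im.save(dst / f"{path.stem}.png")
            count += 1
    return count
```

Hypothetical usage: `prepare_images("raw_photos", "images_512")`, then pass `--train_data_dir=./images_512` as above.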
## Citation
```bibtex
@misc{zhou2023enhancing,
  title={Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach},
  author={Yufan Zhou and Ruiyi Zhang and Tong Sun and Jinhui Xu},
  year={2023},
  eprint={2305.13579},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```