E-LatentLPIPS
Unofficial implementation of E-LatentLPIPS (Ensembled-LatentLPIPS) from Diffusion2GAN

News
- [24/07/11] Released pretrained latent VGG network weights for E-LatentLPIPS
- [24/06/19] Released ensemble hyperparameters for E-LatentLPIPS
- [24/06/18] Code release
Comparison with Original Work
Original Results

My Results

The implementation follows the contents of the paper; details the paper does not disclose were filled in by consulting, or extrapolating from, existing related work. Performance is lower than reported in the paper, likely due to differences in the model architecture (e.g., the position of the removed max-pool layer, or the scale/shift parameters for the 4th channel).
Result Images by Augmentations Option


- Reproducing the paper, the actual effect of color augmentation appears to be insignificant.
- The original paper does not use noise augmentation, but noise has a large impact on actual reconstruction quality.
- The StyleGAN-ADA augmentation pipeline also includes transforms such as luma flip, but adding it distorted the reconstruction colors, so it was excluded.

The smaller the learning rate, the better the image details are reconstructed and the lower the loss value, but this is not directly proportional to PSNR.

Result Images by Augmentation Options Implemented with torchvision.transforms

Using the StyleGAN-ADA augmentation options as-is did not yield successful reconstructions. Augmentation options that reconstruct even fine details were found through hyperparameter optimization (400+ hours on a 4090 GPU).
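As a hedged sketch of the ensembling idea (not this repository's actual API), an ensemble step can be written as a few random, differentiable transforms applied to the latent before the feature network. The function name, flags, and defaults below are illustrative assumptions; the real pipeline uses the StyleGAN-ADA augmentation kernels:

```python
import torch

def ensemble_augment(x, xflip=True, brightness=True, noise=True, noise_std=0.05):
    """Apply a random subset of differentiable augmentations to a latent
    batch x of shape (B, C, H, W). Illustrative only: names, probabilities,
    and magnitudes are assumptions, not the repository's actual values."""
    if xflip and torch.rand(()) < 0.5:
        x = torch.flip(x, dims=[-1])                       # horizontal flip
    if brightness and torch.rand(()) < 0.5:
        # per-sample brightness shift
        x = x + 0.2 * (torch.rand(x.size(0), 1, 1, 1, device=x.device) - 0.5)
    if noise:
        x = x + noise_std * torch.randn_like(x)            # Gaussian noise
    return x
```

Because every transform is differentiable, gradients flow back through the augmentation to the input, which is what allows the ensembled metric to be used as a reconstruction loss.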
How to use
Install
pip install -r requirements.txt
ninja is an optional package installed for the PyTorch extension upfirdn2d_plugin. The code runs without it, but a warning may appear.
Data Preparation
# BAPPS dataset download
bash scripts/download_dataset.sh
# encode BAPPS dataset with runwayml/stable-diffusion-v1-5
# if you want to encode other VAE change huggingface url
python utils/make_latent_dataset_2afc.py --input_dir dataset/2afc --output_dir dataset/latent_2afc --batch_size 4
If you only want to download the validation set, run scripts/download_dataset_valonly.sh instead of bash scripts/download_dataset.sh.
These scripts, like the BAPPS dataset itself, come from the LPIPS repository (https://github.com/richzhang/PerceptualSimilarity/tree/master); see that repository for the licenses covering BAPPS dataset usage.
By default, make_latent_dataset_2afc.py encodes with runwayml/stable-diffusion-v1-5. If you want to use another VAE or encoder, change the Hugging Face URL.
Checkpoints
You can download the pretrained latent VGG network from this link: https://drive.google.com/file/d/1558700cub2hjAv-fXcyUGJUJBTrm5m3g/view?usp=sharing
Latent LPIPS Train
# train LPIPS
python train.py --dataset_mode 2afc
# train LatentLPIPS
python train.py --dataset_mode latent_2afc --latent_mode True
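For context, LPIPS-style 2AFC training maps the two reference-to-distortion distances to a preference probability and matches it against the human judgement. A minimal sketch of such an objective is below; the fixed logit `scale` is an assumption for illustration (the original LPIPS instead learns this mapping with a small network):

```python
import torch
import torch.nn.functional as F

def ranking_loss(d0, d1, judge, scale=10.0):
    """d0, d1: distances from the reference to distortions 0 and 1, shape (N,).
    judge: fraction of humans preferring distortion 1, shape (N,).
    `scale` is an assumed constant, not the repository's actual value."""
    logit = scale * (d0 - d1)   # larger d0 => distortion 1 looks closer
    return F.binary_cross_entropy_with_logits(logit, judge)
```

The loss is small when the metric and the humans agree on which distortion is closer, and large when they disagree.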
Latent LPIPS Test
# LPIPS
python test.py --model_path checkpoints/LatentLPIPS.ckpt --dataset_mode 2afc
# LatentLPIPS
python test.py --model_path checkpoints/LatentLPIPS.ckpt --dataset_mode latent_2afc --latent_mode True
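The test script reports a 2AFC score. As context, the standard BAPPS scoring rule (following the LPIPS repository) credits the metric in proportion to how many human raters it agrees with, with ties scored as 0.5:

```python
import torch

def two_afc_score(d0, d1, judge):
    """d0, d1: metric distances from the reference to distortions 0 and 1.
    judge: fraction of humans who preferred distortion 1. All shape (N,)."""
    return ((d0 < d1).float() * (1.0 - judge)
            + (d1 < d0).float() * judge
            + (d0 == d1).float() * 0.5).mean()
```

A score of 1.0 means the metric always sides with the unanimous human preference; 0.5 is chance level.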
Single Reconstruction Experiment
# LPIPS (loads the pretrained original LPIPS: https://github.com/richzhang/PerceptualSimilarity/tree/master)
python single_reconstruction.py --reconstruction_target single_reconstruction_sample.jpeg
# LatentLPIPS
python single_reconstruction.py --reconstruction_target single_reconstruction_sample.jpeg --latent_mode
# E-LatentLPIPS
python single_reconstruction.py --reconstruction_target single_reconstruction_sample.jpeg --latent_mode --ensemble_mode
# Optimal option for E-LatentLPIPS
python single_reconstruction.py --reconstruction_target single_reconstruction_sample.jpeg --latent_mode --ensemble_mode --xflip True --rotate90 True --xint True --xfrac True --scale True --rotate True --aniso True --brightness True --contrast True --saturation True --cutout True --noise True
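The single-reconstruction experiment optimizes a latent so that the metric's distance to the target is minimized. A minimal, hedged sketch of that loop is below; `metric` stands in for (E-)LatentLPIPS, and the step count and learning rate are illustrative:

```python
import torch

def reconstruct(target, metric, steps=300, lr=0.1):
    """Gradient-descend a randomly initialized latent toward `target`
    under `metric`. `metric(a, b)` should return a scalar distance;
    pass (E-)LatentLPIPS in practice. Hyperparameters are assumptions."""
    latent = torch.randn_like(target).requires_grad_(True)
    opt = torch.optim.Adam([latent], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = metric(latent, target)
        loss.backward()
        opt.step()
    return latent.detach()
```

This probe makes a metric's failure modes visible: a metric that ignores some image property (e.g., color under heavy color augmentation) will produce reconstructions that drift in exactly that property.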
Citation
@article{kang2024diffusion2gan,
author = {Kang, Minguk and Zhang, Richard and Barnes, Connelly and Paris, Sylvain and Kwak, Suha and Park, Jaesik and Shechtman, Eli and Zhu, Jun-Yan and Park, Taesung},
title = {{Distilling Diffusion Models into Conditional GANs}},
journal = {arXiv preprint arXiv:2405.05967},
year = {2024},
}
@misc{2006.06676,
author = {Tero Karras and Miika Aittala and Janne Hellsten and Samuli Laine and Jaakko Lehtinen and Timo Aila},
title = {Training Generative Adversarial Networks with Limited Data},
year = {2020},
eprint = {arXiv:2006.06676},
}
@inproceedings{zhang2018perceptual,
title={The Unreasonable Effectiveness of Deep Features as a Perceptual Metric},
author={Zhang, Richard and Isola, Phillip and Efros, Alexei A and Shechtman, Eli and Wang, Oliver},
booktitle={CVPR},
year={2018}
}
@misc{kettunen2019elpips,
title={E-LPIPS: Robust Perceptual Image Similarity via Random Transformation Ensembles},
author={Markus Kettunen and Erik Härkönen and Jaakko Lehtinen},
year={2019},
eprint={1906.03973},
archivePrefix={arXiv},
primaryClass={cs.CV}
}