FID on CIFAR10 does not match literature
Describe the bug
I used the unconditional training script from examples to train a model on CIFAR10 with default parameters (except using 1024 diffusion steps). After around 250k updates (batch size 128, 4 GPUs), the samples look decent, but when sampling with DDIM using 128 steps and eta=0, the sampled images give an FID of 17.0, which is worse than the numbers reported in the literature (for example, around 3-4 in https://arxiv.org/pdf/2202.00512v2.pdf for the same DDIM settings, or around 4 in the DDIM paper).
I don't necessarily suspect something is wrong with the code, but nevertheless I think we need to be able to reproduce the CIFAR10 numbers.
Reproduction
- train a model using
examples/unconditional_image_generation/train_unconditional.py --dataset_name "cifar10" --output_dir="ddpm-ema-cifar10" --train_batch_size=32 --num_epochs=500 --learning_rate=1e-4 --lr_warmup_steps=500 --predict_mode x0 --ddpm_num_steps 1024
- sample a collection of images from the trained pipeline by using
examples/unconditional_image_generation/sample_unconditional.py --load_dir ddpm-ema-cifar10/ --predict_mode eps --gpu 1 --num_samples 60000 --output_subdir samples_ddim_128 --sampler ddim --num_steps 128
- compute FID by
fidelity --gpu 1 --fid --input1 ddpm-ema-cifar10-eps-halfway/samples_ddim_128/imgs/ --input2 original_imgs/cifar10/all/
from the examples/unconditional_image_generation/ directory, after storing all CIFAR10 images.
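The fidelity CLI here is from the torch-fidelity package; if it's more convenient, the same FID computation can also be done from Python on the same directories:

import torch_fidelity

# FID between the directory of generated samples and the directory of real
# CIFAR10 images (same inputs as the fidelity CLI call above).
metrics = torch_fidelity.calculate_metrics(
    input1="ddpm-ema-cifar10-eps-halfway/samples_ddim_128/imgs",
    input2="original_imgs/cifar10/all",
    cuda=True,
    fid=True,
)
print(metrics["frechet_inception_distance"])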
Note: reproduction may require my fork where I implemented the sample_unconditional.py
script and made a few changes to some other classes (PR#1126).
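For anyone who doesn't want to use the fork, the DDIM sampling step should be reproducible with stock diffusers roughly like this (a minimal sketch; the batch size and output layout are just examples):

import os
from diffusers import DDPMPipeline, DDIMPipeline, DDIMScheduler

# Load the pipeline saved by train_unconditional.py and swap in a DDIM
# scheduler built from the saved scheduler config.
pipe = DDPMPipeline.from_pretrained("ddpm-ema-cifar10").to("cuda")
ddim = DDIMPipeline(unet=pipe.unet, scheduler=DDIMScheduler.from_config(pipe.scheduler.config)).to("cuda")

os.makedirs("samples_ddim_128/imgs", exist_ok=True)
num_samples, batch_size = 60000, 500
idx = 0
for _ in range(num_samples // batch_size):
    # 128 DDIM steps with eta=0 (deterministic sampling)
    images = ddim(batch_size=batch_size, num_inference_steps=128, eta=0.0).images
    for img in images:
        img.save(f"samples_ddim_128/imgs/{idx:06d}.png")
        idx += 1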
Logs
No response
System Info
N/A
Note that we never verified that training gives the same results. We made sure that the denoising pass matches 1-to-1; however, it's sadly too time-consuming to test a whole training run. There could be many reasons here. Some possible reasons:
- Our dropout layers are slightly different
- Hyper-parameters for training don't match 1-to-1
- ...
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@patrickvonplaten thank you for the reply! I almost managed to reproduce the CIFAR10 numbers in my fork, getting an FID of around 3.9 (the literature reports numbers around 3.2). I changed several things, like the model, hyperparameters, etc. I also get a better FID for Flowers than the one in the main branch, reaching an FID of around 10.
Very cool! Feel free to post the results here :-)
Hi @lukovnikov, thank you for the great experiment on CIFAR10. Can you share more details about your changes, or specify a commit that can reproduce your FID? I am following the mine branch in your fork and get an FID of around 8 after training for 500k iterations. I'm not sure whether everything is correct or whether there is something wrong in the codebase.
Hi @pkuanjie,
It's been a while, so I forgot how exactly I got there, but I'll try to look at it when I have free time.
Denis
Hi @lukovnikov,
I got similar results: an FID of 10 and an IS of 8. But I can't obtain an FID of 4. Could you please share some information to reproduce the results? I trained the model on 4x RTX 3090 GPUs.
Some of the hyperparameters are as follows:
accelerate launch train_unconditional.py \
  --dataset_name="cifar10" \
  --resolution=32 --center_crop --random_flip \
  --output_dir="/data/tsk/ddpm-ema-cifar10-32" \
  --train_batch_size=128 \
  --num_epochs=10000 \
  --gradient_accumulation_steps=1 \
  --use_ema \
  --learning_rate=1e-4 \
  --lr_warmup_steps=500 \
  --mixed_precision=no
Hi, I tried finding out what I did, but the runs are somewhere in the backups of an older server and I'm not working on this right now anymore. Without running it myself, I can't be certain which commit and hyperparameters did it, but one thing I remember for sure is that I changed the model a bit, probably like here: https://github.com/lukovnikov/diffusers/blob/mine/examples/unconditional_image_generation/train_unconditional.py#L282
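I don't have the exact diff at hand, but the idea was to bring the UNet closer to the DDPM paper's CIFAR10 architecture (roughly the google/ddpm-cifar10-32 config), something in this direction (illustrative only, not necessarily exactly what I ran):

from diffusers import UNet2DModel

# CIFAR10 UNet roughly matching google/ddpm-cifar10-32; double-check the
# values against the published config.json.
model = UNet2DModel(
    sample_size=32,
    in_channels=3,
    out_channels=3,
    layers_per_block=2,
    block_out_channels=(128, 256, 256, 256),
    down_block_types=("DownBlock2D", "AttnDownBlock2D", "DownBlock2D", "DownBlock2D"),
    up_block_types=("UpBlock2D", "UpBlock2D", "AttnUpBlock2D", "UpBlock2D"),
)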
Thanks a lot! I will try these settings later. It seems that you reduced the total number of parameters of the model. If this model can reproduce the results in the paper, then there might be some problems related to overfitting. Moreover, in the original paper they use a learning rate of 2e-4 instead of 1e-5. I will try these settings with your model as well and attach the results I obtain later.
Shengkun
Hi, I have tried your model settings. I got an FID of 6.3, which is better than before. However, it's still worse than the number in the original paper. I didn't apply dropout since I don't know how to set it. Does that matter? Could you tell me the dropout rate you used? I would appreciate it if you could tell me the other hyperparameter settings as well!
Best, Shengkun
Hi, @Tangshengku
Have you reproduced the 3.2 FID? Could you share your hyperparameters?
BTW, maybe you can set the dropout this way, where the config.json is from https://huggingface.co/google/ddpm-cifar10-32/blob/main/config.json:
import torch
from diffusers import UNet2DModel

# Build the model from the published config, with scale/shift time conditioning.
config = UNet2DModel.load_config(args.model_config_name_or_path)
config['resnet_time_scale_shift'] = 'scale_shift'
model = UNet2DModel.from_config(config)

# Enable dropout with p=0.1 (the value used for CIFAR10 in the DDPM paper).
for n, m in model.named_modules():
    if isinstance(m, torch.nn.Dropout):
        m.p = 0.1
Hi Xiaosen,
Unfortunately, I didn't obtain the results from the original paper. However, applying the modification mentioned above did bring benefits. Hope you can figure it out!
Shengkun