FID on CIFAR10 does not match literature
Describe the bug
I used the unconditional training script from examples to train a model on CIFAR10 with default parameters (except using 1024 diffusion steps). After around 250k updates (batch size 128, 4 GPUs), the samples look decent, but when sampling with DDIM using 128 steps and eta=0, the sampled images give an FID of 17.0, which is worse than the numbers reported in the literature (for example, around 3-4 in https://arxiv.org/pdf/2202.00512v2.pdf for the same DDIM settings, or around 4 in the DDIM paper).
I don't necessarily suspect something is wrong with the code, but nevertheless I think we need to be able to reproduce the CIFAR10 numbers.
Reproduction
- train a model using
examples/unconditional_image_generation/train_unconditional.py --dataset_name "cifar10" --output_dir="ddpm-ema-cifar10" --train_batch_size=32 --num_epochs=500 --learning_rate=1e-4 --lr_warmup_steps=500 --predict_mode x0 --ddpm_num_steps 1024
- sample a collection of images from the trained pipeline by using
examples/unconditional_image_generation/sample_unconditional.py --load_dir ddpm-ema-cifar10/ --predict_mode eps --gpu 1 --num_samples 60000 --output_subdir samples_ddim_128 --sampler ddim --num_steps 128
- compute FID by
fidelity --gpu 1 --fid --input1 ddpm-ema-cifar10-eps-halfway/samples_ddim_128/imgs/ --input2 original_imgs/cifar10/all/
from the examples/unconditional_image_generation/ directory, after storing all CIFAR10 images.
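The fidelity CLI here is from the torch-fidelity package; if it's more convenient, the same FID computation can also be done from Python on the same directories:

import torch_fidelity

# FID between the directory of generated samples and the directory of real
# CIFAR10 images (same inputs as the fidelity CLI call above).
metrics = torch_fidelity.calculate_metrics(
    input1="ddpm-ema-cifar10-eps-halfway/samples_ddim_128/imgs",
    input2="original_imgs/cifar10/all",
    cuda=True,
    fid=True,
)
print(metrics["frechet_inception_distance"])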
Note: reproduction may require my fork where I implemented the sample_unconditional.py
script and made a few changes to some other classes (PR#1126).
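For anyone who doesn't want to use the fork, the DDIM sampling step should be reproducible with stock diffusers roughly like this (a minimal sketch; the batch size and output layout are just examples):

import os
from diffusers import DDPMPipeline, DDIMPipeline, DDIMScheduler

# Load the pipeline saved by train_unconditional.py and swap in a DDIM
# scheduler built from the saved scheduler config.
pipe = DDPMPipeline.from_pretrained("ddpm-ema-cifar10").to("cuda")
ddim = DDIMPipeline(unet=pipe.unet, scheduler=DDIMScheduler.from_config(pipe.scheduler.config)).to("cuda")

os.makedirs("samples_ddim_128/imgs", exist_ok=True)
num_samples, batch_size = 60000, 500
idx = 0
for _ in range(num_samples // batch_size):
    # 128 DDIM steps with eta=0 (deterministic sampling)
    images = ddim(batch_size=batch_size, num_inference_steps=128, eta=0.0).images
    for img in images:
        img.save(f"samples_ddim_128/imgs/{idx:06d}.png")
        idx += 1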
Logs
No response
System Info
N/A
Note that we never verified that training gives the same results. We made sure that the denoising pass matches 1-to-1; however, it's sadly too time-consuming to test a whole training run. There could be many reasons here. Some possible reasons:
- Our dropout layers are slightly different
- Hyper-parameters for training don't match 1-to-1
- ...
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
@patrickvonplaten thank you for the reply! I almost managed to reproduce the CIFAR10 numbers in my fork, getting an FID of around 3.9 (the literature reports numbers around 3.2). I changed several things, like the model, hyperparameters, etc. I also get a better FID for Flowers than the one in the main branch, reaching an FID of around 10.
Very cool! Feel free to post the results here :-)
Hi @lukovnikov, thank you for the great experiment on CIFAR10. Can you share more details about your changes, or specify a commit that can reproduce your FID? I am following the mine branch in your fork and get an FID of around 8 after training for 500k iterations. I'm not sure whether everything is correct or whether there is something wrong in the codebase.
Hi @pkuanjie,
It's been a while, so I forgot how exactly I got there, but I'll try to look at it when I have free time.
Denis
Hi @lukovnikov,
I got similar results: an FID of 10 and an IS of 8. But I can't obtain an FID of 4. Could you please share some information to reproduce the results? I trained the model on 4x RTX 3090 GPUs.
Some of the hyperparameters are as follows:
accelerate launch train_unconditional.py \
  --dataset_name="cifar10" \
  --resolution=32 --center_crop --random_flip \
  --output_dir="/data/tsk/ddpm-ema-cifar10-32" \
  --train_batch_size=128 \
  --num_epochs=10000 \
  --gradient_accumulation_steps=1 \
  --use_ema \
  --learning_rate=1e-4 \
  --lr_warmup_steps=500 \
  --mixed_precision=no
Hi, I tried finding out what I did, but the runs are somewhere in the backups of an older server and I'm not working on this right now anymore. Without running it myself, I can't be certain which commit and hyperparameters did it, but one thing I remember for sure is that I changed the model a bit, probably like here: https://github.com/lukovnikov/diffusers/blob/mine/examples/unconditional_image_generation/train_unconditional.py#L282
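I don't have the exact diff at hand, but the idea was to bring the UNet closer to the DDPM paper's CIFAR10 architecture (roughly the google/ddpm-cifar10-32 config), something in this direction (illustrative only, not necessarily exactly what I ran):

from diffusers import UNet2DModel

# CIFAR10 UNet roughly matching google/ddpm-cifar10-32; double-check the
# values against the published config.json.
model = UNet2DModel(
    sample_size=32,
    in_channels=3,
    out_channels=3,
    layers_per_block=2,
    block_out_channels=(128, 256, 256, 256),
    down_block_types=("DownBlock2D", "AttnDownBlock2D", "DownBlock2D", "DownBlock2D"),
    up_block_types=("UpBlock2D", "UpBlock2D", "AttnUpBlock2D", "UpBlock2D"),
)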
Thanks a lot! I will try these settings later. It seems that you reduced the total number of parameters of the model. If this model can reproduce the results in the paper, then there might be some problems related to overfitting. Moreover, in the original paper they use a learning rate of 2e-4 instead of 1e-5. I will try these settings with your model as well and attach the results I obtain later.
Shengkun
Hi, I have tried your model settings. I got an FID of 6.3, which is better than before. However, it's still worse than the number in the original paper. I didn't apply dropout since I don't know how to set it. Does that matter? Could you tell me the dropout rate you used? I would appreciate it if you could tell me the other hyperparameter settings as well!
Best, Shengkun
Hi, @Tangshengku
Have you reproduced the 3.2 FID? Could you share your hyperparameters?
BTW, maybe you can set the dropout this way, where the config.json is from https://huggingface.co/google/ddpm-cifar10-32/blob/main/config.json:
import torch
from diffusers import UNet2DModel

# Build the model from the published config, with scale/shift time conditioning.
config = UNet2DModel.load_config(args.model_config_name_or_path)
config['resnet_time_scale_shift'] = 'scale_shift'
model = UNet2DModel.from_config(config)

# Enable dropout with p=0.1 (the value used for CIFAR10 in the DDPM paper).
for n, m in model.named_modules():
    if isinstance(m, torch.nn.Dropout):
        m.p = 0.1
Hi Xiaosen,
Unfortunately, I didn't obtain the results from the original paper. However, applying the modification mentioned above did bring benefits. Hope you can figure it out!
Shengkun