
Question about --scale_lr

Open ader47 opened this issue 1 year ago • 20 comments

Hi, I encountered some problems when training the unconditional LDM on 2 RTX 3090s. When should I use --scale_lr True to scale the learning rate? (It is actually True by default....) The learning rate is scaled as accumulate_grad_batches * ngpu * bs * base_lr. Why should the learning rate be scaled this way? If I use batch size 48, the learning rate becomes 1 * 2 * 48 * 0.00005, much larger than the lr in the paper (0.00005), and the model does not converge. I want to train the model with the paper settings; should I set --scale_lr False?
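
For reference, the lr computation in main.py is roughly the following (a paraphrased sketch from memory, written as a standalone function; names may differ slightly from the actual source):

```python
def effective_lr(base_lr: float, scale_lr: bool, ngpu: int,
                 batch_size: int, accumulate_grad_batches: int = 1) -> float:
    """Mirrors (from memory) the lr setup in main.py: with --scale_lr,
    the base lr is multiplied by the effective batch size."""
    if scale_lr:
        return accumulate_grad_batches * ngpu * batch_size * base_lr
    return base_lr

# My setting: 2 GPUs, per-GPU batch size 48, base lr 5e-5
print(effective_lr(5e-5, True, ngpu=2, batch_size=48))   # ~0.0048
print(effective_lr(5e-5, False, ngpu=2, batch_size=48))  # 5e-05
```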

ader47 avatar Apr 02 '23 18:04 ader47

I have faced the same problem, and I found that the model converges well in my task when scale_lr is False.

Joel18241096 avatar Apr 03 '23 08:04 Joel18241096

Why do you need a batch size as big as 48? I don't think an RTX 3090 has enough memory.

blusque avatar Apr 04 '23 12:04 blusque

I want to train on the LSUN_Churches dataset, and the batch size in the original paper is 96. The maximum batch size I could fit on an RTX 3090 was 52. (With 2 GPUs, a per-GPU batch size of 48 already gives an effective batch of 96, matching the paper.)

ader47 avatar Apr 04 '23 12:04 ader47

Got the same question

haooxia avatar Nov 04 '23 02:11 haooxia

I also ran into a case where the model could not converge. I kept the learning rate constant at 5e-5 and it still did not seem to converge.

clearlyzero avatar Dec 09 '23 15:12 clearlyzero

The loss fluctuates around 0.2.

ader47 avatar Dec 09 '23 15:12 ader47

In my experiment, the loss is around 0.4. Should a loss fluctuating around 0.2 be understood as convergence?

clearlyzero avatar Dec 09 '23 15:12 clearlyzero

Did you keep the provided settings or change them? I kept the settings and the loss was around 0.2, but I forget on which dataset. In my experiments, only LSUN_Churches needed scale_lr set to False; the others can stay True.
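
If I remember the argument parser in main.py correctly, --scale_lr takes a boolean, so disabling it for the churches config should look something like this (paths as in the repo):

```
python main.py --base configs/latent-diffusion/lsun_churches-ldm-kl-8.yaml -t --gpus 0,1 --scale_lr False
```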

ader47 avatar Dec 09 '23 15:12 ader47

If it is set to False, is the lr = n_gpus * 0.00005?

clearlyzero avatar Dec 09 '23 15:12 clearlyzero

No, the lr = 0.00005, and I remember the code sets up a linear lr scheduler, so the lr increases from 0 to 0.00005 over the first 10000 steps and then stays constant.
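
If memory serves, the warm-up comes from the LambdaLinearScheduler configured in the yaml (warm_up_steps: [10000]); a simplified sketch of the multiplier it applies to the base lr:

```python
def warmup_multiplier(step: int, warm_up_steps: int = 10_000,
                      f_start: float = 1e-6, f_max: float = 1.0) -> float:
    """Simplified sketch of ldm.lr_scheduler.LambdaLinearScheduler:
    the lr multiplier ramps linearly from ~0 to 1 over the warm-up
    steps, then stays constant (f_min == f_max == 1 in that config)."""
    if step < warm_up_steps:
        return f_start + (f_max - f_start) * step / warm_up_steps
    return f_max

# effective lr at step t is base_lr * warmup_multiplier(t), e.g.
# 0.00005 * warmup_multiplier(5_000) ~= 0.000025 halfway through warm-up
```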

ader47 avatar Dec 09 '23 15:12 ader47

Thank you for your reply. I have a general understanding now; I will try it later.

clearlyzero avatar Dec 09 '23 15:12 clearlyzero

The FID from the paper could not be reproduced using my own trained ckpt 😭

ader47 avatar Dec 09 '23 15:12 ader47

Is the FID also impossible to reproduce with the provided checkpoints?

clearlyzero avatar Dec 09 '23 16:12 clearlyzero

No, you can reproduce the FID using the provided ckpt, but with my own trained ckpt the FID could not be reproduced.

ader47 avatar Dec 09 '23 16:12 ader47

This is indeed very complicated and difficult. I am still training on a very simple dataset and have not applied it yet.

clearlyzero avatar Dec 09 '23 16:12 clearlyzero

Good luck 👍

ader47 avatar Dec 09 '23 16:12 ader47

May I ask whether the latent space of the AutoencoderKL encoder used for the LSUN_Churches dataset is 32x32x4? I am currently using 64x64x3, and I wonder if that is why my loss is so high.

clearlyzero avatar Dec 10 '23 01:12 clearlyzero

Yes, because it is kl-f8; the f8 means it compresses the spatial size by a factor of 8, so a 256x256 input becomes a 32x32 latent.
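
You can sanity-check the latent shape like this (a sketch; it assumes `autoencoder` is an ldm.models.autoencoder.AutoencoderKL already loaded from the kl-f8 first-stage checkpoint, with the loading code omitted):

```python
import torch

# `autoencoder`: an AutoencoderKL loaded from the kl-f8 checkpoint (assumed).
x = torch.randn(1, 3, 256, 256)     # dummy 256x256 RGB batch
posterior = autoencoder.encode(x)   # returns a DiagonalGaussianDistribution
z = posterior.sample()
print(z.shape)                      # expected: torch.Size([1, 4, 32, 32])
```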

ader47 avatar Dec 10 '23 05:12 ader47

I can now use the encoder to compress the image, then run the diffusion on it and generate images, though the quality is not that good. I use a very small UNet 😂

clearlyzero avatar Dec 11 '23 04:12 clearlyzero

But I think this does not matter, because the higher the compression ratio, the lower the quality of the generated results, and 64x64x3 actually has a lower compression ratio than 32x32x4.
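
(Assuming 256x256x3 inputs: a 64x64x3 latent keeps 12,288 values per image, a 16x compression, while a 32x32x4 latent keeps 4,096 values, a 48x compression.)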

Ly403 avatar Apr 02 '24 14:04 Ly403