latent-diffusion
Question about --scale_lr
Hi, I encountered some problems when training the unconditional LDM. I trained the LDM with 2 RTX 3090s.
When should I use `--scale_lr True` to scale the learning rate? (Actually, it's True by default....)
The learning rate is scaled as `accumulate_grad_batches * ngpu * bs * base_lr`.
Why should the learning rate be scaled like this?
If I use batch size 48, the learning rate becomes 1 * 2 * 48 * 0.00005 = 0.0048, much bigger than the lr in the paper (0.00005), and the model won't converge.
I want to train the model with the paper settings; should I set `--scale_lr False`?
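For reference, the scaling rule quoted above can be sketched like this (a minimal sketch using the variable names from the formula; the actual logic in the repo's main.py may differ in detail):

```python
# Sketch of the linear lr-scaling rule applied when scale_lr is enabled.
# Variable names (ngpu, bs, base_lr) follow the formula quoted above;
# this is an illustration, not the exact source of main.py.
def effective_lr(base_lr, ngpu, bs, accumulate_grad_batches=1, scale_lr=True):
    if scale_lr:
        # linear scaling: lr grows with the total effective batch size
        return accumulate_grad_batches * ngpu * bs * base_lr
    return base_lr

print(effective_lr(5e-5, ngpu=2, bs=48))  # 0.0048, vs 5e-5 in the paper
```

With 2 GPUs and batch size 48 the effective lr is nearly 100x the paper's 5e-5, which is consistent with the divergence reported here.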
I have faced the same problem, and I found that the model converges well in my task when scale_lr is False.
why do you need a batch size as big as 48? I don't think rtx3090 has enough memory.
I want to train on the LSUN_Churches dataset, and the batch size in the original paper is 96. The max batch size I could fit on an RTX 3090 is 52.
Got the same question
I also encountered a situation where the model was unable to converge. I kept the learning rate constant at 5e-5, and it did not seem to converge.
The loss fluctuates around 0.2.
In my experiment, the loss is around 0.4. Can a loss around 0.2 be understood as convergence?
Did you keep the provided settings, or did you change them? I kept the settings and the loss is around 0.2, but I forget on which dataset. In my experiments, only LSUN_Churches should have scale_lr set to False; the others can be True.
If set to False, is the lr = n_gpus * 0.00005?
No, the lr = 0.00005, and I remember the code sets a linear lr scheduler, so the lr increases from 0 to 0.00005 over 10000 steps and then stays constant.
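The warm-up described here can be sketched as follows (a hypothetical helper; the warmup_steps=10000 and base_lr=5e-5 values follow the comment above, while the repo's actual scheduler is configured via the yaml configs and may differ):

```python
# Sketch of a linear warm-up to a constant lr, as described above.
# warmup_steps and base_lr follow the comment; the repo's actual
# scheduler (set in the yaml config) may differ in detail.
def lr_at_step(step, base_lr=5e-5, warmup_steps=10000):
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # ramp linearly from 0 to base_lr
    return base_lr  # then stay constant

print(lr_at_step(0))      # 0.0
print(lr_at_step(5000))   # 2.5e-05 (halfway through warm-up)
print(lr_at_step(20000))  # 5e-05 (constant after warm-up)
```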
Thank you for your reply. I have a general understanding now; I will try it later.
The FID in the paper could not be reproduced using my own trained ckpt 😭
Is the FID also impossible to reproduce with the provided checkpoints?
No, you can reproduce the FID using the provided ckpt, but with my own trained ckpt the FID could not be reproduced.
This is indeed very complicated and difficult. I am still training on a very simple dataset and have not applied it yet.
Good luck 👍
May I ask if the latent space size of the AutoencoderKL encoder used for the LSUN_Churches dataset is 32x32x4? I am currently using 64x64x3, and I wonder if this is why my loss is so high.
Yes, because it is kl-f8; f8 means it compresses the spatial size by a factor of 8.
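So for 256x256 LSUN_Churches inputs, the kl-f8 latent shape works out as follows (a quick check, assuming a 256x256 input and 4 latent channels as in the kl-f8 config):

```python
# kl-f8: spatial downsampling factor f = 8, with 4 latent channels
# (input resolution and channel count assumed from the discussion).
def latent_shape(h, w, f=8, z_channels=4):
    return (h // f, w // f, z_channels)

print(latent_shape(256, 256))  # (32, 32, 4)
```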
I can now use the encoder to compress the image, then run diffusion and generate some images, though the quality is not that good. I use a very small UNet 😂
But I think this does not matter, because the higher the compression ratio, the lower the quality of the generated results. Actually, 64x64x3 has a lower compression ratio than 32x32x4.
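The arithmetic bears this out: relative to a 256x256x3 input (the resolution assumed here), the 64x64x3 latent keeps 16x fewer values while the 32x32x4 latent keeps 48x fewer, so 64x64x3 is indeed the milder compression:

```python
# Compare compression ratios of the two latent shapes against a
# 256x256x3 input (sizes taken from the discussion above).
pixels = 256 * 256 * 3
f4_latent = 64 * 64 * 3   # f = 256/64 = 4, i.e. a kl-f4-style latent
f8_latent = 32 * 32 * 4   # the kl-f8 latent

print(pixels / f4_latent)  # 16.0 -> milder compression
print(pixels / f8_latent)  # 48.0 -> stronger compression
```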