
About the effect of Dadaptation Optimizer on the learning rate

Open maxhhd opened this issue 1 year ago • 6 comments

There are three learning rates in the GUI: Learning rate, Text Encoder learning rate, and Unet learning rate. So far I know that the optimizer can change the Learning rate, but what about the other two? Most people set them to 1 and 0.5, and according to the TensorBoard display they do stay at those values. If those really are the learning rates being applied, then the model would no doubt underfit. So I hope the author can tell me what this optimizer does to the other two learning rates, so that I can clearly understand its impact.

maxhhd avatar Mar 11 '23 13:03 maxhhd

I'd be interested in this as well. D-Adaptation asks for 1, and I understood that it wants the actual learning rate set to 1. But then the console complains about the Text Encoder LR being too low, etc., so I'm not sure what to set where.

VeMeth avatar Mar 16 '23 16:03 VeMeth

D-Adaptation is kind of special. What you set via the LR fields is not the learning rate but rather a distance value: a multiplier that is applied to the LR that D-Adaptation calculates on its own. Since when training a LoRA it is better to train the TE less than the UNet (usually at half the LR), you set the distance value for the TE LR to 0.5. So when I use D-Adaptation I set things like:

[image: screenshot of the suggested settings]
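In optimizer terms, that setup is roughly equivalent to the sketch below. It uses the dadaptation package directly (pip install dadaptation) rather than the actual kohya_ss code, and the Linear modules are stand-ins for the real UNet and text encoder; the point is that each group's lr acts as a relative multiplier on the step size D-Adaptation estimates, not as an absolute learning rate.

```python
import torch
from dadaptation import DAdaptAdam

# Stand-ins for the real networks, just to make the sketch runnable.
unet = torch.nn.Linear(8, 8)
text_encoder = torch.nn.Linear(8, 8)

optimizer = DAdaptAdam(
    [
        {"params": unet.parameters(), "lr": 1.0},          # full estimated step for the UNet
        {"params": text_encoder.parameters(), "lr": 0.5},  # half of it for the TE
    ],
    decouple=True,      # decoupled (AdamW-style) weight decay, as in the log further down
    weight_decay=0.02,
)
```

(Whether both multipliers actually take effect depends on the trainer; see the warning later in this thread.)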

bmaltais avatar Mar 16 '23 18:03 bmaltais

Thank you so much. Just for clarity, is D-Adaptation the same as your LR-Free branch? (I didn't manage to get it to run, and it seems mostly merged into master by now.)

VeMeth avatar Mar 16 '23 22:03 VeMeth

It is. Just implemented directly in the kohya_ss code base.

bmaltais avatar Mar 16 '23 23:03 bmaltais

Thank you. So I tried your settings, and I see some improvements in TensorBoard, but compared to your video I only have one lr/d*lr curve, and it seems rather static after the first few steps. The TE and UNet LRs are displayed as static as well.

[image: TensorBoard screenshot showing the lr/d*lr curve]

I'm unsure if this is just a display problem or if those rates are for some reason not adjusted.

VeMeth avatar Mar 16 '23 23:03 VeMeth

The official code from kohya did not implement all the logs I added in the test branch... but it is essentially there, just not reported.
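For reference, the extra logging amounted to something like the sketch below. This is not the exact branch code; it assumes, as the dadaptation package does, that the optimizer keeps its current distance estimate in each param group under the key "d", so the LR actually applied to a group is d * lr.

```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="logs")

def log_effective_lr(optimizer, step):
    # d is D-Adaptation's running estimate of the step size; the learning
    # rate actually applied to a group is d * group["lr"].
    for i, group in enumerate(optimizer.param_groups):
        d = group.get("d")
        if d is not None:
            writer.add_scalar(f"lr/d*lr/group{i}", d * group["lr"], step)
```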

bmaltais avatar Mar 17 '23 18:03 bmaltais

When I run kohya directly I get this:

use D-Adaptation Adam optimizer | {'decouple': True, 'weight_decay': 0.02}
when multiple learning rates are specified with dadaptation (e.g. for Text Encoder and U-Net), only the first one will take effect: lr=0.5
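If the warning is accurate, every parameter group ends up with the first multiplier (0.5 here). A quick way to confirm, using generic PyTorch inspection rather than anything kohya-specific, assuming optimizer is the instance the trainer built:

```python
# With the warning above, all groups should print lr=0.5.
for i, group in enumerate(optimizer.param_groups):
    print(f"group {i}: lr={group['lr']}")
```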

How did you get it to work with different LRs for the UNet and TE?

hollowstrawberry avatar Apr 22 '23 22:04 hollowstrawberry