kohya_ss
About the effect of Dadaptation Optimizer on the learning rate
There are three learning rates in the GUI: Learning rate, Text Encoder learning rate, and Unet learning rate. So far I know that the optimizer can change the Learning rate, but what about the others? Most people set the other two to 1 and 0.5, and according to the TensorBoard display, these two stay at exactly that setting. Obviously, if the learning rates really are what those two curves show, then there is no doubt the model must be underfitting. So I hope the author can explain what this optimizer does to the other two learning rates, so that I can have a clear understanding of its impact.
I'd be interested in this as well. D-Adaptation asks for 1, and I understood that to mean the actual learning rate should be set to 1. But then the console complains about the Text Encoder LR being too low, etc., so I'm not sure what to set where.
Dadaptation is kind of special. What you set via the LR fields is not the learning rate itself but rather a distance multiplier that is applied to the LR calculated by D-Adaptation. Since when training a LoRA it is better to train the TE less than the UNet (usually half the LR), you set the distance value for the TE LR to 0.5. So when I use Dadaptation I set things like:
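(The exact settings weren't quoted above, but as a rough illustration of the idea, not necessarily the poster's exact values: the LR fields end up as per-group `lr` values that D-Adaptation treats as multipliers on the step size it estimates. A minimal sketch using the `dadaptation` package, with `unet_lora` / `te_lora` as stand-ins for the real LoRA parameters:)

```python
import torch.nn as nn
import dadaptation

# Stand-in modules; in real training these would be the LoRA-injected
# UNet and Text Encoder parameters.
unet_lora = nn.Linear(8, 8)
te_lora = nn.Linear(8, 8)

param_groups = [
    {"params": unet_lora.parameters(), "lr": 1.0},  # UNet: full adapted step
    {"params": te_lora.parameters(), "lr": 0.5},    # TE: half the adapted step
]

optimizer = dadaptation.DAdaptAdam(
    param_groups,
    decouple=True,      # AdamW-style decoupled weight decay
    weight_decay=0.02,
)

# The effective step for each group is roughly d * lr, where d is the distance
# estimate D-Adaptation maintains internally; 1.0 and 0.5 therefore act as
# relative multipliers, not absolute learning rates.
```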
Thank you so much. Just for clarity, is D-Adaptation the same as your LR-Free branch? (I didn't manage to get it to run, and it seems mostly merged into master by now.)
It is. Just implemented directly in kohya_ss code base.
Thank you. So I tried your settings, and I see some improvements in TensorBoard, but compared to your video, I only have one lr/d*lr curve and it seems rather static after the first few steps. The TE and UNet LRs are displayed as static as well.
I'm unsure if this is just a display problem or if those rates are for some reason not being adjusted.
The official code from kohya did not implement all the logs I added in the test branch... but it is essentially there, just not reported.
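(For reference, a minimal sketch of how such an effective rate could be surfaced to TensorBoard, assuming the dadaptation optimizer exposes its distance estimate under the "d" key of each param group, as the reference implementation does. The function name and tag below are illustrative, not kohya's actual logging code:)

```python
from torch.utils.tensorboard import SummaryWriter

def log_dadaptation_lr(optimizer, writer: SummaryWriter, global_step: int) -> None:
    """Log the effective D-Adaptation learning rate (d * lr) per param group."""
    for i, group in enumerate(optimizer.param_groups):
        # "d" is the adapted distance estimate; "lr" is the user-set multiplier.
        effective_lr = group["d"] * group["lr"]
        writer.add_scalar(f"lr/d*lr/group{i}", effective_lr, global_step)
```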
When I run kohya directly I get this:
use D-Adaptation Adam optimizer | {'decouple': True, 'weight_decay': 0.02}
when multiple learning rates are specified with dadaptation (e.g. for Text Encoder and U-Net), only the first one will take effect / D-Adaptationで複数の学習率を指定した場合(Text EncoderとU-Netなど)、最初の学習率のみが有効になります: lr=0.5
How did you get it to work with different LR for UNET and TE?