deGENERATIVE-SQUAD
Found this PR finally. I hope you don’t mind if I share something relevant to the topic. A guy from a forum made this wavelet implementation based on many...
@WinodePino
> I am trying to test this, but somehow my caching of latents gets stuck at 0%, any idea why this could be?

I’m not using the cache at all -...
@WinodePino
> Thanks, it is working now, but it doesn’t seem to learn anything.

If the iteration is proceeding at a normal speed, then obviously the problem lies in the too...
> #1866

I tried both the LoRA algorithm and the GLoRA+DoRA algorithm on SDXL - no noticeable decrease in VRAM usage. Speeds are the same with and without fused_backward_pass also...
> Try it without my proposed changes, apparently it was already working if you set --fused_back_pass as an optimizer arg

Tested it earlier:
1. back_pass as an optimizer argument is...
Problem solved in the new version of https://github.com/LoganBooker/prodigy-plus-schedule-free. According to the issue thread https://github.com/LoganBooker/prodigy-plus-schedule-free/issues/7, it now works with full finetunes and LoRAs (the problem was the lack of fused support for LoRA in...
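Since this keeps coming up: below is a minimal, hedged sketch of what fused_back_pass looks like when constructing the optimizer directly in Python. The import path and the lr value are assumptions based on the package layout, not something confirmed in this thread.

```python
import torch
from prodigyplus.prodigy_plus_schedulefree import ProdigyPlusScheduleFree  # assumed import path

model = torch.nn.Linear(64, 64)  # stand-in for the LoRA / finetune parameters

# fused_back_pass is the argument discussed above: the optimizer applies its update
# as gradients arrive during backward() instead of in a separate full step, which is
# where the VRAM saving is expected to come from.
optimizer = ProdigyPlusScheduleFree(
    model.parameters(),
    lr=1.0,                # Prodigy-style optimizers are normally left at lr=1.0 (assumption)
    fused_back_pass=True,
)
```

In sd-scripts terms, this is what setting fused_back_pass as an optimizer arg (via --optimizer_args) is doing.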
@FurkanGozukara
> what optimizer parameters are required?

Basically: d0, eps, d_coef, use_stableadamw, and stochastic_rounding.
Optionally: use_bias_correction, factored, weight_decay, split_groups (for learning rate splitting between U-Net and TE).

> i did...
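To make that list concrete, here is a rough sketch with those parameters passed straight to the constructor. The values are illustrative placeholders rather than recommendations, and the import path is assumed from the package name.

```python
from prodigyplus.prodigy_plus_schedulefree import ProdigyPlusScheduleFree  # assumed import path

optimizer = ProdigyPlusScheduleFree(
    model.parameters(),        # `model` is whatever network/LoRA you are training
    # the "basically" set:
    d0=1e-6,                   # initial step-size estimate
    eps=1e-8,
    d_coef=1.0,                # scales the adapted step size
    use_stableadamw=True,
    stochastic_rounding=True,  # matters when weights are stored in bf16
    # the "optionally" set:
    use_bias_correction=True,
    factored=False,
    weight_decay=0.0,
    split_groups=True,         # separate adaptation per param group, e.g. U-Net vs. TE
)
```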
Show your full config and describe the dataset characteristics. At what step do the NaNs start? Usually, NaN points to an underlying issue - for example, if you are using full_fp16, then some optimizers and...
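On the "at what step" question, here is a quick sketch of the kind of check that pins down the first non-finite step. This is generic PyTorch dropped into your own loop, not sd-scripts' actual training code, and assert_finite is just a name made up for the example.

```python
import torch

def assert_finite(step: int, loss: torch.Tensor, model: torch.nn.Module) -> None:
    """Fail loudly at the first step where the loss or any gradient goes NaN/Inf."""
    if not torch.isfinite(loss):
        raise RuntimeError(f"Non-finite loss at step {step}: {loss.item()}")
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            raise RuntimeError(f"Non-finite gradient in '{name}' at step {step}")

# Call it right after loss.backward(), e.g.:
#   loss.backward()
#   assert_finite(global_step, loss, network)   # `network` = the module being trained
```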
@kohya-ss

```
Traceback (most recent call last):
  File "K:\sd-scripts\sd-scripts-sd3\sdxl_train_network.py", line 229, in <module>
    trainer.train(args)
  File "K:\sd-scripts\sd-scripts-sd3\train_network.py", line 1403, in train
    loss = self.process_batch(
           ^^^^^^^^^^^^^^^^^^^
  File "K:\sd-scripts\sd-scripts-sd3\train_network.py", line 463, in process_batch
    huber_c ...
```
@kohya-ss This definitely doesn’t depend on the optimizer being used - the same issue occurs with Adafactor or any other optimizer. d_limiter is not the root of the problem, it...