support fused_back_pass for prodigy-plus-schedule-free
Copied the internals from https://github.com/LoganBooker/prodigy-plus-schedule-free into library/prodigy_plus_schedulefree.py and made the training scripts support either ProdigyPlusScheduleFree or fused Adafactor when --fused_backward_pass (FBP) is set.
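For context, here is a rough sketch (not the exact diff) of the kind of dispatch this adds inside the training scripts once the optimizer has been created; library.adafactor_fused.patch_adafactor_fused is kohya's existing Adafactor helper, and the Prodigy+SF branch assumes the optimizer was constructed with fused_back_pass=True:

```python
# Rough sketch only; the actual change in this PR may differ in structure.
# Runs inside the training script, after `optimizer` has been created from args.
if args.fused_backward_pass:
    if args.optimizer_type.lower().endswith("adafactor"):
        # Existing path: patch Adafactor so it exposes a per-parameter step_param()
        # that the post-accumulate grad hooks can call.
        import library.adafactor_fused
        library.adafactor_fused.patch_adafactor_fused(optimizer)
    elif "prodigyplusschedulefree" in args.optimizer_type.lower():
        # Prodigy+SF performs the fused step itself when constructed with
        # fused_back_pass=True, so nothing extra needs to be patched here.
        pass
```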
From my short tests of DreamBooth training with args --fused_backward_pass --optimizer_type="prodigyplus.ProdigyPlusScheduleFree":
VRAM usage (both tests w/ --full_bf16):
- SD3.5 medium, 512x512: base Prodigy = 27.2 GB, prodigy-plus-schedule-free = 15.4 GB, prodigy-plus-schedule-free w/ FBP = 10.2 GB
- SDXL, 1024x1024: base Prodigy = 33 GB, prodigy-plus-schedule-free = 19 GB, prodigy-plus-schedule-free w/ FBP = 13 GB
Didn't test Flux, but the gains should be similar.
wow nice
@michP247 you find this better than other optimizers?
Will check results later; I haven't actually completed any training in my tests, just did a quick VRAM check last night lol (edited to mention I was using full bf16). Still need to figure out the correct --prodigy_steps value.
Thanks for this pull request!
But I think it may work with the --optimizer_type and --optimizer_args options, like --optimizer_type "prodigyplus.ProdigyPlusScheduleFree" --optimizer_args "fused_back_pass=True" without any additional implementation. Have you tried this?
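(For anyone wondering why that needs no extra implementation: each key=value entry in --optimizer_args is forwarded to the optimizer constructor as a keyword argument, so the command above roughly resolves to the sketch below. The import path and the lr value are assumptions, lr=1.0 being the usual Prodigy default.)

```python
from prodigyplus import ProdigyPlusScheduleFree  # assumed import path for the package

# Roughly what --optimizer_type "prodigyplus.ProdigyPlusScheduleFree"
# --optimizer_args "fused_back_pass=True" ends up constructing:
optimizer = ProdigyPlusScheduleFree(trainable_params, lr=1.0, fused_back_pass=True)
```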
Hm, OK, so I just tried it and it does already work as an optimizer arg, which I overlooked. But at least now it won't break anything if --fused_backward_pass is passed regularly.
Which will be good for the bmaltais GUI, since we'd simply be able to use the FBP checkbox with this optimizer.
Update: I apologize for the late reply. The issue has been fixed in ProdigyPlusScheduleFree v1.8.3. Thanks to @LoganBooker for his work. I tested v1.8.4 and it works fine now, no longer needing the modifications from my commit.
Previous comment follows; the issue was about register_post_accumulate_grad_hook and groups_to_process.
I attempted to add fused backward pass to train_network.py. My changes: https://github.com/kohya-ss/sd-scripts/compare/sd3...Exist-c:sd-scripts:sd3
Based on the implementation in sdxl_train.py and my tests in train_network.py, I think the optimizer's step_param should be registered to the parameters, similar to Adafactor; otherwise, the optimizer will do nothing. I'm not certain whether Flux or SD3.5 require this, but I thought it would be helpful to mention. Here is my implementation in train_network.py:
# accelerator has wrapped the optimizer,
# so we need optimizer.optimizer to access the original functions.
for param_group in optimizer.optimizer.param_groups:
    for parameter in param_group["params"]:
        if parameter.requires_grad:

            def __grad_hook(tensor: torch.Tensor, param_group=param_group):
                if accelerator.sync_gradients and args.max_grad_norm != 0.0:
                    accelerator.clip_grad_norm_(tensor, args.max_grad_norm)
                optimizer.optimizer.step_param(tensor, param_group)
                tensor.grad = None  # clear grad to save memory

            parameter.register_post_accumulate_grad_hook(__grad_hook)
And in my implementation, if both the text_encoder and the unet are training, parameters of the next step would be prematurely stepped via step_param, leading to errors. I made some modifications in on_end_step(), but I think they change the optimizer's behavior, so this is not the correct solution:
def patch_on_end_step(optimizer, group):
    group_index = optimizer.optimizer.param_groups.index(group)
    # my patch, I think it's wrong
    if group_index not in optimizer.optimizer.groups_to_process:
        return False
    # Decrement params processed so far.
    optimizer.optimizer.groups_to_process[group_index] -= 1
    ...
I'm not good at English, and the above was written with machine translation. I hope I haven't offended anyone.
Update: As of this commit for Prodigy+SF, all that should be needed in this pull request is to alter the assert; it will then be sufficient to set args.fused_backward_pass=True to activate FBP, and the optimiser will take care of the rest. Note that, like Adafactor, Kohya only supports FBP for full finetuning (as far as I'm aware).
Previous comment follows.
Hello all, and thanks for your interest in the optimiser. I made a best-effort attempt to match how Kohya had implemented fused backward pass for Adafactor, in the hope it would be fairly straightforward to add support. Seems it's a bit more involved!
I've had a closer look at the SD3 branch and decided it would perhaps be easier to monkey-patch the Adafactor patching method. This has been done in my most recent commit (https://github.com/LoganBooker/prodigy-plus-schedule-free/commit/93339d859eb7b1119a004edecf417f5318227af8). Note I haven't created a new package for this just yet, so you'll need to use the code/repo directly to get this change.
What this means is that for this pull request, all you should need to do is tweak the hard-coded assert in train_util.py to allow Prodigy+SF as well. That's it (apart from installing/importing/selecting the optimiser itself).
https://github.com/kohya-ss/sd-scripts/blob/e89653975ddf429cdf0c0fd268da0a5a3e8dba1f/library/train_util.py#L4633-L4636
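For illustration, a minimal sketch of the relaxed check (the real assert and its message differ, and this assumes optimizer_type has already been lower-cased as in the surrounding code; it only shows Prodigy+SF being allowed through alongside Adafactor):

```python
if args.fused_backward_pass:
    # Sketch: permit both the fused-Adafactor path and Prodigy+SF,
    # which handles the fused step internally.
    assert optimizer_type in ("adafactor", "prodigyplus.prodigyplusschedulefree"), (
        "fused_backward_pass currently only works with Adafactor or ProdigyPlusScheduleFree"
    )
```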
Once that's done, you should be able to use the fused backward pass by passing fused_backward_pass=True to the optimiser, and setting args.fused_backward_pass=True to Kohya. Alternatively, you could retain the change that appends it to the optimiser arguments.
Someone should take a look at this and the suggestion from Exist-c. I won't be able to update this PR for a while, as I'm dealing with some PC troubles.
I did training runs with prodigyplus.ProdigyPlusScheduleFree yesterday but it didn't learn anything.
I must be missing some optimizer arguments; can you give me a solid example? Thank you.
I trained one model up to 5600 steps and another for 2800 steps.