Resolution bucketing and Trainer implementation refactoring

Open KohakuBlueleaf opened this issue 3 weeks ago • 8 comments

In this PR I propose a mechanism for resolution bucketing. Unlike standard Aspect Ratio Bucketing (ARB), users can provide latents at arbitrary resolutions: we do the bucketing directly on the list of latents and assume they already have the size the user expects. (This PR also adds a "ResizeToPixelCount" node, which can mimic the effect of standard ARB.)
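To make the idea concrete, here is a minimal sketch of what this style of bucketing amounts to (illustrative only; `bucket_latents` and `resize_to_pixel_count` are hypothetical names, not the node implementations in this PR): latents are grouped by spatial size so that every batch contains tensors of a single resolution, and the resize helper shows the pixel-count math that approximates what ARB does.

```python
from collections import defaultdict
import torch

def bucket_latents(latents: list[torch.Tensor], batch_size: int):
    """Group [C, H, W] latents by (H, W) and yield same-resolution batches.

    Assumes each latent already has the size the user wants; no resizing is done here.
    """
    buckets = defaultdict(list)
    for latent in latents:
        buckets[(latent.shape[-2], latent.shape[-1])].append(latent)
    for group in buckets.values():
        for i in range(0, len(group), batch_size):
            # Stacking works because every tensor in a bucket shares H and W.
            yield torch.stack(group[i:i + batch_size])

def resize_to_pixel_count(h: int, w: int, target_pixels: int, multiple: int = 64) -> tuple[int, int]:
    """Pick a new (H, W) with roughly `target_pixels` total pixels, preserving the
    aspect ratio and snapping both sides to a multiple (roughly what standard ARB does)."""
    scale = (target_pixels / (h * w)) ** 0.5
    new_h = max(multiple, round(h * scale / multiple) * multiple)
    new_w = max(multiple, round(w * scale / multiple) * multiple)
    return new_h, new_w
```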

Besides the resolution bucketing work, we also fixed the issue in #10940, where a missing data transfer left tensors on the wrong device.
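For context, the class of bug involved looks roughly like the sketch below (a hypothetical helper, not the actual patch): cached latents and batches have to be moved onto the model's device before the training step, otherwise the forward pass sees tensors on the wrong device.

```python
import torch

def move_batch_to_model_device(batch: dict[str, torch.Tensor], model: torch.nn.Module) -> dict[str, torch.Tensor]:
    # Move every tensor in the batch onto the same device as the model weights
    # before running the training step; skipping this is the kind of missing
    # "data moving" that produces bad-tensor-device errors.
    device = next(model.parameters()).device
    return {name: t.to(device, non_blocking=True) for name, t in batch.items()}
```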

As for the Trainer refactoring, each task (such as adapter creation and the training step in different modes) is now split into separate functions to improve maintainability.

We also use a custom TrainGuider with a modified load-model helper to allow custom control over the loading behavior.

TL;DR:

  • New features
    • Resolution bucketing
    • Better resize node
  • Bug fixes
    • #10940
  • Others
    • Refactoring for maintainability

KohakuBlueleaf · Dec 05 '25 09:12

I don't know where to ask this, but does your LoRA trainer support every model that ComfyUI supports? Also, can you please include optimizers from this project https://pytorch-optimizers.readthedocs.io/en/latest/? (Some of the optimizers there greatly reduce VRAM consumption.)

MeiYi-dev · Dec 05 '25 10:12

  1. I can't personally guarantee that every model ComfyUI supports works in the LoRA trainer, but based on earlier testing the answer should be "yes" if you are training on images. (Some video models support image generation, like Wan 2.1 or 2.2, and those two are also supported in the LoRA trainer.)
  • If you run into any problem with LoRA training on some model, you can open an issue directly and ping me.
  2. I can provide a template for custom optimizer support via an extension, or you can open a PR for the optimizers you want to use. But I should point out that in LoRA training the optimizer and the LoRA occupy way, way less VRAM than the model itself, so changing the optimizer usually helps very little (see the back-of-the-envelope comparison below).
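For a sense of scale, here is a back-of-the-envelope comparison with made-up but representative numbers (not measurements from this PR):

```python
# Assume a 12B-parameter base model held in bf16, and a LoRA of ~50M parameters
# trained with AdamW (fp32 weights + grads + two fp32 moment buffers ~= 16 bytes/param).
base_params = 12e9
lora_params = 50e6

model_weights_gb = base_params * 2 / 1e9    # bf16 = 2 bytes per parameter
lora_training_gb = lora_params * 16 / 1e9   # LoRA weights + grads + optimizer states

print(f"base model weights      : ~{model_weights_gb:.0f} GB")   # ~24 GB
print(f"LoRA + optimizer states : ~{lora_training_gb:.1f} GB")   # ~0.8 GB
```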

KohakuBlueleaf · Dec 05 '25 12:12

Thank you for the answers. I have some more questions/requests for LoRA training in ComfyUI.

For 1: The video models do support image training as well, though that's of limited use since video models are mostly used for animation. If video training gets implemented, I would even consider dropping musubi.

For 2: Some optimizers are "schedule-free" and some don't require any hyperparameters, so it would be really nice to have access to extra optimizers.

As for the new requests, there are some training optimizations that GREATLY improve LoRA training speed, both in iteration speed and convergence time, such as:

Relatively simple to implement: https://github.com/compvis/tread | https://github.com/vita-epfl/LayerSync | https://github.com/vvvvvjdy/SRA

Slightly harder to implement: https://github.com/Martinser/REG

There are some more mentioned here https://x.com/SwayStar123/status/1994673352754270318

Even just TREAD, which seems to be the easiest to implement and has been tested for LoRA training, would GREATLY improve convergence and iteration speed. It would be awesome if some of these get implemented.
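For readers unfamiliar with TREAD, here is a rough sketch of the token-routing idea as I understand it from the linked repo (a hypothetical module with made-up names, not the reference implementation): during training, a random subset of tokens is routed around a span of transformer blocks, so those blocks only process the kept tokens, and the bypassed tokens are merged back afterwards.

```python
import torch
import torch.nn as nn

class RoutedBlocks(nn.Module):
    """Rough sketch of TREAD-style token routing: only a random subset of tokens
    passes through the wrapped blocks during training; the rest bypass them."""

    def __init__(self, blocks: nn.ModuleList, keep_ratio: float = 0.5):
        super().__init__()
        self.blocks = blocks
        self.keep_ratio = keep_ratio

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: [B, N, D] tokens
        if not self.training or self.keep_ratio >= 1.0:
            for block in self.blocks:
                x = block(x)
            return x

        b, n, d = x.shape
        n_keep = max(1, int(n * self.keep_ratio))
        # Pick a random subset of token positions per sample.
        idx = torch.rand(b, n, device=x.device).argsort(dim=1)[:, :n_keep]
        kept = torch.gather(x, 1, idx.unsqueeze(-1).expand(-1, -1, d))

        # Only the kept tokens go through the wrapped blocks.
        for block in self.blocks:
            kept = block(kept)

        # Scatter the processed tokens back; bypassed tokens keep their input values.
        out = x.clone()
        out.scatter_(1, idx.unsqueeze(-1).expand(-1, -1, d), kept)
        return out
```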

MeiYi-dev · Dec 05 '25 13:12

Please open an issue/discussion or a PR for your requests, and please don't keep posting things unrelated to this PR/thread. Thanks.

KohakuBlueleaf · Dec 05 '25 13:12

Hi, I tested this PR, here's my feedback:

  • The mentioned bug is fixed, I can now start the training 👍
  • I tried the new resolution bucketing feature, it works well 👍
  • Found a bug; I can't tell whether it was introduced in this PR or earlier, so here it is: disabling gradient_checkpointing causes the training steps to complete very fast without the LoRA actually being trained, and there are no errors either. The steps are iterated, but there's no GPU utilization. I can file a separate issue if this is unrelated to the PR (please verify). A quick sanity check for this symptom is sketched below.
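One way to run that check (illustrative only; `lora_params` and `train_step` are placeholder names, not part of the trainer): snapshot the trainable parameters, run one step, and confirm that gradients flowed and the weights moved.

```python
import torch

def check_step_trains(lora_params: list[torch.nn.Parameter], train_step) -> None:
    """Run one training step and report whether gradients flow and weights move."""
    before = [p.detach().clone() for p in lora_params]
    train_step()  # any callable doing forward/backward/optimizer.step() on these params
    has_grads = any(p.grad is not None and p.grad.abs().sum() > 0 for p in lora_params)
    changed = any(not torch.allclose(b, p.detach()) for b, p in zip(before, lora_params))
    print(f"gradients present: {has_grads}, weights changed: {changed}")
```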

bezo97 · Dec 08 '25 21:12

Will check this issue soon

KohakuBlueleaf · Dec 09 '25 01:12

@bezo97 I have pushed a fix for the gradient_checkpointing bug you mentioned; it should now be resolved.

KohakuBlueleaf · Dec 09 '25 14:12

Cheers! Training now runs correctly when gradient_checkpointing is off 👍

bezo97 · Dec 09 '25 23:12