Possible enhancements/optimizations: is quantization possible for kohya? AI Toolkit has it, with a serious speed boost for 1024 training, plus multi-resolution bucketing.
1. Is it possible to add quantization? With AI Toolkit I use 22.1 GB of VRAM for 1024 training, and 1,400 steps finish in about 45 minutes; with kohya, 1024 training takes me nearly 2 hours for the same number of steps.
2. Multi-resolution training with bucketing: AI Toolkit buckets all my images into 512, 768, and 1024.

Overall I'm getting quite good results.
I also noticed that AI Toolkit has a config option, `content_or_style`, and I pick `content_or_style: balanced`. This seems to give me great flexibility when prompting.
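For reference, the option sits in the AI Toolkit job config roughly like the sketch below; the placement under `train:` is my assumption, and all other keys are omitted:

```yaml
# Sketch of the relevant part of an AI Toolkit job config.
# Placement under `train:` is an assumption; other keys omitted.
train:
  content_or_style: balanced  # alternatives: content, style
```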
However, a LoRA trained with kohya seems to stick to the training data; I can't prompt for an anime or comic style all that well.
> 1. Is it possible to add quantization? With AI Toolkit I use 22.1 GB of VRAM for 1024 training, and 1,400 steps finish in about 45 minutes; with kohya, 1024 training takes me nearly 2 hours for the same number of steps.
From my understanding, the speed difference comes from multi-resolution training: training at resolutions randomly chosen from 512, 768, and 1024 is faster than training at 1024 alone.
> 2. Multi-resolution training with bucketing: AI Toolkit buckets all my images into 512, 768, and 1024.
It's possible. See https://github.com/kohya-ss/sd-scripts/tree/sd3?tab=readme-ov-file#flux1-multi-resolution-training
If you use three resolutions, the total number of steps (i.e., the amount of data) will triple, so please reduce the number of training steps/epochs accordingly.
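For illustration, a multi-resolution dataset config in the spirit of that README might look like the following sketch; the paths, batch sizes, and repeats are placeholders, not recommendations:

```toml
# Sketch: the same image directory registered at three resolutions.
# Each image is then seen three times per epoch, which is why the
# total number of steps should be reduced accordingly.

[general]
caption_extension = ".txt"
enable_bucket = true

[[datasets]]
resolution = 1024
batch_size = 1

  [[datasets.subsets]]
  image_dir = "path/to/images"
  num_repeats = 1

[[datasets]]
resolution = 768
batch_size = 2

  [[datasets.subsets]]
  image_dir = "path/to/images"
  num_repeats = 1

[[datasets]]
resolution = 512
batch_size = 4

  [[datasets.subsets]]
  image_dir = "path/to/images"
  num_repeats = 1
```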
> I also noticed that AI Toolkit has a config option, `content_or_style`, and I pick `content_or_style: balanced`.
This option seems to change the distribution of timesteps (whether to sample more timesteps closer to the noise end or more closer to the final image). `balanced` means no adjustment, i.e. the same behavior as sd-scripts: https://github.com/ostris/ai-toolkit/blob/e127c079da2fc6f8cafc03816e030d705a1ae43d/jobs/process/BaseSDTrainProcess.py#L956
Adjusting the distribution is not difficult to implement, but I would like to prioritize other features.
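To make the idea concrete, here is a rough, hypothetical sketch of what biasing the timestep distribution could look like; the function and the exact weightings are illustrative assumptions, not AI Toolkit's or sd-scripts' actual code:

```python
import torch

def sample_timesteps(batch_size: int, num_train_timesteps: int = 1000,
                     mode: str = "balanced") -> torch.Tensor:
    """Illustrative sketch only; the weightings below are assumptions.

    'balanced' is plain uniform sampling (no adjustment); the other two
    modes push samples toward the noisy or the clean end of the schedule.
    """
    u = torch.rand(batch_size)  # uniform in [0, 1)
    if mode == "style":
        u = u.sqrt()   # density 2u: biased toward 1, i.e. closer to pure noise
    elif mode == "content":
        u = u ** 2     # biased toward 0, i.e. closer to the final image
    # 'balanced': leave u uniform
    return (u * num_train_timesteps).long()
```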
A new option, `shift`, has been added to `--timestep_sampling`. This changes the distribution of timesteps, so it may have an effect similar to AI Toolkit's option. Please see the README on the sd3 branch.
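As an illustration, enabling it might look something like this; the script name and the `--discrete_flow_shift` flag and value are assumptions on my part, so check the sd3 branch README for the exact invocation:

```bash
# Illustrative invocation; the script name and --discrete_flow_shift value
# are assumptions based on the sd3 branch README.
accelerate launch flux_train_network.py \
  --timestep_sampling shift \
  --discrete_flow_shift 3.1582
  # ...plus your usual model, dataset, and LoRA arguments
```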