sd-scripts
[Request] Dynamic regularization images
Is it possible to have the script generate the regularization images from some prompts and then use those to regularize the model being trained?
To generate those images, the system could ignore the model being trained and generate them without using it. I don't know if this is how it works, but maybe that way the script could even reuse the same seed that was used for generation for the regularization itself. With this you could have infinite regularization images, generating a new one whenever "needed" (one every X steps, maybe?).
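A rough sketch of how that could look in a training loop, assuming a diffusers StableDiffusionPipeline is loaded as a frozen copy of the base model (the model id, prompt, and interval below are placeholders, not anything sd-scripts provides today):

```python
import torch
from diffusers import StableDiffusionPipeline

# Hypothetical sketch: a frozen copy of the base model, used only for
# regularization images. Model id, prompt, and interval are assumptions.
reg_pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

REG_PROMPT = "photo of a person"   # class prompt (assumed)
REG_EVERY_N_STEPS = 100            # generation interval (assumed)

def maybe_make_reg_image(global_step):
    """Generate a fresh regularization image with a step-derived seed."""
    if global_step % REG_EVERY_N_STEPS != 0:
        return None
    generator = torch.Generator("cuda").manual_seed(global_step)
    image = reg_pipe(
        REG_PROMPT, num_inference_steps=30, generator=generator
    ).images[0]
    return image  # would then be captioned/encoded and fed to the trainer
```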
Thanks in advance!
This would be VERY useful: an aspect-bucket debug like the Automatic1111 DreamBooth extension's (which crops your images before training), and reg image generation like the DreamBooth extension's, which reads the tag of every dataset image and generates an image with the same aspect ratio.
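Something along these lines might work as a sketch, assuming a loaded diffusers StableDiffusionPipeline and the usual `.txt` caption sidecar files; the bucket math below is a simplification of what aspect bucketing actually does:

```python
from pathlib import Path
from PIL import Image

def bucket_resolution(w, h, target_area=512 * 512, step=64):
    # Pick a generation size with roughly the dataset image's aspect ratio,
    # rounded to multiples of 64 and close to a 512x512 pixel budget.
    ratio = w / h
    bh = int(round((target_area / ratio) ** 0.5 / step)) * step
    bw = int(round(bh * ratio / step)) * step
    return bw, bh

def make_reg_image(pipe, image_path: Path, out_dir: Path):
    img = Image.open(image_path)
    bw, bh = bucket_resolution(*img.size)
    # Reuse the dataset image's caption file as the reg prompt (assumed layout).
    caption = image_path.with_suffix(".txt").read_text().strip()
    reg = pipe(caption, width=bw, height=bh, num_inference_steps=30).images[0]
    reg.save(out_dir / image_path.name)
```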
'Infinite regularization images' is quite interesting, but generating images takes much longer than training, so I think it would be difficult.
Automatic generation of the regularization images before training would be possible, but I prefer generating images with quality tags and negative prompts, so a preprocessing script to create the regularization images could be an idea.
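As a sketch of such a preprocessing step (the tag strings, prompt, and function name are just assumptions for illustration):

```python
# Hypothetical preprocessing sketch: generate reg images up front,
# with quality tags appended and a negative prompt supplied.
QUALITY_TAGS = "masterpiece, best quality"
NEGATIVE = "lowres, bad anatomy, worst quality"

def preprocess_reg_images(pipe, class_prompt, n_images, out_dir):
    for i in range(n_images):
        image = pipe(
            f"{class_prompt}, {QUALITY_TAGS}",
            negative_prompt=NEGATIVE,
            num_inference_steps=30,
        ).images[0]
        image.save(f"{out_dir}/reg_{i:04d}.png")
```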
@kohya-ss Disclaimer: I have barely started to study all of this, so I apologize if I say something that doesn't make sense.
From what I have read so far, if we generate the reg. images separately/before training (on some SD model), we end up discarding the initial latent noise that was used to generate them, as well as the noise that was removed at each timestep during the reg. image creation. But would it be possible to keep and use that information, and not even leave latent space? Maybe it is possible to start creating the reg. image from its random latent noise, and also clone that noise for the other custom model that is being regularized (e.g. the LoRA model); then at each timestep during the reg. image creation, the base SD model predicts some noise reduction, but the LoRA model being trained can also try to predict that noise reduction. Isn't there some difference that can be calculated between both noise predictions at that timestep, and used to adapt the LoRA model? [Edit: maybe the noise reduction calculated by the LoRA model could also be discarded at each timestep, so that for the next timestep the LoRA model also assumes the "last latent image" to be whatever the original model calculated.]
Doing this for each step, you eventually reach t0 (the final latent image), and I think you don't even need to go back to pixel space, since the LoRA model would only care about targeting the behavior of the base SD model in latent space anyway. [Edit: in other words, the LoRA model would be trying to compensate for its own distortions and trying to mimic an identity/neutral LoRA model.]
I thought this could be efficient because both models would start from the same initial conditions and would also be on equal footing at every timestep.
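A minimal sketch of that idea, assuming two diffusers `UNet2DConditionModel` instances (a frozen base and a LoRA-augmented copy), a diffusers scheduler, and an already-encoded regularization prompt `text_emb`; this only illustrates the proposal, it is not something sd-scripts implements:

```python
import torch
import torch.nn.functional as F

def latent_prior_loss(unet_base, unet_lora, scheduler, text_emb, device="cuda"):
    # Both models see the same random starting latent (shared seed).
    latents = torch.randn(1, 4, 64, 64, device=device)
    scheduler.set_timesteps(30)
    loss = 0.0
    for t in scheduler.timesteps:
        with torch.no_grad():
            noise_base = unet_base(latents, t, encoder_hidden_states=text_emb).sample
        noise_lora = unet_lora(latents, t, encoder_hidden_states=text_emb).sample
        # Penalize the LoRA model for drifting away from the base prediction.
        loss = loss + F.mse_loss(noise_lora, noise_base)
        # Advance the latent with the *base* model's prediction only, so the
        # trajectory stays the one the original model would have taken.
        latents = scheduler.step(noise_base, t, latents).prev_sample
    return loss / len(scheduler.timesteps)
```

In practice, accumulating the loss over the whole trajectory keeps every LoRA forward pass in memory, so one would probably call `backward()` per step or sample a single random timestep instead.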
Please consider this analysis: https://blog.aboutme.be/2023/08/10/findings-impact-regularization-captions-sdxl-subject-lora/
It looks like regularization only works with images that were generated individually for the captions of the input image set. If this is correct, generating the images during (or specifically for) the training process is not only a convenience but the only correct way.