diffusers
Dreambooth example on SD2-768 model is producing weird results
Describe the bug
Hi, this only happens on the 768 model; the base model and 1.5/1.4 are not affected as far as I can tell.
Here are the samples produced while training; it generates properly before training takes place.
(Keep in mind the weird prompt is on purpose: it uses unique tokens so I can see changes quickly and tell whether training is working.)
Any help would be appreciated.
Reproduction
Run the Dreambooth example on the 768 model.
Logs
No response
System Info
WSL on Windows 11, RTX 4090
I had the same issue when SD2 first came out. I believe it has something to do with the v_prediction of the 768 model. I tried modifying the sampler in the Dreambooth training script, but no luck here yet. I've seen people already fine-tuning the 512x512 base model.
Yeah, I'm also leaning towards v_prediction, but I tried manually passing that into the noise scheduler and the sample-generator scheduler, and even tried different schedulers, but no luck.
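For reference, a minimal sketch of what "passing it manually" can look like for the training-side noise scheduler; the beta values mirror the SD2 config, and this is an illustration rather than the exact change tried above:

from diffusers import DDPMScheduler

# Illustrative only: build the training noise scheduler with the v_prediction
# objective instead of the default epsilon objective (requires a diffusers
# build recent enough to accept prediction_type).
noise_scheduler = DDPMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    prediction_type="v_prediction",
)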
Related:
- https://github.com/TheLastBen/fast-stable-diffusion/issues/663
Is this helpful @TheLastBen https://dushyantmin.com/fine-tuning-stable-diffusion-v20-with-dreambooth
Originally posted by @archimedesinstitute in https://github.com/TheLastBen/fast-stable-diffusion/issues/663#issuecomment-1328333947
Maybe related: I tried plugging in the example as-is from the Dreambooth blog post.
The DDIM scheduler is created with "epsilon" prediction, but I believe it has to be "v_prediction". Specifically, the DDIMScheduler call is missing an argument:
import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline

# Set up the scheduler and pipeline
scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False, prediction_type="v_prediction")  # <- make sure we are doing v_prediction
pipe = StableDiffusionPipeline.from_pretrained(model_path, scheduler=scheduler, safety_checker=None, torch_dtype=torch.float16, revision="fp16").to("cuda")
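For completeness, a quick way to exercise the pipeline after that change; the prompt and filename below are placeholders, not from the original report:

# Illustrative usage: generate one sample with the v_prediction scheduler in place
image = pipe("a photo of sks dog", num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("sample.png")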
This may be part of a larger family of issues that is making diffusers break with anything that isn't Euler sampling, at least for the 768 model.
Edit: Confirmed, I manually passed in epsilon vs. v_prediction.
Here is the output image with the code shown in the blog post:
Here is after manually passing in "v_prediction":
Nah, I'm talking about after trying to train it. Getting it to just render works fine; after training it a bit, it only produces the above examples. I did try passing v_prediction manually using a few samplers.
Indeed.
Update: I accidentally trained a non-V checkpoint and had to force it to epsilon with a non-V yaml. It works now.
Update update: Actually, no, that's not right either. I trained on the V-768 model, not base v2. It only outputs images in epsilon mode, and they are very bad quality. I don't know what that means, but hopefully this helps.
The problem with the training script is that the loss calculation isn't adjusted for SD 2.0's v-prediction. If you look at this patch from a repo with working fine-tuning, there's this new code:
https://github.com/smirkingface/stable-diffusion/commit/38e28e978f58355b6c47a12936dce08a68ea90d8#diff-c412907a30683069bf476b05c2a723954768c6ca975b49b7dc753d8f4956d1e8R258-R267
I adapted it to my custom training script by adding the following function before the training loop:
def get_loss(noise_pred, noise, latents, timesteps):
    if noise_scheduler.config.prediction_type == "v_prediction":
        # For v-prediction the target is v = alpha_t * noise - sigma_t * x0
        timesteps = timesteps.view(-1, 1, 1, 1)
        alphas_cumprod = noise_scheduler.alphas_cumprod[timesteps]
        alpha_t = torch.sqrt(alphas_cumprod)
        sigma_t = torch.sqrt(1 - alphas_cumprod)
        target = alpha_t * noise - sigma_t * latents
    else:
        # Default epsilon objective: the target is the added noise itself
        target = noise
    return F.mse_loss(noise_pred.float(), target.float(), reduction="mean")
and replacing every call of F.mse_loss(...) with get_loss(noise_pred, noise, latents, timesteps).
I'm currently running Dreambooth training and it looks promising. Other training scripts (e.g. Textual Inversion) need to be adjusted as well.
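To make the "replace every call" step concrete, here is a hedged sketch of where the swap lands in a typical diffusers Dreambooth loop; variable names such as noisy_latents, encoder_hidden_states, and accelerator follow the stock train_dreambooth.py and are assumptions about your script:

# Inside the training loop, after the UNet forward pass
noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample

# Before: loss = F.mse_loss(noise_pred.float(), noise.float(), reduction="mean")
# After:  use the prediction-type-aware target instead
loss = get_loss(noise_pred, noise, latents, timesteps)

accelerator.backward(loss)
optimizer.step()
lr_scheduler.step()
optimizer.zero_grad()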
Is this wrong? It feels wrong. (train_dreambooth.py)
train_dreambooth.py:688: UserWarning: Using a target size (torch.Size([2, 4, 96, 96])) that is different to the input size (torch.Size([1, 4, 96, 96])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size. return F.mse_loss(noise_pred.float(), target.float(), reduction="mean")
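For what it's worth, that shape mismatch looks like the prior-preservation path: the stock script chunks noise_pred into an instance half and a prior half before the loss, so a drop-in get_loss also needs its other inputs chunked. A hedged sketch, assuming the --with_prior_preservation branch:

# Hypothetical adaptation for --with_prior_preservation: chunk everything that
# feeds get_loss, not just noise_pred, so the batch dimensions line up.
noise_pred, noise_pred_prior = torch.chunk(noise_pred, 2, dim=0)
noise, noise_prior = torch.chunk(noise, 2, dim=0)
latents, latents_prior = torch.chunk(latents, 2, dim=0)
timesteps, timesteps_prior = torch.chunk(timesteps, 2, dim=0)

loss = get_loss(noise_pred, noise, latents, timesteps)
prior_loss = get_loss(noise_pred_prior, noise_prior, latents_prior, timesteps_prior)
loss = loss + args.prior_loss_weight * prior_loss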
It's wrong; diffusers just got updated with support for the 768 model. Trying it now myself.
Nice!
https://github.com/ShivamShrirao/diffusers/commit/6c56f05097f7d3c561f02dc1c27e3dd7e9f88ce1#diff-ca2edb3e3367d27c6d26e14da0c9121c7247f0efd166eae35e2a6166d7ee2147
Keep in mind you need to build the diffusers wheel yourself, as they updated the scheduler itself to support it.
Indeed. Now I'm getting issues with --revision="fp16". https://github.com/huggingface/diffusers/issues/1246
Spinning up an A100 and building an xformers wheel so I can switch away from fp16, because that fix didn't do it either.
Edit: using TheLastBen's A100 xformers to save time. Will update if it works.
Update: it's training now, but I won't know if it's training well until later. I didn't port the intermediate sample saving from Shivam's train_dreambooth.py.
Update update: It trained... OK-ish. It definitely works, but I need to play with the config. I haven't loaded it into Automatic1111 yet, but I think it'll work.
Seems like it's working fine: https://user-images.githubusercontent.com/87043616/204424803-4ed3b1b9-c5db-4991-aa93-42b12a9fec0f.png
What is your LR?
5e-6
@patil-suraj is this solved in our scripts?
Support for training SD2-768 was only added recently in #1455; all SD training scripts (textual inversion, dreambooth, text_to_image) now work on main. Could you please try again using main? Thanks!
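For context, the updated scripts pick the training target from the scheduler config, roughly along these lines (a simplified sketch of the logic on main, with model_pred standing in for the UNet output, not a verbatim excerpt):

# Choose the loss target based on the scheduler's prediction type
if noise_scheduler.config.prediction_type == "epsilon":
    target = noise
elif noise_scheduler.config.prediction_type == "v_prediction":
    # get_velocity computes v = alpha_t * noise - sigma_t * latents
    target = noise_scheduler.get_velocity(latents, noise, timesteps)
else:
    raise ValueError(f"Unknown prediction type {noise_scheduler.config.prediction_type}")

loss = F.mse_loss(model_pred.float(), target.float(), reduction="mean")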
I think this should be resolved as per @patil-suraj's comment. Closing it, feel free to reopen if needed :)