Dreambooth example on SD2-768 model is producing weird results

Open · devilismyfriend opened this issue 2 years ago · 16 comments

Describe the bug

Hi, this only happens on the 768 model; the SD2 base and 1.5/1.4 models are not affected as far as I can tell.

Here are the samples produced while training; it does generate properly before training takes place. (Keep in mind the weird prompt is on purpose: it's made of unique tokens so I can see changes quickly and tell whether it's working.) [image]

Any help would be appreciated

Reproduction

Run the Dreambooth training example on the 768 model.

Logs

No response

System Info

WSL on Windows 11, RTX 4090

devilismyfriend avatar Nov 25 '22 21:11 devilismyfriend

I had the same issue when SD2 first came out. I believe it has something to do with the v_prediction of the 768 model. I tried to modify the sampler in the dreambooth training script, but no luck here yet. I've seen people already fine-tuning the 512x512 base model.

thepowerfuldeez avatar Nov 26 '22 05:11 thepowerfuldeez

Yeah, I'm also leaning towards v_prediction. I tried manually passing it into the noise scheduler and the sample generator scheduler, and even tried different schedulers, but no luck.
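For reference, "passing it manually" would look something like this sketch (not the script's exact code; DDPMScheduler is the noise scheduler the dreambooth example constructs, and prediction_type is assumed to be supported by the installed diffusers version):

    # Sketch: construct the training noise scheduler with v_prediction explicitly.
    # By itself this doesn't fix training if the loss target is still plain
    # epsilon noise -- see the loss discussion later in the thread.
    from diffusers import DDPMScheduler

    noise_scheduler = DDPMScheduler(
        beta_start=0.00085,
        beta_end=0.012,
        beta_schedule="scaled_linear",
        num_train_timesteps=1000,
        prediction_type="v_prediction",
    )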

devilismyfriend avatar Nov 26 '22 05:11 devilismyfriend

Related:

  • https://github.com/TheLastBen/fast-stable-diffusion/issues/663

0xdevalias avatar Nov 27 '22 20:11 0xdevalias

Is this helpful @TheLastBen https://dushyantmin.com/fine-tuning-stable-diffusion-v20-with-dreambooth

Originally posted by @archimedesinstitute in https://github.com/TheLastBen/fast-stable-diffusion/issues/663#issuecomment-1328333947

0xdevalias avatar Nov 27 '22 20:11 0xdevalias

Maybe related: I tried plugging in the example as-is from the dreambooth blog post.

The DDIM scheduler is created with "epsilon" prediction. I believe it has to be "v_prediction". Specifically, this line is missing an argument:

# Setup the scheduler and pipeline (imports added for completeness)
import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline

scheduler = DDIMScheduler(beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear", clip_sample=False, set_alpha_to_one=False, prediction_type="v_prediction")  # <- make sure we are doing v_prediction
pipe = StableDiffusionPipeline.from_pretrained(model_path, scheduler=scheduler, safety_checker=None, torch_dtype=torch.float16, revision="fp16").to("cuda")

This may be part of a larger family of issues that makes diffusers break with anything that isn't Euler sampling, at least for the 768 model.

Edit: Confirmed; I compared manually passing in "epsilon" vs. "v_prediction".

Here is the output with the code as shown in the blog post: [image]

Here is the output after manually passing in "v_prediction": [image]
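A quick way to check which prediction type a checkpoint expects (on a current diffusers install, assuming a diffusers-format model at model_path) is to read the scheduler config shipped with it:

    # Sketch: inspect the scheduler config bundled with the checkpoint.
    from diffusers import DDIMScheduler

    scheduler = DDIMScheduler.from_pretrained(model_path, subfolder="scheduler")
    print(scheduler.config.prediction_type)  # "v_prediction" for the 768 model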

enzokro avatar Nov 28 '22 00:11 enzokro

Nah, I'm talking about after trying to train it. Getting it to just render works fine; after training it a bit, it only produces the above examples. I did try passing v_prediction manually, using a few samplers.

devilismyfriend avatar Nov 28 '22 03:11 devilismyfriend

Indeed: [image]

Update: I accidentally trained a non-V checkpoint and had to force it to epsilon with a non-V yaml. Works now.

[image]

Update update: Actually, no, that's not right either. I trained on a V-768 model, not base v2. It only outputs images in epsilon mode, and they are very bad quality. I don't know what that means, but hopefully this helps.

articulite avatar Nov 28 '22 08:11 articulite

The problem with the training script is that the loss calculation isn't adjusted for SD 2.0's v-prediction. If you look at this patch from a repo with working fine-tuning, there's this new code:

https://github.com/smirkingface/stable-diffusion/commit/38e28e978f58355b6c47a12936dce08a68ea90d8#diff-c412907a30683069bf476b05c2a723954768c6ca975b49b7dc753d8f4956d1e8R258-R267

I adapted it to my custom training script by adding the following function before the training loop:

    import torch
    import torch.nn.functional as F

    # noise_scheduler comes from the enclosing training script
    def get_loss(noise_pred, noise, latents, timesteps):
        if noise_scheduler.config.prediction_type == "v_prediction":
            # v-prediction target: v = sqrt(alpha_bar_t) * noise - sqrt(1 - alpha_bar_t) * x_0
            timesteps = timesteps.view(-1, 1, 1, 1)  # reshape for broadcasting over (B, C, H, W)
            alphas_cumprod = noise_scheduler.alphas_cumprod[timesteps]
            alpha_t = torch.sqrt(alphas_cumprod)
            sigma_t = torch.sqrt(1 - alphas_cumprod)
            target = alpha_t * noise - sigma_t * latents
        else:
            # epsilon prediction: the target is simply the added noise
            target = noise

        return F.mse_loss(noise_pred.float(), target.float(), reduction="mean")

and replacing every call of F.mse_loss(...) with get_loss(noise_pred, noise, latents, timesteps).
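Concretely, the call site inside the training loop changes roughly like this (a sketch; variable names follow the snippet above):

    # Before: always targets the raw noise, which is wrong for v_prediction
    # loss = F.mse_loss(noise_pred.float(), noise.float(), reduction="mean")

    # After: the target now depends on the scheduler's prediction_type
    loss = get_loss(noise_pred, noise, latents, timesteps)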

I'm currently running Dreambooth training and it looks promising. Other training scripts (e.g. Textual Inversion) need to be adjusted as well.

volpeon avatar Nov 28 '22 11:11 volpeon

Is this wrong? It feels wrong. (train_dreambooth.py)

[image]

train_dreambooth.py:688: UserWarning: Using a target size (torch.Size([2, 4, 96, 96])) that is different to the input size (torch.Size([1, 4, 96, 96])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.
    return F.mse_loss(noise_pred.float(), target.float(), reduction="mean")
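The size mismatch in that warning is what --with_prior_preservation would produce: the script chunks the model output into an instance half and a prior half, so a full-batch target gets compared against a half-batch prediction. A sketch of one way to line them up, assuming the v-prediction target from the comment above (args, noise_scheduler, noise, latents, timesteps, and noise_pred come from the surrounding script):

    # Compute the target on the full batch first...
    if noise_scheduler.config.prediction_type == "v_prediction":
        t = timesteps.view(-1, 1, 1, 1)
        alphas_cumprod = noise_scheduler.alphas_cumprod[t]
        target = torch.sqrt(alphas_cumprod) * noise - torch.sqrt(1 - alphas_cumprod) * latents
    else:
        target = noise

    # ...then chunk prediction AND target the same way before taking the two losses.
    noise_pred, noise_pred_prior = torch.chunk(noise_pred, 2, dim=0)
    target, target_prior = torch.chunk(target, 2, dim=0)

    loss = F.mse_loss(noise_pred.float(), target.float(), reduction="mean")
    prior_loss = F.mse_loss(noise_pred_prior.float(), target_prior.float(), reduction="mean")
    loss = loss + args.prior_loss_weight * prior_loss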

articulite avatar Nov 28 '22 21:11 articulite

It's wrong; diffusers just got updated with support for the 768 model. Trying it now myself.

devilismyfriend avatar Nov 28 '22 21:11 devilismyfriend

Nice!

https://github.com/ShivamShrirao/diffusers/commit/6c56f05097f7d3c561f02dc1c27e3dd7e9f88ce1#diff-ca2edb3e3367d27c6d26e14da0c9121c7247f0efd166eae35e2a6166d7ee2147

articulite avatar Nov 28 '22 21:11 articulite

Keep in mind you need to build the diffusers wheel yourself, as they updated the scheduler itself to support it.

devilismyfriend avatar Nov 28 '22 22:11 devilismyfriend

Indeed. Now I'm getting issues with --revision="fp16": https://github.com/huggingface/diffusers/issues/1246

Spinning up an A100 and building an xformers wheel so I can switch away from fp16, because that fix didn't do it either.

Edit: Using TheLastBen's A100 xformers to save time. Will update if it works.

Update: It's training now, but I won't know if it's training well until later. I didn't port the intermediate sample saving from Shivam's train_dreambooth.py.

Update update: It trained... OK-ish. It definitely works, but I need to play with the config. I haven't loaded it into Automatic1111 yet, but I think it'll work.

articulite avatar Nov 28 '22 23:11 articulite

Seems like it's working fine: [image]

devilismyfriend avatar Nov 29 '22 02:11 devilismyfriend

What is your LR?

articulite avatar Nov 29 '22 02:11 articulite

5e-6

devilismyfriend avatar Nov 29 '22 05:11 devilismyfriend

@patil-suraj is this solved in our scripts?

patrickvonplaten avatar Dec 01 '22 16:12 patrickvonplaten

Support for training SD2-768 was only added recently in #1455; all SD training scripts (textual inversion, dreambooth, text_to_image) now work on main. Could you please try again using main? Thanks!

patil-suraj avatar Dec 01 '22 16:12 patil-suraj

I think this should be resolved as per @patil-suraj's comment. Closing it; feel free to reopen if needed :)

pcuenca avatar Dec 13 '22 15:12 pcuenca