
Compel influencing lora_scale when using LoRA in Diffusers

pietrobolcato opened this issue on Jul 13, 2023 · 8 comments

Describe the bug

When using compel-generated prompt embeddings and performing inference with LoRA weights loaded, lora_scale doesn't work as expected. Specifically, if I perform the following actions in order:

  1. Create a SD pipeline
  2. Load a model
  3. Load a LoRA
  4. Generate an image with lora_scale = 1
  5. Generate an image with lora_scale = 0
  6. Generate an image with lora_scale = 1
  7. Generate an image with lora_scale = 1

The image generated in step 6 is different from the image generated in step 4. All subsequent images remain consistent as long as lora_scale is not changed again; effectively, it takes one extra generation for the new scale to take effect. See the attached plot:

[attached plot: images from steps 4-7 with compel; image 3 differs from image 1]

We can see that image 3 differs from image 1, and that from image 4 onward the output remains consistent as long as lora_scale doesn't change.

This doesn't happen when not using compel and prompt embeddings:

[attached plot: the same sequence without compel; every image matches its lora_scale]

Reproduction

I prepared a colab that shows the issue, accessible here: https://colab.research.google.com/drive/1ciFZPcvMsNZiZOpfHtLih5V6OyRh8Z6d?usp=sharing

System Info

diffusers[torch]==0.18.1, transformers==4.30.2, compel==1.2.1

pietrobolcato avatar Jul 13 '23 13:07 pietrobolcato

strange.

what i'm imagining is that the prompt= kwarg to the pipeline triggers some kind of cleanup/init that you don't get when passing prompt_embeds.

what happens if you take compel out of the equation but still use prompt_embeds? i.e. push the prompt through pipe.tokenizer, push the output of that through pipe.text_encoder, and pass the result as prompt_embeds?
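The suggestion above can be sketched as a small helper (the function name is made up; it assumes `pipe` is an already-loaded StableDiffusionPipeline, and it omits `torch.no_grad()` and device placement for brevity):

```python
def encode_prompt_manually(pipe, prompt):
    """Build prompt_embeds without compel, per the suggestion above.

    Only `pipe.tokenizer` and `pipe.text_encoder` are used. In real
    code, wrap the encoder call in `torch.no_grad()` and move
    `input_ids` to `pipe.device` first.
    """
    tokens = pipe.tokenizer(
        prompt,
        padding="max_length",
        max_length=pipe.tokenizer.model_max_length,
        truncation=True,
        return_tensors="pt",
    )
    # The CLIP text encoder returns a tuple-like output; index [0] is
    # last_hidden_state, which is the shape prompt_embeds expects.
    return pipe.text_encoder(tokens.input_ids)[0]
```

If images generated with `pipe(prompt_embeds=encode_prompt_manually(pipe, ...))` show the same one-generation lag, the problem lies in how the pipeline applies lora_scale to precomputed embeds rather than in compel itself.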

damian0815 avatar Jul 20 '23 19:07 damian0815

The lora scale value is provided at image generation time, which isn't going to work for custom prompt embeds. Your image 2 with Compel is also wrong (it still has the text encoder weights scaled to 1.0 from the previous generation).

Adding this line before using Compel fixes the issue:

pipeline._lora_scale = lora_scale
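Why this one line helps can be illustrated with a self-contained toy model of the mechanism described above (no diffusers needed; `FakePipeline` is a stand-in for the real class, not its implementation):

```python
class FakePipeline:
    """Toy stand-in for a pipeline with a LoRA-scaled text encoder.

    The real pipeline stores the scale passed via
    cross_attention_kwargs={"scale": ...} on `_lora_scale`, but only
    applies it when *it* encodes the prompt. When prompt embeds are
    precomputed (e.g. by compel), the text encoder sees whatever
    `_lora_scale` was left over from the previous call.
    """

    def __init__(self):
        self._lora_scale = 1.0  # stale state carried between calls

    def encode(self, prompt):
        # compel would call the text encoder here; it sees the current
        # (possibly stale) scale, not the one of the upcoming generation
        return (prompt, self._lora_scale)

    def generate(self, embeds, lora_scale):
        # the pipeline records the new scale only at generation time,
        # after the embeds have already been computed
        self._lora_scale = lora_scale
        return embeds  # the "image" is determined by the embeds


requested = [1.0, 0.0, 1.0, 1.0]  # steps 4-7 from the bug report

# Buggy order: encode first, then generate -> scale lags by one call
pipe = FakePipeline()
lagged = [pipe.generate(pipe.encode("p"), s) for s in requested]
assert [e[1] for e in lagged] == [1.0, 1.0, 0.0, 1.0]

# Fixed order: sync _lora_scale before encoding (the one-liner above)
pipe = FakePipeline()
fixed = []
for s in requested:
    pipe._lora_scale = s  # pipeline._lora_scale = lora_scale
    fixed.append(pipe.generate(pipe.encode("p"), s))
assert [e[1] for e in fixed] == requested
```

The lagged sequence matches the report: image 2 is generated with the stale scale 1.0, and image 3 with the leftover 0.0, which is why it differs from image 1.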

pdoane avatar Jul 30 '23 01:07 pdoane

@pietrobolcato is this still an issue?

damian0815 avatar Aug 20 '23 15:08 damian0815

"which isn't going to work for custom prompt embeds"

If multiple LoRAs are loaded, like:

self.pipe.load_lora_weights(adapter_id_pixel, adapter_name="pixel")
self.pipe.load_lora_weights(adapter_id_chalkboardbrawing, adapter_name="chalkboardbrawing")
self.pipe.set_adapters(["pixel", "chalkboardbrawing"], adapter_weights=[1.0, 1.0])

and then the image is generated with:

sdout_image = self.pipe(
    prompt_embeds=prompt_embeds,
    pooled_prompt_embeds=pooled_prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    negative_pooled_prompt_embeds=negative_pooled_prompt_embeds,
    num_inference_steps=num_inference_steps,
    num_images_per_prompt=num_images_per_prompt,
    generator=generator,
    height=height,
    width=width,
    guidance_scale=guidance_scale,
    controlnet_conditioning_scale=controlnet_conditioning_scale,
    # controlnet_kwargs={"image": can_image},
    cross_attention_kwargs={"scale": lora_scale},
    control_guidance_start=control_guidance_start,
    control_guidance_end=control_guidance_end,
    clip_skip=2,
    image=can_image,
).images[0]

How should this be handled, given that there are multiple LoRAs?
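With the multi-adapter API shown above, one possible workaround is to apply the adapter weights before compel encodes the prompt, so the text-encoder part of each LoRA is already scaled when the embeds are built. A sketch (the helper name is hypothetical; `set_adapters` is the diffusers multi-adapter call from the snippet above, and `compel` is assumed to be a callable that returns prompt embeds):

```python
def generate_with_multiple_loras(pipe, compel, prompt,
                                 adapter_names, adapter_weights,
                                 **pipe_kwargs):
    """Apply multi-LoRA weights *before* encoding the prompt.

    Because compel runs the text encoder itself, the adapter weights
    must be active before the embeds are built, not supplied at
    pipeline-call time via cross_attention_kwargs.
    """
    # Rescales both the UNet and text-encoder adapters
    pipe.set_adapters(adapter_names, adapter_weights=adapter_weights)
    prompt_embeds = compel(prompt)
    return pipe(prompt_embeds=prompt_embeds, **pipe_kwargs)
```

Note that with this ordering the text-encoder scales are baked into prompt_embeds, so changing them afterwards via cross_attention_kwargs={"scale": ...} would only affect the UNet.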

aiXia121 avatar Apr 26 '24 03:04 aiXia121

@damian0815 @pdoane

And one more question: how does Compel affect LoRA trigger words in the prompt, and textual-inversion embeddings whose trigger words appear in the negative prompt? I also ran some comparative experiments between stable-diffusion-webui and diffusers inference; the diffusers results are very bad, while the webui results are normal and good.

Thanks, looking forward to a reply.

aiXia121 avatar Apr 26 '24 03:04 aiXia121


Yes, I've found the same issue, and I think it is not fully related to compel: even without compel, the quality is still degraded compared to sd-webui.

crapthings avatar Apr 28 '24 00:04 crapthings

Same problem. Yes, the diffusers results are not as good as the A1111 webui's.

aiXia121 avatar Apr 28 '24 09:04 aiXia121

Sure, diffusers is degraded compared to others, but every time I've tried it with a LoRA, compel seems heavily degraded compared to diffusers without compel.

markrmiller avatar Jun 02 '24 23:06 markrmiller