Add prompt weights to the txt2img script

tijszwinkels opened this issue 1 year ago · 10 comments

I've been looking for a good code example for adding prompt weights, but most implementations I found just split the prompt into subprompts and take a weighted average of the embeddings of those subprompts. This has two disadvantages:

  • Even a small weight tweak to one of the words changes the image completely.
  • Since the prompt is split, information about the context between words is lost (which is exactly what transformers are good at capturing).

This implementation takes a different approach: it calculates the difference between the embedding of the whole prompt with and without the weighted subprompt. It takes this difference to be the subprompt's contribution to the embedding vector, and uses it to subtly shift the original embedding according to the weight. Results seem much more stable (small tweaks don't disturb the entire result), and I believe this to be a better method.
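
Roughly, the idea in sketch form (an illustration of the description above, not the exact code from the patch; it assumes a CompVis-style model exposing `get_learned_conditioning` and the `(text:weight)` syntax):

```python
import re

def get_weighted_conditioning(model, prompt):
    """Hypothetical sketch of the delta-based weighting; not the actual patch."""
    # Find every "(text:weight)" span; everything else keeps weight 1.0.
    spans = re.findall(r"\(([^:()]+):([\d.]+)\)", prompt)
    # The whole prompt with the weight syntax stripped:
    # "a (red:1.6) dress" -> "a red dress"
    plain = re.sub(r"\(([^:()]+):[\d.]+\)", r"\1", prompt)
    c = model.get_learned_conditioning([plain])
    for text, weight in spans:
        # Embed the whole prompt with this one subprompt removed; the
        # difference is taken as the subprompt's contribution.
        without = re.sub(r"\s+", " ", plain.replace(text, " ")).strip()
        delta = c - model.get_learned_conditioning([without])
        # Nudge the full-prompt embedding along that contribution.
        c = c + (float(weight) - 1.0) * delta
    return c
```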

Tell me if I'm doing it wrong. :)

tijszwinkels avatar Jan 16 '23 21:01 tijszwinkels

Testing this, it seems to be working so far. Tested with `a woman wearing (red:1.6) dress and (blue:1.1) skirt, octane render` and `a woman wearing (red:0.2) dress and (blue:1.5) skirt, octane render`.

Results below. I was wondering if you might be able to make the weighting work with negative prompts, as provided in this PR: https://github.com/CompVis/stable-diffusion/pull/558

That would be the trifecta for my setup.

Red Image

Blue Image

Baaleos avatar Jan 17 '23 23:01 Baaleos

Although this does also work on img2img, if you set the values too high it can destroy the image. This was an octane render of a universe inside a crystal ball which was beautifully defined after multiple iterations of img2img, upscaling, etc., until I used weights 40-70% above baseline on phrases such as `(clockwork details:1.4)`, `(intricate detailed:1.7)`, and `(glowing crystal:1.5)`, which made it go mad and trashed the image.

Destroyed image

Baaleos avatar Jan 17 '23 23:01 Baaleos

These weights are to be used with subtlety. I'd say anywhere between 0.5 and 1.3 is fine, but if too many subprompts are pushed too far out of whack, the embedding vector ends up beyond what the model was trained on and corruption occurs.
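
If you want a guard rail, something like this (just an illustration, not part of the change) keeps parsed weights inside that band:

```python
# Illustrative guard (not in the patch): clamp parsed subprompt weights into
# a band that keeps the shifted embedding close to what the model was trained on.
SAFE_MIN, SAFE_MAX = 0.5, 1.3

def clamp_weight(weight: float) -> float:
    return max(SAFE_MIN, min(SAFE_MAX, weight))
```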

tijszwinkels avatar Jan 18 '23 15:01 tijszwinkels

Hey, thanks for trying this out!

See my message above about not stretching these values too much. Other than that: did you modify the img2img script? Because I only changed the txt2img script.

tijszwinkels avatar Jan 18 '23 15:01 tijszwinkels

Yes, I have copied your change to my txt2img, img2img, and txt2imghd scripts. It appears to work across all three.

Baaleos avatar Jan 18 '23 15:01 Baaleos

Trying to get it to work with negative prompts as well over here: https://github.com/tijszwinkels/stable-diffusion/tree/prompt-weight-negative

You can give it a whirl if you want. I might make a PR out of it later, but it doesn't really seem like PRs currently get merged anyway.
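
The rough shape of it is below (an untested sketch; `get_learned_conditioning_with_prompt_weights` is assumed to return a `[1, seq, dim]` tensor):

```python
def weighted_cond_pair(model, prompt, negative_prompt, num_samples=1):
    # Weight the negative prompt the same way as the positive one; the result
    # replaces the usual empty-string unconditional embedding in
    # classifier-free guidance.
    c = get_learned_conditioning_with_prompt_weights(prompt, model)
    uc = get_learned_conditioning_with_prompt_weights(negative_prompt, model)
    # Repeat along the batch dimension to match the sample count.
    return c.repeat(num_samples, 1, 1), uc.repeat(num_samples, 1, 1)
```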

tijszwinkels avatar Jan 18 '23 21:01 tijszwinkels

Can confirm the weighted prompts work well. I am trying to get it working with ControlNet now, but so far not having much luck. I'm not a Python whiz, it seems.

ControlNet uses a variation of the cond / uncond lines:

```python
cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([combinedPrompt] * num_samples)]}
un_cond = {"c_concat": None if guess_mode else [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
```

I've tried using get_learned_conditioning_with_prompt_weights, but it usually comes up with something like 'int is not subscriptable'.

Baaleos avatar Feb 23 '23 13:02 Baaleos

Perhaps have a look here: https://github.com/hlky/nataili/blob/main/nataili/stable_diffusion/compvis.py#L881

The nataili lib uses prompt weights borrowed from this implementation (as a matter of fact, I wrote this implementation because I wanted prompt weights for Stable Horde, which uses this lib), and it now supports ControlNet as well.

tijszwinkels avatar Feb 23 '23 15:02 tijszwinkels

BTW, using get_learned_conditioning_with_prompt_weights on an array of prompts goes something like this:

```python
c = torch.cat(
    [
        prompt_weights.get_learned_conditioning_with_prompt_weights(prompt, self.model)
        for prompt in prompts
    ]
)
```

(with 'prompts' being the array).
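
Combining that with the ControlNet dict you posted, I'd expect something along these lines (untested sketch, same names as above):

```python
cond = {
    "c_concat": [control],
    "c_crossattn": [torch.cat(
        [get_learned_conditioning_with_prompt_weights(combinedPrompt, model)] * num_samples
    )],
}
un_cond = {
    "c_concat": None if guess_mode else [control],
    "c_crossattn": [torch.cat(
        [get_learned_conditioning_with_prompt_weights(n_prompt, model)] * num_samples
    )],
}
```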

tijszwinkels avatar Feb 23 '23 15:02 tijszwinkels

Sorry to bug you again; I'm still having trouble getting weighted prompts into my ControlNet implementation. I tried copying over your classes, but ultimately it looked like there were some lower-level changes in the encoder to accept clip_skip etc. It kept bombing out with `get_learned_conditioning` expects 2 arguments but was called with 3. I wasn't able to find the custom encode method to update it to accept the extra argument, and I was worried about the impact it would have on the other scripts.

I tried copying the basic version from the img2img example, hoping it would at least get me part of the way, but I am still getting weird errors. No doubt due to my lack of experience in Python.

Right now this is erroring out in get_learned_conditioning_with_prompt_weights with the error below. Do I need to convert combinedPrompt to some special data type? That's the part I didn't have to worry about in img2img, since it seemed to be in the correct format already, but this script is a stripped-down version of ControlNet's depth2img.py script, where it just accepts the prompt from command-line args.

```python
combinedPrompt = prompt + ', ' + a_prompt
print(combinedPrompt)
if isinstance(combinedPrompt, tuple):
    combinedPrompt = list(combinedPrompt)

c = torch.cat(
    [
        get_learned_conditioning_with_prompt_weights(thePrompt, model)
        for thePrompt in combinedPrompt
    ]
)

cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([combinedPrompt] * num_samples)]}
un_cond = {"c_concat": None if guess_mode else [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
```

```
Traceback (most recent call last):
  File "depth2image.py", line 247, in <module>
    main()
  File "depth2image.py", line 237, in main
    detectedmap, result = process(opt.inputimage, opt.prompt, opt.added_prompt, opt.negative_prompt, 1, opt.resolution, opt.depthres, opt.steps, False, opt.controlstr, 9, opt.seed, 0.0, True, opt.model)
  File "depth2image.py", line 120, in process
    get_learned_conditioning_with_prompt_weights(combinedPrompt, model)
  File "depth2image.py", line 45, in get_learned_conditioning_with_prompt_weights
    filtered_whole_prompt_c = model.get_learned_conditioning(filtered_whole_prompt)
  File "/home/baaleos/AIArtSetup/ControlNet/ldm/models/diffusion/ddpm.py", line 667, in get_learned_conditioning
    c = self.cond_stage_model.encode(c)
  File "/home/baaleos/AIArtSetup/ControlNet/ldm/modules/encoders/modules.py", line 131, in encode
    return self(text)
  File "/home/baaleos/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/baaleos/AIArtSetup/ControlNet/cldm/hack.py", line 57, in _hacked_clip_forward
    raw_tokens_123 = split(raw_tokens)
  File "/home/baaleos/AIArtSetup/ControlNet/cldm/hack.py", line 48, in split
    return x[75 * 0: 75 * 1], x[75 * 1: 75 * 2], x[75 * 2: 75 * 3]
TypeError: 'int' object is not subscriptable
```

Baaleos avatar Feb 24 '23 15:02 Baaleos