stable-diffusion
Add prompt weights to the txt2img script
I've been looking for a good code example to add prompt weights, but most implementations I found just split the prompt into subprompts and take a weighted average of the embeddings of those subprompts. This has two disadvantages:
- Even a small weight tweak to one of the words changes the image completely.
- Since the prompt is split, information about the context between words is lost (which is exactly what transformers are good at capturing).
This implementation takes a different approach: it computes the embedding of the whole prompt both with and without each weighted subprompt, treats the difference as that subprompt's contribution to the embedding vector, and uses the weight to scale that contribution, subtly shifting the original embedding. Results seem much more stable (small tweaks don't disturb the entire result), and I believe this to be a better method. A rough sketch follows below.
Tell me if I'm doing it wrong. :)
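For reference, here's the core of the idea as a simplified sketch, not the exact code in the patch; the `(word:weight)` regex and the convention that a weight of 1.0 leaves the embedding unchanged are shorthand on my part:
```
import re

WEIGHT_RE = re.compile(r"\(([^():]+):([\d.]+)\)")

def get_learned_conditioning_with_prompt_weights(prompt, model):
    # Subprompts written as "(red:1.6)", paired with their weights.
    weighted_subprompts = WEIGHT_RE.findall(prompt)
    # The whole prompt with the weight syntax stripped: "(red:1.6)" -> "red".
    whole_prompt = WEIGHT_RE.sub(r"\1", prompt)
    c = model.get_learned_conditioning([whole_prompt])
    for subprompt, weight in weighted_subprompts:
        # Embed the whole prompt with this subprompt filtered out, so the
        # difference captures the subprompt's contribution in context.
        filtered_whole_prompt = whole_prompt.replace(subprompt, "").strip()
        filtered_c = model.get_learned_conditioning([filtered_whole_prompt])
        # Scale that contribution by the weight; 1.0 changes nothing.
        c = c + (float(weight) - 1.0) * (c - filtered_c)
    return c
```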
Testing this, it seems to be working so far. Tested with `a woman wearing (red:1.6) dress and (blue:1.1) skirt, octane render` and `a woman wearing (red:0.2) dress and (blue:1.5) skirt, octane render`. Results below.

I was wondering if you might be able to make the weighting work with negative prompts, as provided in this PR? https://github.com/CompVis/stable-diffusion/pull/558
That would be the trifecta for my setup.
Although this does also work on img2img, if you set the values too high it can destroy the image. This was an octane render of a universe inside a crystal ball, beautifully defined after multiple iterations of img2img and upscaling, until using weights between 40-70% on phrases such as `(clockwork details:1.4)`, `(intricate detailed:1.7)`, and `(glowing crystal:1.5)` made it go mad and trashed the image.
These weights are to be used with subtlety. I'd say anywhere between 0.5 and 1.3 is fine, but if too many subprompts are pushed too far out of whack, it will basically push the embedding vector beyond what the model was trained on, and corruption will occur.
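If you want to enforce that rule of thumb in code, a trivial guard does it; the 0.5/1.3 bounds are just my suggestion above, nothing measured:
```
def clamp_weight(weight, lo=0.5, hi=1.3):
    # Keep parsed weights inside the suggested range so a typo like
    # (red:16) can't push the embedding far outside the trained space.
    return max(lo, min(hi, float(weight)))
```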
Hey, thanks for trying this out!
See my message above about not stretching these values too much. Other than that: did you modify the img2img script? Because I only changed the txt2img script.
Yes, I have copied your change to my txt2img, img2img, and txt2imghd scripts. It appears to work across all three.
Trying to get it to work with negative prompts as well over here: https://github.com/tijszwinkels/stable-diffusion/tree/prompt-weight-negative
You can give it a whirl if you want. I might make a PR out of it later, but it doesn't really seem like PRs currently get merged anyway.
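The rough shape of it, as an untested sketch: apply the same weighting to the negative prompt and pass the result in as the unconditional embedding. Variable names here are borrowed from the txt2img script and may not match the branch:
```
# Untested sketch; the actual branch may differ.
c = get_learned_conditioning_with_prompt_weights(prompt, model)
uc = get_learned_conditioning_with_prompt_weights(negative_prompt, model)
samples, _ = sampler.sample(
    S=ddim_steps,
    conditioning=c,
    batch_size=1,
    shape=shape,
    verbose=False,
    unconditional_guidance_scale=scale,
    unconditional_conditioning=uc,
)
```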
Can confirm the weighted prompts work well. I am trying to get it working with ControlNet now, but so far I'm not having much luck. I'm not a Python whiz, it seems.
ControlNet uses a variation of the cond / uncond lines:
cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([combinedPrompt] * num_samples)]}
un_cond = {"c_concat": None if guess_mode else [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
```
I've tried using `get_learned_conditioning_with_prompt_weights`, but it usually comes up with something like `'int' object is not subscriptable` or some such.
Perhaps have a look here: https://github.com/hlky/nataili/blob/main/nataili/stable_diffusion/compvis.py#L881
The nataili lib uses prompt weights borrowed from this implementation (as a matter of fact, I wrote this implementation because I wanted prompt weights for Stable Horde, which uses this lib), and it supports ControlNet as well now.
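If you want to wire it into those two ControlNet lines directly, something like this might work; untested, and it assumes the function from the patched txt2img script is importable there:
```
import torch

# Untested sketch: swap the plain conditioning for the weighted one.
c = torch.cat(
    [get_learned_conditioning_with_prompt_weights(combinedPrompt, model)] * num_samples
)
cond = {"c_concat": [control], "c_crossattn": [c]}
un_cond = {"c_concat": None if guess_mode else [control],
           "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
```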
BTW, using `get_learned_conditioning_with_prompt_weights` on an array of prompts goes something like this:
```
c = torch.cat(
    [
        prompt_weights.get_learned_conditioning_with_prompt_weights(prompt, self.model)
        for prompt in prompts
    ]
)
```
(with `prompts` being the array).
Sorry to bug you again. I'm still having trouble getting weighted prompts into my ControlNet implementation. I tried copying over your classes, but ultimately it looked like there were some lower-level changes in the encoder to accept clip_skip etc. It kept bombing out with `get_learned_conditioning` expecting 2 arguments but being called with 3. I wasn't able to find the custom encode method to update it to accept the extra argument, and I was worried about the impact that would have on the other scripts.
I tried copying the basic version from the img2img example, hoping it would at least get me part of the way, but I am still getting weird errors. No doubt due to my lack of experience with Python.
Right now this is erroring out on `get_learned_conditioning_with_prompt_weights` with the error below. Do I need to convert `combinedPrompt` to some special data type? That's a part I didn't have to worry about in img2img, since it seemed to be in the correct format already, but this script is a stripped-down version of ControlNet's depth2img.py script, where it just accepts the prompt from command-line args.
```
combinedPrompt = prompt + ', ' + a_prompt
print(combinedPrompt)
if isinstance(combinedPrompt, tuple):
    combinedPrompt = list(combinedPrompt)
c = torch.cat(
    [
        get_learned_conditioning_with_prompt_weights(thePrompt, model)
        for thePrompt in combinedPrompt
    ]
)
cond = {"c_concat": [control], "c_crossattn": [model.get_learned_conditioning([combinedPrompt] * num_samples)]}
un_cond = {"c_concat": None if guess_mode else [control], "c_crossattn": [model.get_learned_conditioning([n_prompt] * num_samples)]}
```
```
Traceback (most recent call last):
  File "depth2image.py", line 247, in <module>
    main()
  File "depth2image.py", line 237, in main
    detectedmap, result = process(opt.inputimage, opt.prompt, opt.added_prompt, opt.negative_prompt, 1, opt.resolution, opt.depthres, opt.steps, False, opt.controlstr, 9, opt.seed, 0.0, True, opt.model)
  File "depth2image.py", line 120, in process
    get_learned_conditioning_with_prompt_weights(combinedPrompt, model)
  File "depth2image.py", line 45, in get_learned_conditioning_with_prompt_weights
    filtered_whole_prompt_c = model.get_learned_conditioning(filtered_whole_prompt)
  File "/home/baaleos/AIArtSetup/ControlNet/ldm/models/diffusion/ddpm.py", line 667, in get_learned_conditioning
    c = self.cond_stage_model.encode(c)
  File "/home/baaleos/AIArtSetup/ControlNet/ldm/modules/encoders/modules.py", line 131, in encode
    return self(text)
  File "/home/baaleos/miniconda3/envs/ldm/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/baaleos/AIArtSetup/ControlNet/cldm/hack.py", line 57, in _hacked_clip_forward
    raw_tokens_123 = split(raw_tokens)
  File "/home/baaleos/AIArtSetup/ControlNet/cldm/hack.py", line 48, in split
    return x[75 * 0: 75 * 1], x[75 * 1: 75 * 2], x[75 * 2: 75 * 3]
TypeError: 'int' object is not subscriptable
```
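Judging by the last two frames, `_hacked_clip_forward` in ControlNet's cldm/hack.py tokenizes its argument and splits each result into 75-token chunks. If it gets a bare string instead of a list of strings, the tokenizer returns one flat list of token ids, so `split` receives an `int` and fails. A likely (untested) fix is to wrap the prompt in a list before encoding:
```
# Likely fix (untested): the hacked CLIP forward expects a list of strings.
filtered_whole_prompt_c = model.get_learned_conditioning([filtered_whole_prompt])
```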