
Prompt weights v2. Now with groups

Open amotile opened this issue 3 years ago • 57 comments

Like so: {a fire|an ice@3} dragon, {fantasy,sci-fi} art

As discussed in #972, we figured a syntax more like this would be helpful.

Examples (each produced a sample image in the original post):

a fire dragon, fantasy art

an ice dragon, fantasy art

{a fire|an ice} dragon, fantasy art

{a fire|an ice@3} dragon, fantasy art

It also supports nested groups: {{a fire|an ice}|plasma} dragon, fantasy art gives 0.25 to fire and ice each and 0.5 to plasma.

It also supports combination with scheduling: [{a fire|an ice}:plasma:0.1] dragon, fantasy art

Or the other way around: {a fire|[an ice:plasma:0.2]@3} dragon, fantasy art
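
To make the weight semantics concrete, here is a minimal, self-contained sketch (an illustration only, not the PR's actual parser) that flattens a nested group into (sub-prompt, weight) pairs and reproduces the 0.25/0.25/0.5 split above:

def flatten(group, scale=1.0):
    # a group is a list of (item, weight) pairs; an item is a string or another group
    total = sum(w for _, w in group)
    result = []
    for item, w in group:
        share = scale * w / total
        if isinstance(item, list):
            # nested group: distribute its share among its own members recursively
            result.extend(flatten(item, share))
        else:
            result.append((item, share))
    return result

# {{a fire|an ice}|plasma}: fire and ice get 0.25 each, plasma gets 0.5
print(flatten([([("a fire", 1), ("an ice", 1)], 1), ("plasma", 1)]))
# [('a fire', 0.25), ('an ice', 0.25), ('plasma', 0.5)]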

amotile avatar Sep 28 '22 22:09 amotile

It even leaves unmatched groups in place, in case someone wants to use them for something "after" this parser.

A {fire} dragon, {landscape,city} will stay "untouched".

amotile avatar Sep 28 '22 22:09 amotile

Using this syntax was not possible without changing the scheduler parser: {a fire|an ice:3}

If @AUTOMATIC1111 really wants to use their original syntax, I think that would be possible as well: {a fire|an ice(3)}

But it adds another character and I find it harder to read.

Another thought: this version also does not take the max prompt size into account. But hopefully that's less of an issue than in the other PR, since you don't have to repeat as much of the prompt to get good results.

I did, however, make sure it processes the prompts in bulk.

amotile avatar Sep 28 '22 22:09 amotile

❤️ Dragon examples.

dfaker avatar Sep 28 '22 22:09 dfaker

Great work! This must not have been easy. I can report that your solution produces the exact same result as the other PR. Here is the actual image info to produce it:

{
portrait female commander shepard (amber heard), cyberpunk futuristic neon, hyper detailed, digital art, trending in artstation, cinematic lighting, studio quality, smooth render, unreal engine 5 rendered, octane render, Illustrated, Perfect face, fine details, realistic shaded, fine-face, pretty face@1
|
ultra realistic style illustration of a cute red haired (amber heard), sci - fi, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, 8 k frostbite 3 engine, ultra detailed@2
}
Steps: 20, Sampler: Euler, CFG scale: 7, Seed: 1, Size: 512x512, Model hash: 7460a6fa

Bonus, there are no warnings generated.

bmaltais avatar Sep 28 '22 23:09 bmaltais

Considering I hadn't coded any Python before #930, it was indeed a bit tricky.

amotile avatar Sep 28 '22 23:09 amotile

The fact that you can mix [from:to:when] with {prompt weight} is really cool... I am wondering how such a prompt gets expanded as the steps progress:

[fantasy landscape, {{fire@4|lava@1}|ice@2}:real landscape:0.3]

I might have to add printf statements in the code to see it unravel... but those weights must be tricky.

This is like doing weighted prompts for just a portion of the steps, then switching to a single full prompt after 0.3 * steps... does it really do that? This is some crazy stuff.
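
A rough illustration of how such a prompt could expand (an assumption based on combining the existing [from:to:when] scheduling with the group weights described later in this thread; not actual program output), with 20 steps:

# [fantasy landscape, {{fire@4|lava@1}|ice@2}:real landscape:0.3] with 20 steps
schedule = [
    # steps 1-6 (0.3 * 20): blended conditioning of the weighted sub-prompts
    (6, [("fantasy landscape, fire", 4/5 * 1/3),   # inner group: fire@4 vs lava@1
         ("fantasy landscape, lava", 1/5 * 1/3),
         ("fantasy landscape, ice",  2/3)]),       # outer group: inner@1 vs ice@2
    # steps 7-20: a single, un-blended prompt
    (20, [("real landscape", 1.0)]),
]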

bmaltais avatar Sep 28 '22 23:09 bmaltais

Really appreciate this PR, thank you!

#930 was a bit limited for me as I now have at least a dozen or two styles that utilize {prompt}

Using this PR, I am able to use prompt blending with my verbose styles without having to create and maintain duplicate prompts with different subjects 😄

e.g. a fire dragon, fantasy art@1 an ice dragon, fantasy art@1 -> {a fire|an ice} dragon, fantasy art

evanjs avatar Sep 29 '22 00:09 evanjs

@amotile I tried to see the assigned weights for prompts and I noticed something odd. The last prompt weight for water does not look correct:

txt2img: [fantasy landscape, {{fire@4|lava@1}|ice@2|water@5}:real landscape:0.5]
 : 1.0
fantasy landscape, fire : 0.26666666666666666
fantasy landscape, lava : 0.06666666666666667
fantasy landscape, ice : 0.3333333333333333
fantasy landscape, water : 0.3333333333333333
real landscape : 1.0

and

txt2img: [fantasy landscape, {{fire@4|lava@1}|ice@2|water@4}:real landscape:0.5]
 : 1.0
fantasy landscape, fire : 0.26666666666666666
fantasy landscape, lava : 0.06666666666666667
fantasy landscape, ice : 0.3333333333333333
fantasy landscape, water : 0.3333333333333333
real landscape : 1.0

They both show as 0.3333 when they should clearly be different. I think it might be miscalculating the weight for the last prompt?

Also, if I use a float for a prompt weight it is not interpreted properly and remains part of the prompt:

txt2img: [fantasy landscape, {{fire@4|lava@1}|ice@2.5|water@4}:real landscape:0.5]
 : 1.0
fantasy landscape, fire : 0.26666666666666666
fantasy landscape, lava : 0.06666666666666667
fantasy landscape, ice@2.5 : 0.3333333333333333
fantasy landscape, water : 0.3333333333333333
real landscape : 1.0

bmaltais avatar Sep 29 '22 01:09 bmaltais

Good catch, I think it works better now:

{{fire@4|lava@1}|ice@2|water@5}
[('fire', 0.1), ('lava', 0.025), ('ice', 0.25), ('water', 0.625)]
1.0

{{fire@4|lava@1}|ice@2|water@4}
[('fire', 0.1142857142857143), ('lava', 0.028571428571428574), ('ice', 0.2857142857142857), ('water', 0.5714285714285714)]
1.0

{{fire@4|lava@1}|ice@2.5|water@4}
[('fire', 0.10666666666666667), ('lava', 0.02666666666666667), ('ice', 0.3333333333333333), ('water', 0.5333333333333333)]
1.0
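
As a sanity check, the illustrative flatten() sketch from earlier in the thread reproduces these corrected weights:

print(flatten([([("fire", 4), ("lava", 1)], 1), ("ice", 2), ("water", 5)]))
# [('fire', 0.1), ('lava', 0.025), ('ice', 0.25), ('water', 0.625)]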

amotile avatar Sep 29 '22 05:09 amotile

@amotile The updated code works as expected!

bmaltais avatar Sep 29 '22 14:09 bmaltais

I think the weights should be separated from the keywords by a colon. Automatic uses colons for the new emphasis implementation, and it would also match pre-existing prompt weight implementations.

shinkarom avatar Sep 29 '22 18:09 shinkarom

@shinkarom As I wrote above: https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/1273#issuecomment-1261542619

This wasn't possible without also touching the [from:to:when] parser.

If that's the syntax we want I could add a preprocessing step that converts the syntax to the current one before that part.

But it would be good to know what the "deciders" think about the syntax. I know AUTOMATIC1111 had some thoughts https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/972#issuecomment-1256908923

amotile avatar Sep 29 '22 20:09 amotile

If that's the syntax we want I could add a preprocessing step that converts the syntax to the current one before that part.

I still think the best approach would be to implement a proper parser, have {} specify groups in general, and implement things like prompt blending or scheduling as operators/functions, e.g. {from->when->to->when2->to2}.

Asmageddon avatar Sep 29 '22 21:09 Asmageddon

I think this PR is needed.

shinkarom avatar Oct 01 '22 17:10 shinkarom

I'm of the opinion that if you can do the same thing by running the system with different prompts, it can be a custom script.

Things you can't do by external changes should be baked into the prompt parser, e.g.:

  • Scheduling [from:to:when]: you can't get this effect without changing the way the system generates the images (switching prompts midway).
  • ((Attention)) and [[unattention]] also require the generator to know about them.
  • The same goes for this PR: there's no way to get this effect by running the system many times with different prompts.

Things I don't think are needed in the prompt:

  • Picking among random choices.
  • Generating all possible combinations among choices.
  • X/Y grids.
  • Any parameters (but these are fine if tacked onto the end). You can get the same effect by writing some loops in a script.

amotile avatar Oct 01 '22 19:10 amotile

@amotile My opinion is that picking random choices, generating combinations, and X/Y grids could easily be implemented by making the script set $variables and including them in your prompt, effectively standardizing how a large number of existing and future scripts are used.

If I wasn't afflicted by chronic fatigue for many years now, I'd gladly undertake writing a proper parser, as it is simple and a joy with Python's PEG libraries, and I believe it would prevent a lot of future headache. Everything that needs more than pattern substitution eventually needs a parser or starts accumulating tech debt.
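
A tiny sketch of what the $variables idea could look like (purely hypothetical; no such mechanism exists in the repo today): a script expands the combinations and substitutes them into a prompt template before each ordinary generation.

from itertools import product
from string import Template

template = Template("a $element dragon, $style art")  # hypothetical user prompt with $variables
elements = ["fire", "ice"]
styles = ["fantasy", "sci-fi"]

for element, style in product(elements, styles):
    prompt = template.substitute(element=element, style=style)
    # generate(prompt)  # hypothetical call; each combination is just a normal generation
    print(prompt)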

Asmageddon avatar Oct 02 '22 09:10 Asmageddon

I'm sure a proper parser would be more scalable for the future, if thought through and designed right. But writing such a thing is currently beyond my Python skills. Also, this PR fulfills my personal needs, so I don't have the motivation to take it any further.

I do, however, have a simple addition ready if we want to switch the syntax to use a colon:

MARK = '@'       # assumed value: the marker the blending parser uses internally
REAL_MARK = ':'  # assumed value: the user-facing marker accepted inside {} groups

def switch_syntax(prompt):
    # Convert REAL_MARK to MARK only when it appears directly inside a {} group,
    # so scheduling/attention colons elsewhere in the prompt are left untouched.
    p = list(prompt)
    stack = []
    for i, c in enumerate(p):
        if c in '{[(':
            stack.append(c)
        elif c in '}])':
            if stack:
                stack.pop()
        elif c == REAL_MARK and stack and stack[-1] == '{':
            p[i] = MARK
    return "".join(p)

prompt_schedules = get_learned_conditioning_prompt_schedules([switch_syntax(p) for p in prompts], steps)

But I acknowledge this is a pretty hacky solution
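
For example, with the marker values assumed above, the preprocessing only rewrites colons that sit directly inside a {} group and leaves scheduling colons alone:

print(switch_syntax("{fire|ice:10} dragon"))
# {fire|ice@10} dragon
print(switch_syntax("[{a fire|an ice:3}:plasma:0.1] dragon"))
# [{a fire|an ice@3}:plasma:0.1] dragon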

amotile avatar Oct 02 '22 13:10 amotile

Now that this is back on the table:

  • Is there anything that needs to be fixed?
  • Should I add the syntax switch, so it's {fire|ice:10} instead of {fire|ice@10}?

amotile avatar Oct 03 '22 08:10 amotile

I think having colons would be less confusing and more consistent.

shinkarom avatar Oct 03 '22 08:10 shinkarom

Personally I like the current syntax even though it is different. I feel like the colons kind of blend together in longer prompts that use all the different techniques and it's nice to be able to see prompt weighting separate from everything at a very quick glance, especially when using it directly within scheduling (or scheduling directly within weighted prompts). I feel like changing it would hurt readability in scenarios like that where you would have many colons for different methods nearly back-to-back.

zsinisi001 avatar Oct 03 '22 10:10 zsinisi001

With my preprocessing solution both would actually work, because it just converts the : into @ when in the correct context.

amotile avatar Oct 03 '22 11:10 amotile

Here are some of the use cases for this: https://www.reddit.com/r/StableDiffusion/comments/xv893c/under_the_mountain_prompt_blending_animation/

https://www.youtube.com/watch?v=0X3lOaRA0N4

I'm still learning how to use it, but I'm pretty happy with the results.

amotile avatar Oct 04 '22 08:10 amotile

A short moment of hope that the new schedule parser wouldn't require the syntax switcher, but alas.

amotile avatar Oct 04 '22 18:10 amotile

Does anyone know how this new commit: https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/c26732fbee2a57e621ac22bf70decf7496daa4cd

relates to this PR? It seems to be doing something a bit more complicated... at least the code is touching way more places.

Are the results different from the original weighted prompts? Can you get the same results with AND as you can with what's in this PR?

I might try to debug the code later and follow along with what's happening. But if someone knows and wants to give a quick explanation, I would be thankful.

amotile avatar Oct 06 '22 05:10 amotile

I think this commit only allows compositing in equal measures, while blending lets you select the proportions.

shinkarom avatar Oct 06 '22 07:10 shinkarom

The main difference is where the blending occurs - this mixes the text encoder outputs (77x768), and the new one mixes denoiser outputs (4xH/8xW/8). When I last tested, both approaches work (differently), though this one has the benefit of being essentially "free", whereas the other requires multiple passes of the denoiser.
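
A minimal sketch of that difference (my own illustration with placeholder names, not code from either implementation):

def blend_text_conditionings(conds, weights):
    # this PR: mix the text-encoder outputs (e.g. 77x768 tensors) once up front,
    # then the denoiser runs a single time per step on the blended conditioning
    return sum(w * c for c, w in zip(conds, weights))

def composite_denoiser_outputs(denoise, x, conds, weights):
    # the AND / composable-diffusion commit: call the denoiser once per prompt
    # every step and mix the denoiser outputs (4 x H/8 x W/8 latents) instead
    return sum(w * denoise(x, c) for c, w in zip(conds, weights))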

A video I made when exploring the concepts. You can see how the two prompts fight and find a middle ground.

https://user-images.githubusercontent.com/24762404/194278988-36522a8b-e85c-4c21-9b4d-8434460453e7.mp4

I can't find the other video, but when the encoder outputs are mixed instead, no such flickering is observed.

guaneec avatar Oct 06 '22 09:10 guaneec

Thanks, then there's still merit in this PR, I hope, since you can get different results. I will look into resolving the conflict as soon as I have time.

amotile avatar Oct 06 '22 09:10 amotile

To test, try mixing "Judy Hopps" and "real human young woman"; only blending gives the result I want. And there's a problem with compositing: if you prompt "a tree AND a bus near the tree", adding "low quality image" to the negatives is enough to make the bus not appear.

shinkarom avatar Oct 06 '22 09:10 shinkarom

I managed to get this to work as a custom script. With some elbow grease it's possible to do quite a bit of hijacking of the code.

https://github.com/amotile/stable-diffusion-backend/blob/master/src/process/implementations/automatic1111_scripts/prompt_blending.py

amotile avatar Oct 08 '22 12:10 amotile

This sounds awesome! Looking forward to trying the custom script.

bmaltais avatar Oct 08 '22 13:10 bmaltais