Prompt weights v2. Now with groups
Like so: {a fire|an ice@3} dragon, {fantasy,sci-fi} art
As discussed in #972, we figured a syntax like this would be more helpful.
Ex.
a fire dragon, fantasy art

an ice dragon, fantasy art

{a fire|an ice} dragon, fantasy art

{a fire|an ice@3} dragon, fantasy art

It also supports nested groups:
{{a fire|an ice}|plasma} dragon, fantasy art
which gives 0.25 each to fire and ice, and 0.5 to plasma.
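To illustrate the normalization, here is a minimal sketch of my own (not the PR's actual parser), assuming each group's options are normalized among their siblings and then scaled by the parent's share:

def flatten(options, parent_weight=1.0):
    # options: list of (prompt_or_nested_list, weight) pairs
    total = sum(w for _, w in options)
    result = []
    for item, w in options:
        share = parent_weight * w / total
        if isinstance(item, list):  # nested group
            result.extend(flatten(item, share))
        else:
            result.append((item, share))
    return result

# {{a fire|an ice}|plasma}: unweighted options default to 1
flatten([([("a fire", 1), ("an ice", 1)], 1), ("plasma", 1)])
# -> [('a fire', 0.25), ('an ice', 0.25), ('plasma', 0.5)]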

It also supports combination with scheduling:
[{a fire|an ice}:plasma:0.1] dragon, fantasy art

Or the other way around:
{a fire|[an ice:plasma:0.2]@3} dragon, fantasy art
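Roughly, this is how I read the first combination, [{a fire|an ice}:plasma:0.1] (an illustration only; the real scheduler may round the switch step differently):

def phase(step, steps, when=0.1):
    # [{a fire|an ice}:plasma:0.1] -- blend until when * steps, then switch
    if step <= when * steps:
        return {"a fire dragon": 0.5, "an ice dragon": 0.5}
    return {"plasma dragon": 1.0}

phase(1, 20)   # -> blended fire/ice conditioning (steps 1-2)
phase(10, 20)  # -> plain "plasma dragon" for the remaining steps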

It even leaves unmatched groups in place, in case someone wants to use them for something "after" this parser.
A {fire} dragon, {landscape,city} will stay "untouched".
Using this syntax was not possible without changing the scheduler parser:
{a fire|an ice:3}
If @AUTOMATIC1111 really wants to use their original syntax, I think that would be possible as well:
{a fire|an ice(3)}
But it adds another character and I find it harder to read.
Another thought is that this version also does not take the max prompt size into account. But hopefully it's less of an issue than with the other PR, since you don't have to repeat as much of the prompt to get good results.
I did, however, make sure it processes the prompts in bulk.
❤️ Dragon examples.
Great work! This must not have been easy. I can report that your solution produces the exact same result as the other PR. Here is the actual image info to reproduce it:
{
portrait female commander shepard (amber heard), cyberpunk futuristic neon, hyper detailed, digital art, trending in artstation, cinematic lighting, studio quality, smooth render, unreal engine 5 rendered, octane render, Illustrated, Perfect face, fine details, realistic shaded, fine-face, pretty face@1
|
ultra realistic style illustration of a cute red haired (amber heard), sci - fi, fantasy, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, 8 k frostbite 3 engine, ultra detailed@2
}
Steps: 20, Sampler: Euler, CFG scale: 7, Seed: 1, Size: 512x512, Model hash: 7460a6fa
Bonus: no warnings are generated.
Considering I hadn't coded any Python before #930, it was indeed a bit tricky.
The fact that you can mix [from:to:when] with {prompt weight} is really cool... I am wondering how such a prompt gets expanded as the steps progress:
[fantasy landscape, {{fire@4|lava@1}|ice@2}:real landscape:0.3]
I might have to add printf statements in the code to see it unravel... but those weights must be tricky.
This is like doing weighted prompts for just a portion of the steps, then switching to a full single prompt after 0.3 * steps... does it really do that? This is some crazy stuff.
Really appreciate this PR, thank you!
#930 was a bit limited for me as I now have at least a dozen or two styles that utilize {prompt}
Using this PR, I am able to use prompt blending with my verbose styles without having to apply them and then create/maintain duplicate prompts with different subjects 😄
e.g. a fire dragon, fantasy art@1 an ice dragon, fantasy art@1 -> {a fire|an ice} dragon, fantasy art
@amotile I tried to see the assigned weights for prompts and I noticed something odd. The last prompt weight for water does not look correct:
txt2img: [fantasy landscape, {{fire@4|lava@1}|ice@2|water@5}:real landscape:0.5]
: 1.0
fantasy landscape, fire : 0.26666666666666666
fantasy landscape, lava : 0.06666666666666667
fantasy landscape, ice : 0.3333333333333333
fantasy landscape, water : 0.3333333333333333
real landscape : 1.0
and
txt2img: [fantasy landscape, {{fire@4|lava@1}|ice@2|water@4}:real landscape:0.5]
: 1.0
fantasy landscape, fire : 0.26666666666666666
fantasy landscape, lava : 0.06666666666666667
fantasy landscape, ice : 0.3333333333333333
fantasy landscape, water : 0.3333333333333333
real landscape : 1.0
They both show as 0.3333 when they should clearly be different. I think it might be miscalculating the weight for the last prompt?
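For reference, here is the arithmetic I would expect for the first case, assuming an unweighted group defaults to weight 1:

total = 1 + 2 + 5     # the {fire@4|lava@1} group, ice@2, water@5
group = 1 / total     # the nested group's share, split 4:1 inside
print(group * 4 / 5)  # fire  -> 0.1
print(group * 1 / 5)  # lava  -> 0.025
print(2 / total)      # ice   -> 0.25
print(5 / total)      # water -> 0.625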
Also, if I use a float for a prompt weight, it is not interpreted properly and remains part of the prompt:
txt2img: [fantasy landscape, {{fire@4|lava@1}|ice@2.5|water@4}:real landscape:0.5]
: 1.0
fantasy landscape, fire : 0.26666666666666666
fantasy landscape, lava : 0.06666666666666667
fantasy landscape, ice@2.5 : 0.3333333333333333
fantasy landscape, water : 0.3333333333333333
real landscape : 1.0
Good catch, I think it works better now:
{{fire@4|lava@1}|ice@2|water@5}
[('fire', 0.1), ('lava', 0.025), ('ice', 0.25), ('water', 0.625)]
1.0
{{fire@4|lava@1}|ice@2|water@4}
[('fire', 0.1142857142857143), ('lava', 0.028571428571428574), ('ice', 0.2857142857142857), ('water', 0.5714285714285714)]
1.0
{{fire@4|lava@1}|ice@2.5|water@4}
[('fire', 0.10666666666666667), ('lava', 0.02666666666666667), ('ice', 0.3333333333333333), ('water', 0.5333333333333333)]
1.0
@amotile The updated code works as expected!
I think the weights should be separated from the keywords by a colon. Automatic uses colons for the new emphasis implementation, and it would also match pre-existing prompt weight implementations.
@shinkarom As I wrote above: https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/1273#issuecomment-1261542619
This wasn't possible without also touching the [from:to:when] parser.
If that's the syntax we want I could add a preprocessing step that converts the syntax to the current one before that part.
But it would be good to know what the "deciders" think about the syntax. I know AUTOMATIC1111 had some thoughts https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/972#issuecomment-1256908923
I still think the best approach would be to implement a parser, have {} specify groups in general, and implement things like prompt blending or scheduling as operators/functions, e.g. {from->when->to->when2->to2}.
I think this PR is needed.
I'm of the opinion that if you can do the same thing by running the system with different prompts, it can be a custom script.
Things you can't do by external means should be baked into the prompt parser. Ex.
- Scheduling [from:to:when]: you can't get this effect without changing the way the system generates the images (switching prompts midway).
- ((Attention)) and [[unattention]]: these also require the generator to know about them.
- Same with this PR. There's no way to get this effect by running the system many times with different prompts.
Things I don't think are needed in the prompt:
- Pick among random choices.
- Generate all possible combinations among choices
- X/Y grids
- Any parameters (though these are fine if tacked onto the end). You can get the same effect by writing some loops in a script.
@amotile My opinion is that picking random choices/generating combinations/X/Y grids could easily be implemented by making the script set $variables and including them in your prompt, effectively standardizing how a large number of existing and future scripts are used.
If I wasn't afflicted by chronic fatigue for many years now, I'd gladly undertake writing a proper parser, as it is simple and a joy with Python's PEG libraries, and I believe it would prevent a lot of future headache. Everything that needs more than pattern substitution eventually needs a parser or starts accumulating tech debt.
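For a taste of what that could look like, here is a tiny sketch of just the blend subset using the parsimonious PEG library (my own hypothetical grammar, untested against the PR's full syntax):

from parsimonious.grammar import Grammar

# Covers only {option|option@weight} with nesting; scheduling,
# attention and the rest of the prompt language are out of scope.
blend = Grammar(r"""
    group       = "{" option pipe_option* "}"
    pipe_option = "|" option
    option      = (group / text) weight?
    weight      = "@" ~r"\d+(\.\d+)?"
    text        = ~r"[^{}|@]+"
""")

blend.parse("{{a fire|an ice}|plasma@2}")  # returns a parse tree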
I'm sure a proper parser would be more scalable for the future, if thought through and designed right. But writing such a thing is currently beyond my Python skills. Also, this PR fulfills my personal needs, so I don't have the motivation to take it any further.
I do have a simple addition ready, however, if we want to switch the syntax to use ':':
REAL_MARK = ':'  # the separator users would type inside {...} groups
MARK = '@'       # the separator the blend parser actually consumes

def switch_syntax(prompt):
    # Rewrite ':' to '@' only when it appears directly inside a '{...}'
    # group, leaving the [from:to:when] scheduler syntax untouched.
    p = list(prompt)
    stack = []
    for i, c in enumerate(p):
        if c in '{[(':
            stack.append(c)
        elif c in '}])' and stack:
            stack.pop()
        elif c == REAL_MARK and stack and stack[-1] == '{':
            p[i] = MARK
    return "".join(p)

prompt_schedules = get_learned_conditioning_prompt_schedules(
    [switch_syntax(p) for p in prompts], steps)
But I acknowledge this is a pretty hacky solution
Now that the PR is back on the table:
- Is there anything that needs to be fixed?
- Should I add the syntax switch, so it's {fire|ice:10} instead of {fire|ice@10}?
I think having colons would be less confusing and more consistent.
Personally I like the current syntax even though it is different. I feel like the colons kind of blend together in longer prompts that use all the different techniques and it's nice to be able to see prompt weighting separate from everything at a very quick glance, especially when using it directly within scheduling (or scheduling directly within weighted prompts). I feel like changing it would hurt readability in scenarios like that where you would have many colons for different methods nearly back-to-back.
With my preprocessing solution both would actually work, because it just converts the ':' into '@' when in the correct context.
Here are some of the use cases for this: https://www.reddit.com/r/StableDiffusion/comments/xv893c/under_the_mountain_prompt_blending_animation/
https://www.youtube.com/watch?v=0X3lOaRA0N4
I'm still learning how to use it, but I'm pretty happy with them.
I had a short moment of hope that the new schedule parser wouldn't require the syntax switcher, but alas.
Does anyone know how this new commit: https://github.com/AUTOMATIC1111/stable-diffusion-webui/commit/c26732fbee2a57e621ac22bf70decf7496daa4cd
relates to this PR? It seems to be doing something a bit more complicated... at least the code is touching way more places.
Are the results different from the original weighted prompts? Can you get the same results with AND as you can with what's in this PR?
I might try to debug the code and follow along what's happening later. But if someone knows and wants to do a quick explain I would be thankful.
I think this commit only allows compositing in equal measures, while blending lets you select the proportions.
The main difference is where the blending occurs - this mixes the text encoder outputs (77x768), and the new one mixes denoiser outputs (4xH/8xW/8). When I last tested, both approaches work (differently), though this one has the benefit of being essentially "free", whereas the other requires multiple passes of the denoiser.
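Schematically, with encode and denoise as stand-ins for the text encoder and the denoiser (my own notation, not webui's code):

def blend_conditioning(encode, denoise, x, weighted_prompts):
    # this PR (schematic): mix the text-encoder outputs (77x768),
    # then run the denoiser once -- essentially free
    cond = sum(w * encode(p) for p, w in weighted_prompts)
    return denoise(x, cond)

def compose_outputs(encode, denoise, x, weighted_prompts):
    # the AND commit (schematic): one denoiser pass per prompt,
    # then mix the predictions (4 x H/8 x W/8)
    return sum(w * denoise(x, encode(p)) for p, w in weighted_prompts)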
A video I made when exploring the concepts. You can see how the two prompts fight and find a middle ground.
https://user-images.githubusercontent.com/24762404/194278988-36522a8b-e85c-4c21-9b4d-8434460453e7.mp4
Can't find the other video, but when the encoder outputs are mixed instead, no such flickering is observed.
Thanks, then there's still merit in this PR, I hope, since you can get different results. I will look into resolving the conflict as soon as I have time.
To test, try mixing "Judy Hopps" and "real human young woman". Only blending gives the result I want. And there's a problem with compositing. If you prompt "a tree AND a bus near the tree", adding "low quality image" to negatives is enough to make the bus not appear.
I managed to get this to work as a custom script. With some elbow grease it's possible to do quite a bit of hijacking of the code.
https://github.com/amotile/stable-diffusion-backend/blob/master/src/process/implementations/automatic1111_scripts/prompt_blending.py
This sounds awesome! Looking forward to trying the custom script.