
Alternating|Tokens and From:To:When

Open · HaydenReeve opened this issue 2 years ago · 39 comments

Hi!

I'd love to enquire about the ability to use two really powerful features from other platforms.

  • Prompt Alternating, where you write [Human|Duck] and each step alternates between the tokens specified.
  • Prompt Editing, where the prompt changes based on how many steps have completed, such as [Photorealistic:Abstract:0.5], which switches artistic styles halfway through sampling.

Thanks!

HaydenReeve avatar Jun 03 '23 17:06 HaydenReeve

  • In regards to the first one, that's achieved with {Human|Duck}, as in sd-dynamic-prompts and Disco Diffusion in the past.
    • In my WAS Node Suite you can also use <Human|Duck> (though {Human|Duck} also works) with CLIPTextEncode (NSP), and if you have the Advanced CLIPTextEncode node there will be another conditioning node for that as well, with the same features. This allows you to do reproducible dynamic prompts (a rough sketch of this kind of seeded resolution follows below). A bonus of these nodes is that you can create variables, like $|Human Entity with Red Eyes|$, and then elsewhere in the prompt use $1 to print that same text again. Subsequent variables are accessed by order of occurrence, so the second would be $2, and so on.
  • I thought to:from:when worked, but maybe not? I know [Photorealistic:0.5] works.
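
A minimal sketch of how seeded wildcard resolution like this could work; the regex and function name are illustrative, not the actual sd-dynamic-prompts or WAS Node Suite implementation:

```python
import random
import re

def resolve_wildcards(prompt: str, seed: int) -> str:
    """Replace each {a|b|c} group with one randomly chosen option.

    Seeding the RNG makes the expansion reproducible, which is the point
    of the "reproducible dynamic prompts" described above.
    """
    rng = random.Random(seed)
    return re.sub(r"\{([^{}]+)\}",
                  lambda m: rng.choice(m.group(1).split("|")),
                  prompt)

# Same seed -> same expansion every time the prompt is queued.
print(resolve_wildcards("a {Human|Duck} in a {city|forest}", seed=42))
```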

WASasquatch avatar Jun 04 '23 02:06 WASasquatch

@WASasquatch

RE: {Human|Duck}

The documentation in the README.md lists this

You can use {day|night}, for wildcard/dynamic prompts. With this syntax {wild|card|test} will be randomly replaced by either "wild", "card" or "test" by the frontend every time you queue the prompt. To use {} characters in your actual prompt escape them like: \{ or \}.

This is slightly different functionality from what I am referencing here.

This alternates each time a prompt is queued, not at each step of the latent diffusion. The functionality I am describing would, at each step, produce a Human-Duck hybrid... thing. The current functionality of {Human|Duck} generates either a Human or a Duck.

RE: To:From:When, this one most certainly doesn't work as expected.

This prompt does not appear to work: Man, [Apple:Fire:0.8] produces (image attached), while Man, [Apple::0.8] produces (image attached).

I tried [Duck|Man:0.2] and [Duck|Man:0.8] (images attached).

Finally, the [Photorealistic:0.5] example also doesn't work as expected. I tested using an obvious token, Neon. Here are the bare Man, prompt, Man, [Neon:0.9], and Man, [Neon:0.1] (images attached).

While they do appear to have an effect on the image, they don't work as a sequencer or as a blend method.

HaydenReeve avatar Jun 11 '23 11:06 HaydenReeve

Oh, to do it within a single diffusion run you have to use stop/start steps with multiple samplers. As for the second one, that sucks. Though this works similarly for me: ([apple:0.5] [fire:0.5]:1.1), where both only occur for half the steps.
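
For reference, here is roughly what that stop/start split looks like. The start_at_step, end_at_step and return_with_leftover_noise widgets are real KSampler (Advanced) inputs, but the function-call style below is only shorthand for wiring nodes in the graph, not runnable Python:

```python
# Sketch of a [from:to:0.5] split over 20 steps using two KSampler (Advanced)
# nodes; cfg, sampler_name and scheduler widgets omitted for brevity.
cond_from = CLIPTextEncode(clip, "apple")
cond_to = CLIPTextEncode(clip, "fire")

# First half: steps 0-10 with the "from" prompt, keeping leftover noise.
half_done = KSamplerAdvanced(model, positive=cond_from, negative=negative,
                             latent_image=empty_latent, steps=20,
                             start_at_step=0, end_at_step=10,
                             add_noise=True, return_with_leftover_noise=True)

# Second half: steps 10-20 with the "to" prompt, adding no new noise.
result = KSamplerAdvanced(model, positive=cond_to, negative=negative,
                          latent_image=half_done, steps=20,
                          start_at_step=10, end_at_step=20,
                          add_noise=False, return_with_leftover_noise=False)
```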

WASasquatch avatar Jun 11 '23 16:06 WASasquatch

I assume you're talking about multiple samplers retaining noise between each other? I cannot imagine doing that in an easy fashion. For a 20-step prompt you'd have to have twenty samplers.

I'll have a look at ([apple:0.5] [fire:0.5]:1.1), but to me that reads as two words, apple and fire, each reduced to 50% weighting and then the pair boosted by 10%. That doesn't exactly seem to be how it works from playing around with it, but I've not found another explanation that works as expected.

Most surprisingly, I've noticed a lot of things that seem to work, but when I try to reduce or test the effects, the hypothesis falls apart.

The two features, [Cat|Dog] and [Abstract:Photorealistic:0.8], are things that A1111 does natively; I was wondering whether they'd be officially supported at some point by Comfy.

HaydenReeve avatar Jun 12 '23 01:06 HaydenReeve

Brackets aren't decreased weight; they're decreased steps, as far as I am aware. Lowering weight is done with parentheses and simply using a low weight. Brackets control a token's occurrence in the diffusion, so 0.5 would be 50% of the steps, i.e. 10 steps of a 20-step run.

The issue with ComfyUI is that we encode text early to do stuff with it (combine, mix, etc.), then input it into a sampler already encoded. A1111 encodes text on the fly at diffusion time, so each diffusion step can parse the text differently.
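
To illustrate the difference, A1111-style prompt editing amounts to re-resolving the prompt text for every step and re-encoding it. A toy sketch of the [from:to:when] scheduling rule (not A1111's actual parser):

```python
import re

def prompt_at_step(prompt: str, step: int, total_steps: int) -> str:
    """Resolve [from:to:when] groups for one sampling step.

    A 'when' <= 1 is treated as a fraction of total steps; 'from' is used
    before the switch point and 'to' after it.
    """
    def repl(m):
        frm, to, when = m.group(1), m.group(2), float(m.group(3))
        switch = when * total_steps if when <= 1 else when
        return frm if step < switch else to
    return re.sub(r"\[([^:\[\]]*):([^:\[\]]*):([\d.]+)\]", repl, prompt)

# Steps 0-9 encode "Photorealistic", steps 10-19 encode "Abstract".
for s in (0, 9, 10, 19):
    print(s, prompt_at_step("a [Photorealistic:Abstract:0.5] portrait", s, 20))
```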

WASasquatch avatar Jun 12 '23 16:06 WASasquatch

Yeah, both features are so powerful... if there is a way to implement them in ComfyUI, that would be great.

taabata avatar Jun 13 '23 05:06 taabata

Brackets aren't decreased weight; they're decreased steps, as far as I am aware. Lowering weight is done with parentheses and simply using a low weight. Brackets control a token's occurrence in the diffusion, so 0.5 would be 50% of the steps, i.e. 10 steps of a 20-step run.

That doesn't make sense considering the Man, [Neon:0.X] prompts above. Man, [Neon:0.1] should have 10% of the Neon token, but it's virtually identical to the Man, [Neon:0.9] diffusion.

I suppose that equally means it doesn't lower the weighting either.

I imagine, realistically, the tokens do nothing special, but because order/spaces/special characters can be so important, the image is altered slightly, which gives the impression of having an effect.

HaydenReeve avatar Jun 13 '23 05:06 HaydenReeve

That doesn't make sense considering the Man, [Neon:0.X] prompts above. Man, [Neon:0.1] should have 10% of the Neon token, but it's virtually identical to the Man, [Neon:0.9] diffusion.

To me, the literal "neon" sign portion is being omitted, while the colors derived from the initial noise by the prompt are the same. Notice in the 0.1 image there are fewer defined "actual" lights, and the sign's neon lettering is nearly gone. This is also why I demonstrated weighting the bracketed bit, like man, ([neon:0.1]:1.2), or something to taste.

It may not work as I think it does, but from my testing it seems to; being arbitrary, though, without actual defined step control, it's very random and hard to control from seed to seed / prompt to prompt.

WASasquatch avatar Jun 13 '23 07:06 WASasquatch

That's a reasonable assertion.

We can test this, I believe, quite easily.

Take the following prompt: Man, Black and White (image attached).

Now let's add Pink, [Pink:0.9], and [Pink:0.1] to that prompt respectively (images attached).

Let's also inspect [Pink:1] and [Pink:1.0] (images attached).

At least with the Pink token, I don't believe this assertion holds up. The rather eclectic results are quite interesting, and should [Pink:0.1] prove to be a token applied for 10% of the total steps, it doesn't function how you would expect it to, at the very least.

All of these tests were performed with 10 steps, Euler Karras, which means 10% would be precisely one step.

HaydenReeve avatar Jun 13 '23 13:06 HaydenReeve

I am wondering if there would be value in a custom node that prepares the prompts but does not encode them until a later node, decoupling these functionalities from the node itself. That could allow the encoding to occur on or before the sampler, whether in custom nodes or officially supported ones.

I do believe I've found myself wishing more than once to be able to easily append tokens to the prompt, without combining the encodings at a later stage, for prompt morphing during two- or three-stage resampling.

HaydenReeve avatar Jun 13 '23 13:06 HaydenReeve

WAS-NS has text-editing nodes to set up prompts before putting them through a Text to Conditioning or other conditioning node. There are also Tokens to save your own custom stuff. The NSP conditioning nodes under WAS Suite/Conditioning allow you to use <one|two|three> random prompts which are reproducible by conditioning seed. It also has prompt variables, so you could do stuff like $|__color__|$_lights, $1_sign and it would be parsed to something like red_lights, red_sign.
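
A rough sketch of how that variable syntax could be parsed (illustrative only, not the WAS-NS implementation, and assuming wildcards like __color__ have already been resolved to concrete text):

```python
import re

def expand_variables(prompt: str) -> str:
    """Record $|...|$ definitions by occurrence, then substitute $1, $2, ..."""
    variables = []

    def capture(m):
        variables.append(m.group(1))
        return m.group(1)  # the defining occurrence stays in the prompt

    prompt = re.sub(r"\$\|([^|]+)\|\$", capture, prompt)
    # $N refers back to the Nth definition (1-based).
    return re.sub(r"\$(\d+)",
                  lambda m: variables[int(m.group(1)) - 1],
                  prompt)

print(expand_variables("$|red|$_lights, $1_sign"))  # -> red_lights, red_sign
```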

As far as a true to:from:when goes, I think we need @comfyanonymous to confirm how to do it, or whether it even exists.

WASasquatch avatar Jun 13 '23 17:06 WASasquatch

Also here is my test; images attached for each of these prompts:

  • A man
  • A man, [pink:0.001]
  • A man, pink
  • A man, (pink:-1.0)

To me, it still seems using brackets creates a less defined effect. Pink isn't used for specific things but just shows up in the init noise, which tricks the resulting image into incorporating it somehow, but in an undefined way.

In fact, it seems using the brackets brings down the whole fidelity of the image, which may be related to step skipping?

Another reason I think it works at least similarly is that stuff like this is possible, where regular weighting is harder to control (images attached):

  • A man, ([pink_hair:0.5] mixed with [purple_hair:0.5]:1.2)
  • A man, pink_hair mixed with purple_hair

When just prompting it, or weighting up one or the other, it seems one color is simply more dominant than the other, and it's hard to get a good mix across tons of gens.

WASasquatch avatar Jun 13 '23 17:06 WASasquatch

Check out the custom node I created: https://github.com/taabata/Comfy_custom_nodes (screenshot attached).

taabata avatar Jun 17 '23 09:06 taabata

I've encountered this exact problem.

And I've just developed a solution for it. The idea is to create a KSamplerAdvanced node for each step, then use a custom CLIPTextEncodeA1111 node before it that converts an A1111 prompt to a standard prompt, and then use a textbox to feed the A1111-like prompt to all of the CLIPTextEncodeA1111 nodes.

Unlike the solution of @taabata, my solution has the potential to support ControlNet. However, my solution is messy and requires a lot of nodes (which can be automatically generated using a script included in my repo). The syntax is slightly different from A1111, though, because I don't want to use :, as the same character is also used for embeddings in ComfyUI. My solution also supports recursion syntax.

Here's the repo: https://github.com/SadaleNet/CLIPTextEncodeA1111-ComfyUI
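
The generated graph is conceptually a loop like this (a sketch of the shape only, not the repo's actual generator script; the step input on CLIPTextEncodeA1111 is my assumption, and function calls stand in for node wiring):

```python
# One KSamplerAdvanced per step, each fed the prompt resolved for that step.
latent = empty_latent
for step in range(total_steps):
    cond = CLIPTextEncodeA1111(clip, prompt_text, step=step)  # hypothetical input
    latent = KSamplerAdvanced(model, positive=cond, negative=negative,
                              latent_image=latent, steps=total_steps,
                              start_at_step=step, end_at_step=step + 1,
                              add_noise=(step == 0),
                              return_with_leftover_noise=(step < total_steps - 1))
```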

SadaleNet avatar Jul 30 '23 15:07 SadaleNet

Recently someone implemented this. Try it: https://github.com/asagi4/comfyui-prompt-control

ltdrdata avatar Jul 30 '23 17:07 ltdrdata

Rather than having a custom node that tries to do everything at once, or a ton of different nodes for each step, would it not make sense to have a literal "step" parameter in the KSampler (Advanced) node? It could function like the third argument of Python's range(start, stop, step) and be called something like "increment" to be less confusing.

You'd be able to achieve the [cat|dog] effect in a more powerful (but more verbose) way using just two KSampler (Advanced) nodes offset by one in their start step, plus their respective prompt nodes.
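
To make the range() analogy concrete, here is what a hypothetical increment parameter would compute (this parameter does not exist in ComfyUI today):

```python
def steps_for_sampler(start: int, stop: int, increment: int) -> list:
    """Which steps a hypothetical increment-aware sampler would run, a la range()."""
    return list(range(start, stop, increment))

total_steps = 20
# Two samplers offset by one in their start step, each taking every 2nd step:
cat_steps = steps_for_sampler(0, total_steps, 2)  # [0, 2, ..., 18] -> "cat" prompt
dog_steps = steps_for_sampler(1, total_steps, 2)  # [1, 3, ..., 19] -> "dog" prompt
print(cat_steps)
print(dog_steps)
```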

coreyryanhanson avatar Jul 31 '23 03:07 coreyryanhanson

I agree with this logic. Being able to step... the step... would let you do this elegantly with KSampler Advanced. @comfyanonymous, does this seem logical?

WASasquatch avatar Jul 31 '23 04:07 WASasquatch

I don't think this idea would work. It'd require the latent output and latent input of the two KSamplerAdvanced nodes to connect to each other.

SadaleNet avatar Jul 31 '23 13:07 SadaleNet

+1 for that feature. I was often using both alternating words ([cow|horse]) and [from:to:when] (as well as [to:when] and [from::when]) syntax to achieve interesting results and transitions in A1111 during a single sampling pass. It's an effective way of using different prompts for different steps during sampling, and it would be nice to have it natively supported in ComfyUI. It would probably require enhancing the implementation of both the CLIP encoders and the samplers, though.

MoonRide303 avatar Aug 04 '23 16:08 MoonRide303

ComfyUI now supports ConditioningSetTimestepRange.

ltdrdata avatar Aug 05 '23 00:08 ltdrdata

Is there an example of how to do this with ConditioningSetTimestepRange? I wasn't getting the same sort of results, but I'm not exactly sure how to use it, just what seems like the way to do it.

WASasquatch avatar Aug 05 '23 07:08 WASasquatch

The thing is that with more complex prompts and multi-prompt / multi-encoder setups we'd quickly be flooded with nodes. A sample (and still relatively simple) prompt from A1111:

[dslr photography : oil on canvas painting : 0.1] of a [blue | red] sphere in the city, [dark ink : airbrush : 0.25], dark cyberpunk future, high quality, high resolution
Negative prompt: low quality, low resolution
Steps: 30, Sampler: Euler, CFG scale: 7, Seed: 0, Size: 1024x1024, Model hash: e6bb9ea85b, Model: sd_xl_base_1.0_0.9vae, Clip skip: 2, Score: 7.19, Version: v1.5.1

and the output (image attached).

It's very easy and fun to make that kind of transitions in A1111, and it works pretty well.

Doing something like that via extra nodes would basically mean that for every unique combination in the prompt we would have to create duplicates of the prompt and conditioning nodes.

And imagine doing it with more advanced flows. For example, my basic setup for SDXL uses 3 positive + 3 negative prompts (one for each text encoder: base G+, base G-, base L+, base L-, refiner+, refiner-). If I wanted to do transitions like in the example above in ComfyUI, I would have to make several times more nodes just to handle that prompt. And each time I wanted to add or remove a transition in the prompt, I would have to reconfigure the whole flow.

The prompt2prompt way looks like a much better idea to me, to be honest. If anyone would like to (and/or knows how to) implement it in ComfyUI, here is the original implementation of this feature from Doggettx, and here is v2 (might be useful as a reference). It would probably work best if it were included in basic ComfyUI functionality (not as custom nodes).

MoonRide303 avatar Aug 05 '23 09:08 MoonRide303

Is there an example of how to do this with ConditioningSetTimestepRange? I wasn't getting the same sort of results, but I'm not exactly sure how to use it, just what seems like the way to do it.

For the from:to:when, you would set the start and end for both prompts and then pipe them into a Conditioning (Combine).
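
A sketch of that wiring for one [from:to:0.25] transition; the start/end fractions are real ConditioningSetTimestepRange inputs, but the function-call style is just shorthand for connecting nodes in the graph:

```python
# Pseudo-graph: encode both prompts once, restrict each to a timestep range,
# then combine so the sampler sees "from" early and "to" late.
cond_from = CLIPTextEncode(clip, "dslr photography of a sphere")
cond_to = CLIPTextEncode(clip, "oil on canvas painting of a sphere")

cond_from = ConditioningSetTimestepRange(cond_from, start=0.0, end=0.25)
cond_to = ConditioningSetTimestepRange(cond_to, start=0.25, end=1.0)

positive = ConditioningCombine(cond_from, cond_to)  # feed this to the sampler
```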

coreyryanhanson avatar Aug 05 '23 14:08 coreyryanhanson

The custom node I created allows for token alternating and prompt editing, with ControlNet support as well. Link: https://github.com/taabata/Comfy_Syrian_Falcon_Nodes/tree/main

(screenshot attached)

taabata avatar Aug 06 '23 01:08 taabata

I'm late to the party, but +1 for the request.

ComfyUI now supports ConditioningSetTimestepRange.

@ltdrdata, if I understand correctly, this node can be used as an alternative to the [from:to:when] syntax. But:

  • It still requires us to manually split the text prompt into pieces. What if a prompt contains multiple such entries, each using its own switch point? This can quickly require literally dozens of nodes just for that.
  • As far as I can see, there's still no alternative to the [cow|horse] syntax, which is usually used with multiple entries, too. This prompt: [grey|white|brown] [cow|horse] on a [grass|field|courtyard|lawn|glade] immediately creates 3*2*5 = 30 prompt variants (see the sketch below), which currently can be achieved in ComfyUI only with 30 text node copies and an INSANELY intertwined graph.
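
The blow-up is easy to check (throwaway sketch):

```python
from itertools import product

colors = ["grey", "white", "brown"]
animals = ["cow", "horse"]
places = ["grass", "field", "courtyard", "lawn", "glade"]

variants = [f"{c} {a} on a {p}" for c, a, p in product(colors, animals, places)]
print(len(variants))  # 3 * 2 * 5 = 30 prompt variants
```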

Worst of all, both solutions make the network prompt-dependent.

So... is it planned to implement an actual equivalent for this syntax?

Lex-DRL avatar Aug 11 '23 13:08 Lex-DRL

Yeah. It seems we need to develop a wrapper.

ltdrdata avatar Aug 11 '23 13:08 ltdrdata

@ltdrdata Maybe you don't need an edge-case wrapper. Maybe you need an extension to the current data type plus an upgrade to the currently present nodes.

Sorry if what I'm going to suggest doesn't make sense to you (if so, disregard this comment): I'm not sure about the specific Python implementation of data flow in ComfyUI. But maybe, instead of a new uber all-in-one node, what we need is something like a conditioning v2 data type (between nodes), which is treated not as a single data instance but as an iterator handle for such data (a toy sketch follows the list below):

  • I assume the current conditioning connection passes data through only once, at evaluation start. Unlike it, dependent nodes connected with conditioning v2 would request a data instance at each step.
  • It's the source node's responsibility what it outputs. It may output the same conditioning at each step, or it may generate different ones.
  • If the current (legacy) data type is connected to a node with the newer-version input, it's just automatically converted into an infinite iterator of the same thing.
  • To let dependent nodes do any work only once, there could be some metadata attached indicating the number of unique conditioning objects it generates, their IDs, etc.
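
A toy sketch of that iterator-handle idea (hypothetical names; nothing like this exists in ComfyUI yet):

```python
from collections.abc import Iterator
from itertools import repeat

def as_conditioning_v2(source) -> Iterator:
    """Wrap a legacy (single) conditioning as an infinite iterator,
    per the auto-conversion rule suggested above."""
    return source if isinstance(source, Iterator) else repeat(source)

def alternating(cond_a, cond_b) -> Iterator:
    """A source node that yields a different conditioning every step."""
    while True:
        yield cond_a
        yield cond_b

# A sampler would call next() on the handle once per step:
handle = as_conditioning_v2(alternating("HUMAN_COND", "DUCK_COND"))
print([next(handle) for _ in range(4)])  # alternates HUMAN, DUCK, HUMAN, DUCK
```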

Lex-DRL avatar Aug 11 '23 13:08 Lex-DRL

I'm also late to the party, and I will also +1 this request. I tried the custom nodes presented in this thread. Sadly, @taabata's is not working for me: I got different errors that I solved before hitting one that I did not understand. @SadaleNet's works well on my machine but is not scalable.

Prompt alternating is a great way to achieve some effects that are hard to obtain in a different way.

tbrebant avatar Aug 17 '23 00:08 tbrebant

Conditioning concat or combine should give results that are close to alternating prompts.

comfyanonymous avatar Aug 17 '23 00:08 comfyanonymous

The issue is that it makes the graph unmanageable. To do it with conditioning concat, we need to manually split a single prompt into multiple nodes... and the split point usually moves within the prompt, which makes the prompting process unnecessarily overcomplicated.

Lex-DRL avatar Aug 17 '23 01:08 Lex-DRL