Alternating|Tokens and From:To:When
Hi!
I'd love to enquire about the ability to use two really powerful features from other platforms.
- Prompt Alternating, where you have [Human|Duck] and each step it alternates between the tokens specified.
- Prompt Editing, where the prompt changes based on how many steps have completed, such as [Photorealistic:Abstract:0.5], where halfway through it will switch artistic styles.
Thanks!
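For reference, here is a minimal sketch of the per-step behaviour being asked for; the function names and structure are only illustrative, not A1111's actual implementation:

```python
# Illustrative sketch only: roughly how the two syntaxes are expected to
# behave per sampling step (not A1111's actual implementation).

def alternate(tokens, step):
    # [Human|Duck]: cycle through the listed tokens, one per step.
    return tokens[step % len(tokens)]

def edit(before, after, when, step, total_steps):
    # [Photorealistic:Abstract:0.5]: use "before" until the given fraction
    # of the steps has completed, then switch to "after".
    return before if step < when * total_steps else after

total_steps = 4
for step in range(total_steps):
    print(alternate(["Human", "Duck"], step),
          edit("Photorealistic", "Abstract", 0.5, step, total_steps))
# Human Photorealistic
# Duck Photorealistic
# Human Abstract
# Duck Abstract
```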
- In regards to the first one, that's achieved with {Human|Duck}, like with sd-dynamic-prompts and Disco Diffusion in the past.
- In my WAS Node Suite you can also use <Human|Duck> (though {Human|Duck} also works) with CLIPTextEncode (NSP), and if you have the Advanced CLIPTextEncode node there will be another conditioning node for that as well with the same features. This allows you to do reproducible dynamic prompts. A bonus of these nodes is that you can create variables, like $|Human Entity with Red Eyes|$, and then elsewhere in the prompt use $1 to print that same text again. Subsequent variables are accessed according to occurrence, so the second would be $2 and so on.
- I thought to:from:when worked, but maybe not? I know [Photorealistic:0.5] works.
@WASasquatch
RE: {Human|Duck}
The documentation in the README.md lists this
You can use {day|night} for wildcard/dynamic prompts. With this syntax {wild|card|test} will be randomly replaced by either "wild", "card" or "test" by the frontend every time you queue the prompt. To use {} characters in your actual prompt escape them like: \{ or \}.
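That frontend-side replacement amounts to something like the following sketch (my own illustration handling only simple, non-nested, unescaped groups; not the actual frontend code):

```python
import random
import re

def expand_wildcards(prompt, rng=random):
    # Replace each unescaped {a|b|c} group with one randomly chosen option.
    # This happens once per queued prompt, not once per sampling step.
    return re.sub(r'(?<!\\)\{([^{}]+)\}',
                  lambda m: rng.choice(m.group(1).split('|')),
                  prompt)

print(expand_wildcards("a {wild|card|test} prompt at {day|night}"))
```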
This is slightly different functionality from what I am referencing here.
This alternates each time a prompt is queued, not each step of the latent diffusion. The functionality I am describing, at each step, would produce a Human Duck hybrid... Thing. The current functionality of {Human|Duck} would generate either a Human or a Duck.
RE: To:From:When, this one most certainly doesn't work as expected.
This prompt does not appear to work.
Man, [Apple:Fire:0.8] produces
While this prompt
Man, [Apple::0.8] produces
I tried [Duck|Man:0.2] and [Duck|Man:0.8]
Finally, the [Photorealistic:0.5] example also wouldn't work as expected. I tested using an obvious prompt, Neon. Here are Man, (no Neon token), Man, [Neon:0.9], and Man, [Neon:0.1]
While they do appear to have an effect on the image, they don't work as a sequencer or as a blend method.
Oh, to do it for a single diffusion run you have to use stop/start steps with multiple samplers. As for the second one, that sucks. Though this works similarly for me: ([apple:0.5] [fire:0.5]:1.1), where both only occur for half the steps.
I assume you're talking about multiple samplers retaining noise between each? I cannot imagine doing that in an easy fashion. For a 20-step prompt you'd have to have twenty samplers.
I'll have a look at ([apple:0.5] [fire:0.5]:1.1), but to me that reads as two words, apple and fire, decreased to 50% weighting and the whole group then boosted by 10%. That doesn't exactly seem to be how it works from playing around, but I've not found another explanation that seems to work as expected.
Most surprisingly I've noticed a lot of things that seem to work, but when trying to reduce or test the effects the hypothesis falls apart.
The two features of [Cat|Dog] and [Abstract:Photorealistic:0.8] are things that A1111 does natively, which is why I was wondering whether they'd be officially supported at some point by Comfy.
Brackets aren't decreased weight, they're decreased steps as far as I am aware. Lowering weight is done with parentheses and just using a low weight. Brackets control the token's occurrence in the diffusion, so 0.5 would be 50% of the steps (e.g. 10 steps of a 20-step run).
The issue with ComfyUI is we encode text early to do stuff with it. Combine, mix, etc., then input into a sampler already encoded. A1111 has text that it encodes on the fly at diffusion time, so each diffusion step it could parse the text differently.
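A toy sketch of that difference, with stand-in functions (the names are placeholders, not real ComfyUI or A1111 APIs):

```python
# Placeholder helpers so the sketch runs; real parsing/encoding/sampling is
# obviously far more involved.
def parse(prompt, step=None, total=None):
    # Pretend per-step parsing resolves [a|b] etc. differently each step.
    return prompt if step is None else f"{prompt} (resolved for step {step})"

def clip_encode(text):
    return f"<cond of '{text}'>"

def sample_step(latent, cond, step):
    return latent + [(step, cond)]

prompt, total_steps = "[Human|Duck]", 3

# ComfyUI-style: text is parsed and encoded once, up front, then the frozen
# conditioning is combined/mixed and handed to the sampler.
cond = clip_encode(parse(prompt))
latent = []
for step in range(total_steps):
    latent = sample_step(latent, cond, step)

# A1111-style: the text can be re-parsed (and re-encoded) at diffusion time,
# so each step may see a different prompt.
latent = []
for step in range(total_steps):
    cond = clip_encode(parse(prompt, step=step, total=total_steps))
    latent = sample_step(latent, cond, step)
```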
Yeah, both features are so powerful... if there is a way to implement them in ComfyUI, that would be great.
Brackets aren't decreased weight, they're decreased steps as far as I am aware. Lowering weight is done with parentheses and just using a low weight. Brackets control the token's occurrence in the diffusion, so 0.5 would be 50% of the steps (e.g. 10 steps of a 20-step run).
That doesn't make sense considering the Man, [Neon:0.X] prompt above. The Man, [Neon:0.1] should have 10% of the Neon token, but it's virtually identical to the Man, [Neon:0.9] diffusion.
I suppose that equally means it doesn't lower the weighting either.
I imagine, realistically, the tokens do nothing special, but because order/spaces/special characters can be so important, the image is altered slightly and it gives the impression of having an effect.
To me, the "neon" sign literal portion is being omitted while the colors derived from the initial noise by the prompt are the same. Notice in the 0.1 there are fewer defined "actual" lights, and the sign's neon lettering is nearly gone. This is also why I demonstrated weighting the bracketed bit, like man, ([neon:0.1]:1.2) or something to taste.
It may not work as I think it does, but from my testing it seems to. Being just arbitrary, without actual defined step control, it's very random and hard to control from seed to seed / prompt to prompt.
That's a reasonable assertion.
We can test this, I believe, quite easily.
Take the following prompt, Man, Black and White,
Now let's add Pink, [Pink:0.9], and [Pink:0.1] to that prompt respectively.
Let's also inspect [Pink:1] and [Pink:1.0]
At least with the Pink token, I don't believe this assertion holds up. The rather eclectic results are quite interesting, and if [Pink:0.1] really does apply the token for 10% of the total steps, I'm not sure it functions how you would expect it to, at the very least.
All of these tests were performed with 10 steps, Euler, Karras schedule, which means 10% would be precisely one step.
I am wondering if there would be value in a custom node that prepares the prompts but does not encode them until a later node, decoupling these functionalities from the node itself. That could allow the encoding to occur on or before the sampler, either in custom nodes or as officially supported functionality.
I do believe I've found myself wishing more than once to be able to easily append tokens to the prompt, without combining the encodings at a later stage, for prompt morphing across two/three resampling stages.
WAS-NS has text editing nodes to set up prompts before putting them through a Text to Conditioning or other Conditioning node. There is also Tokens to save your own custom stuff. The NSP conditioning nodes under WAS Suite/Conditioning allow you to use <one|two|three> random prompts, which will be reproducible by conditioning seed. It also has prompt variables, so you could do stuff like $|__color__|$_lights, $1_sign and it would be parsed to something like red_lights, red_sign.
As far as true to:from:when goes, I think we need @comfyanonymous to confirm how to do it, or whether it even exists.
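For illustration, the variable behaviour described above boils down to something like this (my own sketch, skipping the __wildcard__ resolution step; not the actual WAS-NS parser):

```python
import re

def substitute_variables(prompt):
    # Collect each $|...|$ definition in order of occurrence, print its text
    # in place, then replace later $1, $2, ... references with the matching
    # captured text.
    variables = re.findall(r'\$\|(.+?)\|\$', prompt)
    prompt = re.sub(r'\$\|(.+?)\|\$', lambda m: m.group(1), prompt)
    return re.sub(r'\$(\d+)',
                  lambda m: variables[int(m.group(1)) - 1],
                  prompt)

print(substitute_variables("$|red|$_lights, $1_sign"))
# red_lights, red_sign
```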
Also here is my test
A man
A man, [pink:0.001]
A man, pink
A man, (pink:-1.0)
To me, it still seems using brackets creates a less defined effect. Pink isn't used for specific things but just ends up in the init_noise, which nudges the resulting image to incorporate it somehow, but in no defined way.
In fact it seems using the brackets brings down the whole fidelity of the image, which may be related to skipping?
Another reason I think it works at least similarly is that stuff like this is possible, which is harder to control with regular weighting:
A man, ([pink_hair:0.5] mixed with [purple_hair:0.5]:1.2)
A man, pink_hair mixed with purple_hair
When just prompting it, or weighting up one color or the other, it seems one color is just more dominant than the other, and it's hard to get a good mix across tons of gens.
Check out the custom node I created:
https://github.com/taabata/Comfy_custom_nodes
I've encountered this exact problem.
And I've just developed a solution for that. The idea is to create a KSamplerAdvanced node for each step, then use a custom CLIPTextEncodeA1111 node before it that converts the A1111-style prompt to a standard prompt, and use a textbox to feed the A1111-like prompt to all of the CLIPTextEncodeA1111 nodes.
Unlike the solution of @taabata, my solution has the potential to support ControlNet. However, my solution is messy and requires a lot of nodes (which can be automatically generated using a script included in my repo). The syntax is slightly different from A1111, though, because I don't want to use the : character, as it is also used for embeddings in ComfyUI. My solution also supports recursion syntax.
Here's the repo: https://github.com/SadaleNet/CLIPTextEncodeA1111-ComfyUI
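The per-step text conversion that approach relies on is roughly this (my own reading of the idea; the real node uses a slightly different syntax, as noted above):

```python
import re

def prompt_for_step(prompt, step):
    # Resolve each [a|b|...] alternation group to one option for this step by
    # cycling through the options; the resolved prompt is what would feed the
    # text encoder attached to the single-step sampler for that step.
    def pick(match):
        options = match.group(1).split('|')
        return options[step % len(options)]
    return re.sub(r'\[([^\[\]]+)\]', pick, prompt)

prompts = [prompt_for_step("a [Human|Duck] in the park", s) for s in range(4)]
print(prompts)
# ['a Human in the park', 'a Duck in the park',
#  'a Human in the park', 'a Duck in the park']
```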
Recently someone implemented this. Try this. https://github.com/asagi4/comfyui-prompt-control
Rather than having a custom node that tries to do everything at once, or having a ton of different nodes for each step, would it not make sense to have a literal "step" parameter in the KSampler Advanced node? It could function like the third argument in a Python range call (start, stop, step) and be called something like "increment" to be less confusing.
You'd be able to achieve the [cat|dog] effect in a more powerful (but more verbose) way using just 2 KSampler (Advanced) nodes that are offset by one in their start step, plus their respective prompt nodes.
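A sketch of how such an increment would split the steps between two samplers (the "increment" parameter is hypothetical, not an existing KSampler option):

```python
total_steps = 10

# Hypothetical "increment" behaviour, mirroring range(start, stop, step):
# each sampler only runs the steps its range produces.
cat_sampler_steps = list(range(0, total_steps, 2))  # start=0, increment=2
dog_sampler_steps = list(range(1, total_steps, 2))  # start=1, increment=2

print(cat_sampler_steps)  # [0, 2, 4, 6, 8]  -> "cat" conditioning
print(dog_sampler_steps)  # [1, 3, 5, 7, 9]  -> "dog" conditioning
```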
I agree with this logic. Being able to step... the step... would allow you to do this elegantly with KSampler Advanced. @comfyanonymous does this seem logical?
I don't think this idea would work. It'd require the latent output and latent input of the two KSamplerAdvanced nodes to connect to each other.
+1 for that feature. I was often using both alternating words ([cow|horse]) and [from:to:when] (as well as [to:when] and [from::when]) syntax to achieve interesting results / transitions in A1111 during a single sampling pass.
It's an effective way of using different prompts for different steps during sampling, and it would be nice to have it natively supported in ComfyUI. It would probably require enhancing the implementation of both the CLIP encoders and the samplers, though.
Now ComfyUI supports ConditioningSetTimestepRange.
Is there an example of how to do this with that? I wasn't getting the same sort of results, but I am not exactly sure how to use it, just what seems like the way to do it.
The thing is that for more complex prompts and multiple prompts / CLIP encoders setup we'd be quickly flooded with nodes. Sample (and still relatively simple) prompt from A1111:
[dslr photography : oil on canvas painting : 0.1] of a [blue | red] sphere in the city, [dark ink : airbrush : 0.25], dark cyberpunk future, high quality, high resolution
Negative prompt: low quality, low resolution
Steps: 30, Sampler: Euler, CFG scale: 7, Seed: 0, Size: 1024x1024, Model hash: e6bb9ea85b, Model: sd_xl_base_1.0_0.9vae, Clip skip: 2, Score: 7.19, Version: v1.5.1
and the output:
It's very easy and fun to make that kind of transitions in A1111, and it works pretty well.
Doing something like that via extra nodes would basically mean that for every unique combination of the prompt we would have to create duplicates of prompt and conditioning nodes.
And imagine doing it with more advanced flows. For example, my basic setup for SDXL is 3 positive + 3 negative prompts (one for each text encoder: base G+, base G-, base L+, base L-, refiner+, refiner-). If I wanted to do transitions like in the example above in ComfyUI, I would have to make a few times more nodes just to handle that prompt. And each time I wanted to add or remove some transitions in the prompt, I would have to reconfigure the whole flow.
The prompt2prompt way looks like a much better idea to me, to be honest. If anyone would like to (and/or knows how to) implement it in ComfyUI, here is the original implementation of this feature from Doggettx, and here is v2 (might be useful as reference). It would probably work best if it was included in basic ComfyUI functionality (not as custom nodes).
Is there an example of how to do this with that? I wasn't getting same sort of results, but I am not exactly sure how to use it, just what seems like how to do it.
For the from:when, you would set the start and end for both prompts and then pipe them into a Conditioning (Combine) node.
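For example, mapping Man, [Apple:Fire:0.8] onto that workflow would look roughly like this (a sketch of the settings only, assuming the node's start/end are fractions of the sampling schedule):

```python
# Each entry: CLIPTextEncode -> ConditioningSetTimestepRange, then the two
# results are merged with Conditioning (Combine) and sent to the sampler.
# 0.0 = start of sampling, 1.0 = end of sampling (assumed convention here).
timestep_ranges = [
    {"prompt": "Man, Apple", "start": 0.0, "end": 0.8},  # first 80% of steps
    {"prompt": "Man, Fire",  "start": 0.8, "end": 1.0},  # final 20% of steps
]
for entry in timestep_ranges:
    print(entry)
```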
The custom node I created allows for token alternating and prompt editing with ControlNet as well. Link: https://github.com/taabata/Comfy_Syrian_Falcon_Nodes/tree/main
I'm late to the party, but +1 for the request.
Now ComfyUI supports ConditioningSetTimestepRange.
@ltdrdata if I get it right, this node can be used as an alternative to [from:to:when] syntax. But:
- It still requires us to manually split the text prompt into pieces. What if a prompt contains multiple such entries, each using its own switch point? This can quickly require literally dozens of nodes just for that.
- As far as I can see, there's still no alternative to the [cow|horse] syntax, which is usually used with multiple entries too. This prompt: [grey|white|brown] [cow|horse] on a [grass|field|courtyard|lawn|glade] immediately creates 3*2*5 = 30 prompt variants (see the expansion sketch after this post), which currently can be achieved in ComfyUI only with 30 text node copies and an INSANELY intertwined graph.

Worst of all, both solutions make the network prompt-dependent.
So... is it planned to implement an actual equivalent for this syntax?
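For scale, expanding that example prompt combinatorially (plain Python, nothing ComfyUI-specific):

```python
from itertools import product

colors  = ["grey", "white", "brown"]
animals = ["cow", "horse"]
places  = ["grass", "field", "courtyard", "lawn", "glade"]

variants = [f"{c} {a} on a {p}" for c, a, p in product(colors, animals, places)]
print(len(variants))  # 30 -- one text/conditioning node per variant if built by hand
print(variants[0])    # grey cow on a grass
```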
Yeah. It seems that we need to develop a wrapper.
@ltdrdata Maybe you don't need an edge-case wrapper. Maybe you need an extension to the current data type plus an upgrade to the currently present nodes.
Sorry if what I'm going to suggest doesn't make sense to you (if so, disregard this comment): I'm not sure about the specific Python implementation of data flow in ComfyUI. But maybe, instead of a new uber-all-in-one node, what we need is something like a conditioning v2 data type (between nodes), which is treated not as a single data instance, but as an iterator handle of such data.
- I assume the current conditioning connection passes data through only once, at evaluation start. Unlike it, dependent nodes connected with conditioning v2 would request a data instance at each step.
- It's the source node's responsibility what it outputs. It may output the same conditioning at each step, or it may generate different ones.
- If a current (legacy) datatype is connected to a node with the newer version input, it's just automatically converted into an infinite iterator of the same thing.
- To let dependent nodes do any work only once, there could be some metadata attached to indicate the number of unique conditioning objects it generates, their IDs, etc.
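A minimal sketch of what such a per-step source could look like (a hypothetical interface, not an existing ComfyUI type):

```python
import itertools
from typing import Iterator

def alternating_conditioning(encode, prompts, total_steps) -> Iterator:
    # Hypothetical "conditioning v2" source: instead of handing the sampler a
    # single pre-encoded conditioning, yield one conditioning per step so the
    # sampler can request fresh data as it goes.
    for step in range(total_steps):
        yield encode(prompts[step % len(prompts)])

def legacy_as_v2(cond) -> Iterator:
    # A legacy (single) conditioning is just wrapped as an endless repeat.
    return itertools.repeat(cond)

# Toy demo with a stand-in encoder:
for step, cond in enumerate(
        alternating_conditioning(lambda p: f"<cond:{p}>", ["Human", "Duck"], 4)):
    print(step, cond)
```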
I'm also late to the party, and I will +1 this request too. I tried the custom nodes presented in this thread. Sadly @taabata 's one is not working for me; I got different errors that I solved before hitting one that I did not understand. @SadaleNet 's one works well on my machine but is not scalable.
Prompt alternating is a great way to achieve some effects that are hard to obtain in a different way.
Conditioning concat or combine should give results that are close to alternating prompts.
The issue is, it makes the graph unmanageable. To do it with conditioning concat, we need to manually split a single prompt into multiple nodes... and the split point usually moves within the prompt, which makes the prompting process unnecessarily overcomplicated.
