
Weighted Prompts for Diffusers stable diffusion pipeline

Open UglyStupidHonest opened this issue 2 years ago • 35 comments

I could not find anything for diffusers and unfortunately I'm not on the level yet where I can implement it myself. :)

It would be amazing to be able to weight prompts like "a dog with a hat:0.5"

Thank you for this amazing library !!

UglyStupidHonest avatar Dec 01 '22 15:12 UglyStupidHonest

This has unfortunately only been added as a community pipeline, which, imo, is a very broken system that just adds tons of work for end users managing all these pipes, and is not very API friendly.

https://github.com/huggingface/diffusers/blob/main/examples/community/lpw_stable_diffusion.py

With community pipelines, you get only what the pipeline advertises, and nothing else. It's not like the many other repos out there, like AUTOMATIC1111's, where these things are packaged together for usage with all available features, creating a robust and feature-rich system.

WASasquatch avatar Dec 01 '22 20:12 WASasquatch

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

github-actions[bot] avatar Jan 01 '23 15:01 github-actions[bot]

For future readers:

For a direct use case, we have the following community pipeline: https://github.com/huggingface/diffusers/blob/main/examples/community/lpw_stable_diffusion.py

You can also define your own attention processor that weighs certain prompts differently by making use of this API: https://github.com/huggingface/diffusers/pull/1639
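
For the community pipeline route, usage looks roughly like the sketch below. It loosely follows the community examples README; exact arguments such as max_embeddings_multiples may vary by version:

from diffusers import DiffusionPipeline
import torch

# load the long-prompt-weighting (LPW) community pipeline on top of a regular SD checkpoint
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    custom_pipeline="lpw_stable_diffusion",
    torch_dtype=torch.float16,
).to("cuda")

# A1111-style emphasis: (word:1.3) upweights, (word:0.8) downweights
prompt = "a photo of an (astronaut:1.3) riding a (horse:0.8) on mars"
image = pipe.text2img(prompt, max_embeddings_multiples=3).images[0]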

patrickvonplaten avatar Jan 02 '23 14:01 patrickvonplaten

@patrickvonplaten Are there any plans to integrate this into the main pipeline? As @WASasquatch said, the community pipeline implementation is not very user friendly. It seems like it would be pretty useful to have it built in as a feature, given how often prompt weighting is used in the community.

Ephil012 avatar Jan 08 '23 18:01 Ephil012

Upvoting this as I think prompt weighting is indeed an important feature that should be added to diffusers to compete with alternative solutions. All the alternatives support it (Stable Diffusion WebUI, DreamStudio, Midjourney...).

Thanks for your hard work! <3

alexisrolland avatar Jan 09 '23 09:01 alexisrolland

cc @patil-suraj what do you think?

patrickvonplaten avatar Jan 13 '23 11:01 patrickvonplaten

My opinion here is that diffusers doesn't aim to be a full-fledged UI, but rather a backend for UIs such as:

  • InvokeAI: https://github.com/invoke-ai/InvokeAI/pull/1583
  • diffuzers: https://github.com/abhishekkrthakur/diffuzers

Nevertheless, we could/should try to more actively maintain: https://github.com/huggingface/diffusers/blob/main/examples/community/lpw_stable_diffusion.py and potentially write a documentation page about it.

Also, @SkyTNT, what do you think? :-)

patrickvonplaten avatar Jan 13 '23 11:01 patrickvonplaten

How does supporting prompt weighting turn diffusers into a UI? I think the kind of usage that would be expected here is to be able to use weights in a way similar to this and let the backend do its magic ;) :

from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-v1-5")
pipe = pipe.to("cuda")

# A1111-style emphasis with parentheses
prompt = "a photo of an ((astronaut)) riding a horse on mars"
# or with an explicit weight
prompt = "a photo of an (astronaut:0.5) riding a horse on mars"
image = pipe(prompt).images[0]

alexisrolland avatar Jan 13 '23 14:01 alexisrolland

I agree with @Ephil012. But I've been busy recently, so I may not be able to contribute.

SkyTNT avatar Jan 14 '23 12:01 SkyTNT

What does a user interface have to do with back-end functionality?

WASasquatch avatar Jan 14 '23 19:01 WASasquatch

@patrickvonplaten I'd argue that adding this feature does not lead to diffusers becoming a full-fledged UI. This would simply be a feature on the backend when inputting prompts (like alexisrolland mentioned).

You mentioned that the goal of diffusers is to act as a backend for projects providing an SD UI. However, not implementing this feature arguably makes it harder to use diffusers as a backend. When building a UI, most users expect prompt weighting to be built in. Leaving it out of diffusers means each project has to build its own implementation, which duplicates work between projects and in general makes using diffusers harder. Personally, I started looking for alternatives to diffusers to build my side project on simply because it was missing essential features like prompt weighting. I'd also argue other common features should be built in, such as long prompts (this may have already been added, not sure), but that's a discussion for another thread. Yes, there are community pipelines that can be used, but it would make sense to have this in the main pipeline too for maintainability and reliability.

As far as implementation goes, I do think that some projects might not want to follow the A1111 syntax. There could be a default syntax which you could customize via code. Or you could take the approach imaginAIry does, where they allow you to create a list of prompts and set weights in code (example below). Either approach would allow for using your own syntax.

from imaginairy import ImaginePrompt, WeightedPrompt  # assuming imaginAIry exports these at the top level

ImaginePrompt([
    WeightedPrompt("cat", weight=1),
    WeightedPrompt("dog", weight=1),
])

Ephil012 avatar Jan 15 '23 16:01 Ephil012

My opinion here is that diffusers doesn't aim to be a full-fledged UI, but rather a backend for UIs such as:

If you are going to refer people to the current InvokeAI code as an example of how to use diffusers as a backend, be warned that there are parts that are not pretty. 😆

This is definitely a place where we had to work around the StableDiffusionPipeline rather than with it. I see that _encode_prompt is its own method now, which at least allows the possibility of overriding it, but there are still a couple of reasons why Invoke had to work around it:

  • Under its current architecture, Invoke has already prepared the text embeddings by the time it's ready to do inference, and the pipeline doesn't have any method that takes that form of input.
  • The _encode_prompt method has the tokenization and encoding too entangled with the structure of the batch and the conditioned/unconditioned data.

You've already identified other use cases for exposing an API that takes text embeddings directly, such as #205 and #1869. It's also always easier to pass values to things than it is to subclass and override template methods, so factoring such a method out of the existing StableDiffusionPipeline.__call__ sounds like the way to go.

keturn avatar Jan 15 '23 17:01 keturn

I have a work-in-progress project turning the prompt weighting code I built for InvokeAI into a library called Incite that would theoretically be able to plug into any transformers-based system that takes a text string, tokenizes it, and then produces an embedding vector.

A simple way of providing painless weighting support would be for the stable diffusion pipeline to support conditioning vectors as alternative input to prompt strings. The process of doing weighted prompting would then look something like this:

pipeline = StableDiffusionPipeline.from_pretrained(...)
incite = Incite(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# weight of 'fluffy' is increased, weight of 'dark' is decreased
positive_conditioning_tensor = incite.build_conditioning_tensor(
    "a fluffy+++ cat playing with a ball in a dark-- forest"
) 
negative_conditioning_tensor = incite.build_conditioning_tensor(
    "ugly, poorly drawn, etc."
)

images = pipeline(positive_conditioning=positive_conditioning_tensor,
    negative_conditioning=negative_conditioning_tensor).images

This in itself is just a first step, however, because being able to alter prompts on the fly unlocks all sorts of other possibilities. Here's a more advanced design:

pipeline = StableDiffusionPipeline.from_pretrained(...)
incite = Incite(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# at 50% of the way through the diffusion process, replace the word "cat" with "dog"
prompt="a cat.swap(dog, start=0.5) playing with a ball in the forest" 
conditioning_scheduler = incite.build_conditioning_scheduler(
    positive_prompt=prompt, 
    negative_prompt=""
)

images = pipeline(conditioning_scheduler=conditioning_scheduler).images
# at the start of every diffusion step the pipeline queries the conditioning_scheduler 
# for positive and negative conditioning tensors to apply for that step

This unlocks the capability for, as one early reviewer, @raefu, put it, "a generalized macro language that ultimately creates conditioning vectors for every step of the image generation".

With such a flexible model it would be possible to do wild things like performing image comparison operations with the latent image vector part-way through the diffusion process and then programmatically altering the conditioning/prompt based on what has been partially diffused already. The possibilities are endless, and really quite exciting.

damian0815 avatar Jan 15 '23 19:01 damian0815

Opening a PR that allows text_embeddings to be passed via the __call__ method. This makes a lot of sense to me and is in line with https://github.com/huggingface/diffusers/issues/1869.
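
For illustration, once embeddings can be passed directly (the argument landed as prompt_embeds, as the compel example further down shows), a very naive do-it-yourself weighting might look like the sketch below. This is not how compel or the LPW pipeline compute weights, just a toy example of what the new argument enables:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")

# encode the prompt ourselves instead of passing a string
prompt = "a photo of an astronaut riding a horse on mars"
tokens = pipe.tokenizer(
    prompt,
    padding="max_length",
    max_length=pipe.tokenizer.model_max_length,
    truncation=True,
    return_tensors="pt",
)
with torch.no_grad():
    embeds = pipe.text_encoder(tokens.input_ids.to(pipe.device))[0]

# crude upweighting: scale the embeddings at the token position(s) of "astronaut"
astronaut_id = pipe.tokenizer("astronaut", add_special_tokens=False).input_ids[0]
positions = (tokens.input_ids[0] == astronaut_id).nonzero(as_tuple=True)[0].tolist()
embeds[:, positions, :] *= 1.1

image = pipe(prompt_embeds=embeds).images[0]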

patrickvonplaten avatar Jan 16 '23 13:01 patrickvonplaten

thanks @patrickvonplaten - with 0.12 and my prompt weighting library Compel (based on the InvokeAI weighting code) I can now do this to apply weights to different parts of the prompt:

from compel import Compel
from diffusers import StableDiffusionPipeline

pipeline = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
compel = Compel(tokenizer=pipeline.tokenizer, text_encoder=pipeline.text_encoder)

# upweight "ball"
prompt = "a cat playing with a ball++ in the forest"
embeds = compel.build_conditioning_tensor(prompt)
image = pipeline(prompt_embeds=embeds).images[0]

works great - thank you!

damian0815 avatar Jan 26 '23 10:01 damian0815

Very cool @damian0815 !

patil-suraj avatar Jan 26 '23 11:01 patil-suraj

So coool I need to try this !! Thank you!!

UglyStupidHonest avatar Jan 26 '23 11:01 UglyStupidHonest

@damian0815 very cool!

What would be the syntax if we want to add weight to a group of words rather than just a single word?

Thanks!

alexisrolland avatar Jan 28 '23 12:01 alexisrolland

@damian0815 very cool!

What would be the syntax if we want to add weight to a group of words rather than just a single word?

Thanks!

you can put the (words you want to weight)++ in parentheses

this (also (supports)-- nesting)+

speech marks "also work"+ like this

damian0815 avatar Jan 28 '23 14:01 damian0815

Thanks @damian0815! Do you have a link to documentation describing the different syntaxes? I am also wondering how to add different levels of weight to different bags of words... is it just something like:

(this bag is heavy)+++ while (this bag is medium)+ and (this one is really light)---

?

alexisrolland avatar Jan 28 '23 14:01 alexisrolland

that's right @alexisrolland. docs are linked in the readme but it's basically adapted from what i wrote for InvokeAI - https://invoke-ai.github.io/InvokeAI/features/PROMPTS/#prompt-syntax-features

damian0815 avatar Jan 28 '23 15:01 damian0815

@damian0815 If I may, I think it would be nice if your compel library supported the same syntax as SD WebUI, since it is hugely popular. For example, if it could accept () to increase weight and [] to decrease weight. See the doc here: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#attentionemphasis

alexisrolland avatar Jan 30 '23 09:01 alexisrolland

nope, not happening. the Auto111 syntax is rubbish

damian0815 avatar Jan 30 '23 10:01 damian0815

nope, not happening. the Auto111 syntax is rubbish

Ha ha ha, as much as I agree with you, it's becoming the de facto standard 😀

I prefer your syntax too...

alexisrolland avatar Jan 30 '23 10:01 alexisrolland

what i might consider adding is a converter that can convert auto syntax to invoke syntax. pull requests welcome :)
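
Something along these lines, very roughly. A hypothetical, incomplete sketch (not part of compel) that maps A1111's "(text:1.3)" emphasis onto the +/- syntax shown above, assuming each + multiplies attention by roughly 1.1; it ignores nesting, [...] de-emphasis and escaped parentheses:

import math
import re

def a1111_to_plus_minus(prompt: str) -> str:
    # convert "(text:1.3)" into "(text)+++" style emphasis
    def repl(match):
        text, weight = match.group(1), float(match.group(2))
        steps = round(math.log(weight, 1.1))
        return f"({text})" + ("+" * steps if steps >= 0 else "-" * -steps)
    return re.sub(r"\(([^():]+):([0-9.]+)\)", repl, prompt)

print(a1111_to_plus_minus("a photo of an (astronaut:1.3) riding a (horse:0.8) on mars"))
# -> a photo of an (astronaut)+++ riding a (horse)-- on mars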

damian0815 avatar Jan 30 '23 13:01 damian0815

That would be fantastic... the best of both worlds ^^

alexisrolland avatar Jan 30 '23 14:01 alexisrolland

BTW, another use case that should be somewhat easily enabled by this is long-prompt weighting: https://github.com/huggingface/diffusers/issues/2136#issuecomment-1409978949
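
A rough sketch of that idea, assuming the same prompt_embeds / negative_prompt_embeds arguments: encode the prompt in 77-token windows and concatenate the embeddings along the sequence dimension. This ignores per-window BOS/EOS handling, so results will differ from the LPW community pipeline:

import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")
max_len = pipe.tokenizer.model_max_length  # 77 for CLIP

long_prompt = ", ".join(["a highly detailed painting of a castle on a cliff at sunset"] * 8)

# tokenize without truncation and pad to a multiple of the window size
ids = pipe.tokenizer(long_prompt, truncation=False, return_tensors="pt").input_ids.to(pipe.device)
pad = (-ids.shape[1]) % max_len
if pad:
    ids = F.pad(ids, (0, pad), value=pipe.tokenizer.pad_token_id)

with torch.no_grad():
    # encode each 77-token window and concatenate along the sequence dimension
    chunks = ids.split(max_len, dim=1)
    prompt_embeds = torch.cat([pipe.text_encoder(chunk)[0] for chunk in chunks], dim=1)

    # the unconditional embeddings must match in length for classifier-free guidance
    uncond_ids = pipe.tokenizer(
        "", padding="max_length", max_length=max_len, return_tensors="pt"
    ).input_ids.to(pipe.device)
    negative_prompt_embeds = pipe.text_encoder(uncond_ids)[0].repeat(1, len(chunks), 1)

image = pipe(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_prompt_embeds).images[0]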

patrickvonplaten avatar Jan 31 '23 08:01 patrickvonplaten

@patrickvonplaten I saw that the PR added the ability to pass embeddings in now. From my understanding, you still need to either write the prompt weighting code yourself or use a third-party library (like compel). Do you know if there are any plans to add built-in prompt weighting (similar to the LPW community pipeline) to one of the main Stable Diffusion pipelines? That way people don't have to use third-party code for this functionality.

Ephil012 avatar Feb 05 '23 23:02 Ephil012

Prompt weighting won't be included in the main pipeline, in order to keep the pipeline simple so that users can easily follow and modify it on their own. The philosophy behind this is explained in this doc: https://huggingface.co/docs/diffusers/main/en/conceptual/philosophy. We encourage users to give it a read :)

patil-suraj avatar Feb 07 '23 13:02 patil-suraj

Maybe drop all that state-of-the-art stuff, then. It's antiquated already. You all need to do better. People are going to be modifying this pipe and getting lost, because of the lack of proper support, for shenanigans. As it stands, most big places using diffusers aren't even using your pipes but the community ones, and racking their heads on your backwards logic and "philosophy" (one of the worst things to talk about in open-source code; your philosophy should be whatever the people want, otherwise just sell an API and be a business where this is expected behavior).

WASasquatch avatar Feb 12 '23 16:02 WASasquatch