
Automatic Prompt Generation

Open IamTirion opened this issue 1 year ago • 3 comments

In Fooocus, I just describe the scene in simple words in the prompt, without having to type in terms like photorealistic, masterpiece, cinematic, 4k, 8k, depth of field, etc., or any negative prompts unless there are specific things I don't want. Are there plans for such a feature in Stable Swarm? Thank you.

IamTirion, Jan 02 '24 00:01

For basic usage, just apply a style preset. There's a good pack you can use as a basis in the docs: https://github.com/Stability-AI/StableSwarmUI/blob/master/docs/Presets.md
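As a rough illustration of what applying a style preset does to a simple description, here is a minimal Python sketch. It is not StableSwarmUI's actual code: the `StylePreset` class, the `apply_preset` function, the example preset text, and the use of a `{value}` placeholder to mark where the user's text goes are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class StylePreset:
    """Hypothetical preset shape: a prompt template plus a short negative."""
    name: str
    prompt_template: str  # "{value}" marks where the user's text goes (assumed convention)
    negative: str = ""


def apply_preset(preset: StylePreset, user_prompt: str, user_negative: str = "") -> dict:
    """Merge the user's plain description into the preset's template."""
    if "{value}" in preset.prompt_template:
        prompt = preset.prompt_template.replace("{value}", user_prompt.strip())
    else:
        # No placeholder: append the style text after the description.
        prompt = f"{user_prompt.strip()}, {preset.prompt_template}"
    # Keep any negative the user typed, then add the preset's own.
    negatives = [n.strip() for n in (user_negative, preset.negative) if n.strip()]
    return {"prompt": prompt, "negative_prompt": ", ".join(negatives)}


# Made-up preset, only for demonstration.
cinematic = StylePreset(
    name="Cinematic (example)",
    prompt_template="cinematic film still of {value}, shallow depth of field, highly detailed",
    negative="cartoon, painting, blurry",
)
result = apply_preset(cinematic, "a fox in a snowy forest")
print(result["prompt"])
print(result["negative_prompt"])
```

The user only types "a fox in a snowy forest"; the style keywords and the short negative come from the preset.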

For not-so-basic, contextually aware usage, I would indeed like to support that eventually. Probably not the same way as Fooocus, which uses a GPT2 word expander thing; I'd rather use a small LLM, e.g. StableLM-3B-4Bit*, finetuned on prompts, so you could reuse it for more advanced usages, e.g. as a creative/interactive full generator tool that could write full prompts or even adapt settings for you.

* (I suggest 3B-4Bit because a natural requirement is being able to run on the same machine that is generating images, and quickly too, so it has to be tiny and fast. 3B-4bit has lower VRAM requirements than even SDv1 has, so it'd be perfect.)
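For a sense of scale, the weight footprint of a 3B-parameter model at 4 bits per weight can be estimated with simple arithmetic. The helper below is a hypothetical back-of-the-envelope sketch: it counts the weights only and ignores activations, context cache, and quantization overhead, so real VRAM use will be somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


four_bit = weight_memory_gb(3e9, 4)      # 3B parameters at 4 bits each
sixteen_bit = weight_memory_gb(3e9, 16)  # same model at 16 bits, for comparison
print(four_bit)     # 1.5
print(sixteen_bit)  # 6.0
```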

For some clarity on what that would mean in real usage, here's a Discord bot I made that does this in practice: https://github.com/mcmonkey4eva/SimpleDiscordAIBot

(That one uses a language model that hasn't even been finetuned for prompting, but it's still pretty good at it)

It could potentially be taken even further, e.g. there might be an interface for actively interacting with the LLM to request adjustments to previous generations.

I don't know exactly.
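To make the prompt-expansion and follow-up-adjustment ideas above a bit more concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `PromptAssistant` class, the instruction text, and the `fake_llm` stand-in are assumptions for illustration, not a planned StableSwarmUI design. The model is passed in as a plain function, so any local LLM backend could be plugged in.

```python
from typing import Callable, List

# Any function that takes the full request text and returns the model's reply.
TextGenerator = Callable[[str], str]

SYSTEM_INSTRUCTION = (
    "You write prompts for an image generator. "
    "Turn the user's short description into one detailed prompt. "
    "Reply with the prompt only."
)


class PromptAssistant:
    """Hypothetical wrapper: expands short descriptions and accepts follow-up adjustments."""

    def __init__(self, generate: TextGenerator):
        self.generate = generate
        self.history: List[str] = []  # alternating "User: ..." / "Assistant: ..." lines

    def _ask(self, user_text: str) -> str:
        self.history.append(f"User: {user_text}")
        request = "\n".join([SYSTEM_INSTRUCTION, *self.history, "Assistant:"])
        reply = self.generate(request).strip()
        self.history.append(f"Assistant: {reply}")
        return reply

    def expand(self, description: str) -> str:
        """Start a fresh conversation from a simple scene description."""
        self.history.clear()
        return self._ask(description)

    def adjust(self, instruction: str) -> str:
        """Request a change to the previous prompt, keeping earlier turns as context."""
        if not self.history:
            raise ValueError("call expand() before adjust()")
        return self._ask(instruction)


def fake_llm(request: str) -> str:
    """Stand-in for a real local model so the sketch runs without downloads."""
    user_lines = [line for line in request.splitlines() if line.startswith("User: ")]
    return f"detailed photo, {user_lines[-1][len('User: '):]}, soft lighting"


assistant = PromptAssistant(fake_llm)
first = assistant.expand("a fox in a snowy forest")
second = assistant.adjust("make it night time")
print(first)
print(second)
```

Keeping the conversation history in the request is what lets a follow-up like "make it night time" refer back to the earlier prompt; a real model would rewrite the previous prompt rather than echo the instruction the way the stand-in does.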


TLDR: Yes, but as with everything in this project, my goal is to ambitiously do far more than any UI like it has done before.

mcmonkey4eva, Jan 02 '24 13:01

Thank you. Right now, other than choosing a style, do we need to add anything else to the prompts, such as bad hands or deformed in the negative prompts, to get good results?

And will the LLM you implement be uncensored? Many people are unhappy with the censorship of DALL-E 3.

IamTirion, Jan 02 '24 16:01

> such as bad hands or deformed in the negative prompts, to get good results?

With SDXL, broadly speaking, negatives aren't too important. SDXL often works great when you just leave the negative empty. It can still help to a degree, but it's not nearly as important as it was in SDv1. The reference preset pack has short negatives included on each.

> And will the LLM you implement be uncensored?

That's the fun bit: because it'll be running locally, if you don't like the default one, you can just download a different one and use it instead :D I can't promise legal won't tell me any LLM I use has to be censored, but I can promise it'll just be a very swappable file.

mcmonkey4eva, Jan 02 '24 18:01