stable-diffusion-webui
Prompt-to-Prompt Image Editing with Cross Attention Control
Is your feature request related to a problem? Please describe.
This is not a bug report but a question about a potential future enhancement of an already great tool.
Describe the solution you'd like
I recently came across a couple of interesting videos (see below). Do you (the developers) think the feature they describe could be integrated into SD Web UI? It looks like a really advanced image manipulation feature to have.
Describe alternatives you've considered
This is brand-new research. I haven't found a comparable alternative, but I might be wrong.
Additional context
Below are a few references that might help evaluate the feasibility of integrating this feature.
2 Minute Papers' video: https://www.youtube.com/watch?v=XW_nO2NMH_g&t=366s
koiboi video: https://www.youtube.com/watch?v=vWytLjUtAgs
Paper: https://arxiv.org/abs/2208.01626
Stable Diffusion implementation of Cross Attention Control, GitHub page: https://github.com/bloc97/CrossAttentionControl
Colab notebook: https://colab.research.google.com/drive/1PsWKXtqAAoDz-KGB45VeCXdTsqW-Mumo?usp=sharing
Thank you for your attention. I love your implementation of SD; it is the best one I've tried so far.
I think this was already implemented a long time ago. Is this it? https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#prompt-editing
> I think this was already implemented a long time ago. Is this it? https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#prompt-editing
That feature is internally called "scheduler", if I am not mistaken. It is one of the two variants of prompt parsing. It has been said to be the implementation, though I did not find code similar to what the other implementations use. I am new to Python and to machine learning, so I might simply have missed it.
I experimented with the prompt scheduler and was not able to get the same quality of results as in the native demos, though this is probably the 10th open issue on the topic in the past month. The CrossAttention demos show very precise switches, some of them things I have never seen done before, and the image stays very stable. With the prompt scheduler the image is less stable, and additional stuff gets dreamed in at totally unrelated locations.
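For readers comparing the two approaches, here is a minimal sketch of what prompt editing (the `[from:to:when]` syntax on the linked wiki page) does conceptually: it only changes which prompt text is used at each sampling step. This is not the webui's actual parser. The function name `prompt_at_step`, the 0-based step counting, and the exact switch boundary are assumptions made for illustration, and nested or other bracket forms are not handled.

```python
import re

# Simplified pattern for one non-nested [from:to:when] group.
# Assumption: "when" is a plain number; the real parser supports more forms.
EDIT = re.compile(r"\[([^\[\]:]*):([^\[\]:]*):(\d+(?:\.\d+)?)\]")

def prompt_at_step(prompt: str, step: int, total_steps: int) -> str:
    """Return the prompt text used at a given 0-based sampling step."""
    def pick(match: re.Match) -> str:
        before, after = match.group(1), match.group(2)
        when = float(match.group(3))
        # Values below 1 are read as a fraction of the total steps,
        # larger values as an absolute step count.
        switch = when * total_steps if when < 1 else when
        return before if step < switch else after
    return EDIT.sub(pick, prompt)

# The text swaps a quarter of the way through 100 steps.
print(prompt_at_step("a [mountain:lake:0.25] landscape", 10, 100))
print(prompt_at_step("a [mountain:lake:0.25] landscape", 60, 100))
```

Because only the conditioning text changes, nothing ties the layout of the second prompt to the first, which may explain the less stable images described above.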
On topic: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/2310#issuecomment-1275755291
If there is already an open ticket on the same subject, I apologize for the duplication. To me, though, this seems more granular in the way it operates: it takes the token indices of the prompt into account, so you would select one or more specific token indices and replace them with something else via an alternate prompt. I am not a developer or an expert at all, so I may be totally wrong.
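To make the token-level idea concrete, here is a rough NumPy sketch of attention-map injection as I understand it from the paper: during the early denoising steps, the cross-attention maps computed for the original prompt are reused while the values come from the edited prompt. This is a toy single-head illustration, not code from any of the linked repositories. All names (`attention_map`, `edited_cross_attention`, `inject_until`, `shared_idx`) are hypothetical, and steps are assumed to be counted from the start of sampling.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_map(q, k):
    # q: (pixels, d) image queries, k: (tokens, d) prompt keys -> (pixels, tokens)
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))

def edited_cross_attention(q_src, k_src, q_edit, k_edit, v_edit,
                           step, inject_until, shared_idx=None):
    """Cross-attention output for the edited prompt at one sampling step.

    Assumption: both prompts have the same number of tokens.
    shared_idx optionally lists token indices whose source maps are injected;
    None injects the whole source map (the word-swap case).
    """
    m_src = attention_map(q_src, k_src)
    m_edit = attention_map(q_edit, k_edit)
    if step < inject_until:
        if shared_idx is None:
            m = m_src
        else:
            m = m_edit.copy()
            m[:, shared_idx] = m_src[:, shared_idx]  # keep layout of shared tokens
    else:
        m = m_edit  # later steps use the edited prompt's own maps
    return m @ v_edit  # values always come from the edited prompt

rng = np.random.default_rng(0)
q_s, q_e = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
k_s, k_e, v_e = (rng.normal(size=(3, 8)) for _ in range(3))
out = edited_cross_attention(q_s, k_s, q_e, k_e, v_e, step=0, inject_until=5)
print(out.shape)
```

The difference from prompt editing is that the spatial attention layout of the original image is carried over, instead of only the prompt text changing.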
Google Prompt-to-Prompt: Latent Diffusion and Stable Diffusion implementation:
https://github.com/google/prompt-to-prompt
Also https://github.com/cccntu/efficient-prompt-to-prompt
#1280 #1825 #2310