stable-diffusion-webui

Prompt-to-Prompt Image Editing with Cross Attention Control

Open Harvester62 opened this issue 2 years ago • 6 comments

Is your feature request related to a problem? Please describe.
This is not a bug report but a question about a potential future enhancement to an already great tool.

Describe the solution you'd like
I recently came across a couple of interesting videos (see below). Do you (the developers) think the feature described in them could be integrated into SD Web UI? It looks like a really advanced image manipulation feature to have.

Describe alternatives you've considered
This is brand-new research. I haven't found a comparable alternative, but I might be wrong.

Additional context
Below are a few references that might help in evaluating the feasibility of integrating this feature.

2 Minute Papers' video: https://www.youtube.com/watch?v=XW_nO2NMH_g&t=366s
koiboi video: https://www.youtube.com/watch?v=vWytLjUtAgs
Paper: https://arxiv.org/abs/2208.01626
Stable Diffusion implementation of Cross Attention Control, GitHub page: https://github.com/bloc97/CrossAttentionControl
Colab notebook: https://colab.research.google.com/drive/1PsWKXtqAAoDz-KGB45VeCXdTsqW-Mumo?usp=sharing

Thank you for your attention. I love your implementation of SD; it is the best one I've tried so far.

Harvester62 avatar Oct 15 '22 12:10 Harvester62

I think this was already implemented a long time ago. Is this it? https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#prompt-editing

hd-x avatar Oct 15 '22 13:10 hd-x

"I think it is already implemented long time ago, this? https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Features#prompt-editing"

If I am not mistaken, that feature is internally called the "scheduler". It is one of the two variants of prompt parsing. It has been said to be the implementation of this technique, but I did not see code similar to that in the other implementations. I am new to Python and to machine learning, so I might just be missing it.

I experimented with the prompt scheduler and was not able to get the same quality of results as in the native demos, though this is probably the 10th open issue on the topic in the past month. The CrossAttention demos show very precise switches, some of which I have never seen done before, while keeping the image very stable. With the prompt scheduler the image is less stable: additional elements get dreamed in at totally unrelated locations, and so on.
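
For reference, the prompt editing syntax documented on the wiki is [from:to:when]. Below is a minimal sketch of how such a schedule might resolve at each sampling step. `prompt_at_step` is a hypothetical helper written for illustration, not the webui's actual parser; it handles only non-nested groups, and the exact switch boundary is an assumption.

```python
import re

# Hypothetical pattern for one non-nested [from:to:when] group.
# This is an illustration, not the webui's real grammar.
_EDIT_GROUP = re.compile(r"\[([^\[\]:]*):([^\[\]:]*):(\d+(?:\.\d+)?)\]")


def prompt_at_step(prompt: str, step: int, total_steps: int) -> str:
    """Return the prompt text that would be active at a given step.

    `step` counts completed sampling steps, starting at 0 (an assumption).
    A `when` below 1 is read as a fraction of `total_steps`; otherwise it
    is read as an absolute step number.
    """

    def pick(match: re.Match) -> str:
        before, after = match.group(1), match.group(2)
        when = float(match.group(3))
        switch_step = when * total_steps if when < 1 else when
        return before if step < switch_step else after

    return _EDIT_GROUP.sub(pick, prompt)


if __name__ == "__main__":
    template = "a [fantasy:cyberpunk:0.5] landscape"
    for step in (0, 9, 10, 19):
        print(step, prompt_at_step(template, step, total_steps=20))
```

In this sketch the entire conditioning text changes at the switch step and nothing ties the new prompt to the attention layout of the old one, which may be one reason the image drifts more than in the cross-attention demos.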

cmp-nct avatar Oct 15 '22 13:10 cmp-nct

On topic: https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/2310#issuecomment-1275755291

aleksusklim avatar Oct 15 '22 15:10 aleksusklim

If there was already an open ticket on the same subject, I apologize for the duplicate. To me, though, this seems more granular in the way it operates: it takes the token indices of the prompt into account, so one or more specific indices would have to be selected and replaced with something else via an alternate prompt. I am not a developer or an expert at all, so I may be totally wrong.
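
To illustrate the token-index idea, here is a toy sketch in plain Python, loosely modeled on the approach described in the linked paper and not taken from any of the linked implementations. The function names and the tiny 2-dimensional vectors are made up for the example; real models use learned projections of image features and text embeddings.

```python
import math


def attention_maps(queries, keys):
    """Scaled dot-product attention weights.

    Returns one row per image position and one column per prompt token.
    """
    scale = math.sqrt(len(keys[0]))
    maps = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        maps.append([w / total for w in weights])
    return maps


def inject_source_attention(source_maps, target_maps, replaced_indices):
    """Keep the source prompt's attention for unchanged tokens.

    Only the columns listed in `replaced_indices` come from the edited
    prompt. Rows are not renormalized in this toy version.
    """
    replaced = set(replaced_indices)
    return [
        [tgt if j in replaced else src
         for j, (src, tgt) in enumerate(zip(src_row, tgt_row))]
        for src_row, tgt_row in zip(source_maps, target_maps)
    ]


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]                   # 2 image positions
    source_keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 prompt tokens
    target_keys = [[1.0, 0.0], [2.0, -1.0], [1.0, 1.0]]  # token 1 replaced

    source = attention_maps(queries, source_keys)
    target = attention_maps(queries, target_keys)
    for row in inject_source_attention(source, target, replaced_indices=[1]):
        print([round(w, 3) for w in row])
```

The point of the sketch is that the spatial layout associated with the untouched tokens is carried over from the original prompt, and only the selected indices are allowed to change.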

Harvester62 avatar Oct 15 '22 17:10 Harvester62

Google Prompt-to-Prompt: Latent Diffusion and Stable Diffusion implementation:

https://github.com/google/prompt-to-prompt

nightkall avatar Oct 16 '22 11:10 nightkall

Also https://github.com/cccntu/efficient-prompt-to-prompt

grctest avatar Oct 18 '22 21:10 grctest

#1280 #1825 #2310

mezotaken avatar Jan 13 '23 17:01 mezotaken