stable-diffusion-webui

[Feature Request]: Add ability to merge images ad hoc

Status: Open. AmericanPresidentJimmyCarter opened this issue 2 years ago (4 comments).

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What would your feature do?

This feature lets you merge two images by generating prompts for them with BLIP/CLIP interrogate and then blending their image latents and conditionings.
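A minimal sketch of the blending step, assuming each input image has already been encoded to a latent and its interrogated prompt to a text conditioning. The shapes (4x64x64 latents and 77x768 conditionings, as in Stable Diffusion 1.x at 512x512) are assumptions, and `merge_inputs` is a hypothetical helper, not a function from the webui or the linked gist. It simply applies the same linear weight to both tensors.

```python
import numpy as np

def lerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Linear interpolation: t=0 returns a, t=1 returns b."""
    return (1.0 - t) * a + t * b

def merge_inputs(latent_a, latent_b, cond_a, cond_b, weight=0.5):
    """Hypothetical helper: blend two images' latents and their prompt
    conditionings with the same weight."""
    if latent_a.shape != latent_b.shape or cond_a.shape != cond_b.shape:
        raise ValueError("both inputs must have matching shapes")
    return lerp(latent_a, latent_b, weight), lerp(cond_a, cond_b, weight)

# Random placeholders standing in for real VAE latents and CLIP text embeddings.
rng = np.random.default_rng(0)
latent_a, latent_b = rng.standard_normal((2, 4, 64, 64))
cond_a, cond_b = rng.standard_normal((2, 77, 768))

latent, cond = merge_inputs(latent_a, latent_b, cond_a, cond_b, weight=0.5)
print(latent.shape, cond.shape)  # (4, 64, 64) (77, 768)
```

The blended latent would be the starting point for img2img-style sampling and the blended conditioning would replace the prompt embedding; that wiring is not shown here.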

(Example images: frida, earring, fusion_ha, fusion_ha_ha)

Proposed workflow

Just follow the demo code here. I prepopulated the prompts, but you can use BLIP/CLIP to generate them.

https://gist.github.com/AmericanPresidentJimmyCarter/b4b69daa577936cb72aec4db44d0a2ea

Additional information

No response

Q: Which method of combining 2 input images is better? https://github.com/DiceOwl/StableDiffusionStuff/blob/main/interpolate.py

readme: https://github.com/DiceOwl/StableDiffusionStuff#interpolate

I hope eventually we can make an interpolation video between 2 inputs.

ClashSAN, Nov 07 '22 23:11

I just mix the values randomly, or use lerp or slerp. I apply this to both the image latents and the conditionings. Making a video should be pretty trivial: just iterate over np.linspace for n frames.
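Here is a rough sketch of the three mixing options and the linspace frame loop, applied to both latents and conditionings. The reading of "mix values randomly" as a per-element random choice between the two inputs is an assumption, `make_frames` is a hypothetical helper, and the array shapes are toy placeholders; the actual gists may do this differently.

```python
import numpy as np

def lerp(a, b, t):
    """Linear interpolation between two arrays."""
    return (1.0 - t) * a + t * b

def slerp(a, b, t, eps=1e-7):
    """Spherical linear interpolation over the flattened arrays.
    Falls back to lerp when the two arrays are (nearly) parallel."""
    a_flat, b_flat = a.ravel(), b.ravel()
    cos_theta = np.dot(a_flat, b_flat) / (np.linalg.norm(a_flat) * np.linalg.norm(b_flat))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    sin_theta = np.sin(theta)
    if sin_theta < eps:
        return lerp(a, b, t)
    out = (np.sin((1.0 - t) * theta) * a_flat + np.sin(t * theta) * b_flat) / sin_theta
    return out.reshape(a.shape)

def random_mix(a, b, t, rng):
    """Assumed reading of "mix values randomly": each element is taken
    from b with probability t and from a otherwise."""
    mask = rng.random(a.shape) < t
    return np.where(mask, b, a)

def make_frames(latent_a, latent_b, cond_a, cond_b, n_frames, mix=slerp):
    """Hypothetical helper: blend latents and conditionings for each t in
    np.linspace(0, 1, n_frames). Each pair would then go to the sampler."""
    return [
        (mix(latent_a, latent_b, t), mix(cond_a, cond_b, t))
        for t in np.linspace(0.0, 1.0, n_frames)
    ]

rng = np.random.default_rng(0)
latent_a, latent_b = rng.standard_normal((2, 4, 8, 8))  # toy latent shape
cond_a, cond_b = rng.standard_normal((2, 77, 16))       # toy conditioning shape
frames = make_frames(latent_a, latent_b, cond_a, cond_b, n_frames=5)
mixed = random_mix(latent_a, latent_b, 0.5, rng)
print(len(frames), frames[2][0].shape, mixed.shape)
```

Slerp is a common choice for Gaussian noise latents because a plain lerp between two independent noise samples shrinks the vector norm near the middle of the transition.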

Thanks, I would add this to the wiki, but it's not very user-friendly at the moment.

ClashSAN, Nov 08 '22 04:11

Here are some scripts for video. This first one just uses the prompts from BLIP captioning and CLIP ranking.

Script: https://gist.github.com/AmericanPresidentJimmyCarter/159b6fc3a538ae0221a58967dfc2b705

Example:

https://user-images.githubusercontent.com/110263573/200696942-c21d8a28-33cd-4785-acf8-0c2da0432789.mp4
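For context, "CLIP ranked" prompts are generally chosen by comparing a CLIP image embedding against the text embeddings of candidate phrases and keeping the most similar ones. A minimal sketch of that ranking by cosine similarity, using tiny made-up embeddings in place of real CLIP encoder outputs; `rank_prompts` is a hypothetical helper, not part of the linked script.

```python
import numpy as np

def rank_prompts(image_emb, text_embs, prompts):
    """Hypothetical helper: rank candidate prompts by cosine similarity
    between their text embeddings and one image embedding."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    scores = txt @ img
    order = np.argsort(-scores)  # highest similarity first
    return [(prompts[i], float(scores[i])) for i in order]

# Placeholder embeddings; real ones would come from a CLIP image/text encoder.
prompts = ["a painting of a woman", "a photo of a pigeon", "a city at night"]
text_embs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
image_emb = np.array([0.2, 0.9, 0.1])

ranking = rank_prompts(image_emb, text_embs, prompts)
for prompt, score in ranking:
    print(f"{score:.3f}  {prompt}")
```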

Here is another script, in which you define the expected prompt for the midstate. In this one I morph a photo of a pigeon into a photo of a man that someone sent me, using a description of the midstate.

Script: https://gist.github.com/AmericanPresidentJimmyCarter/790c9ae23ff0831a74d9a48977ee712d

https://user-images.githubusercontent.com/110263573/200697290-5490e0c7-5f30-4dd0-a84c-231bc51b1301.mp4
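One way to picture a user-defined midstate is as a third keyframe: the conditioning travels from the start prompt to the midstate prompt and then on to the end prompt. The sketch below assumes a simple piecewise-linear path through that keyframe; `via_midstate` is hypothetical, the 2-D vectors are toy stand-ins for conditionings, and the linked script may handle the midstate differently.

```python
import numpy as np

def lerp(a, b, t):
    """Linear interpolation between two arrays."""
    return (1.0 - t) * a + t * b

def via_midstate(start, mid, end, t):
    """Hypothetical helper: piecewise-linear path start -> mid -> end.
    t in [0, 0.5] covers the first leg, t in [0.5, 1] the second."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    if t <= 0.5:
        return lerp(start, mid, t / 0.5)
    return lerp(mid, end, (t - 0.5) / 0.5)

start = np.array([0.0, 0.0])  # e.g. conditioning for the pigeon prompt
mid = np.array([1.0, 2.0])    # e.g. conditioning for the midstate description
end = np.array([2.0, 0.0])    # e.g. conditioning for the man prompt
for t in np.linspace(0.0, 1.0, 5):
    print(t, via_midstate(start, mid, end, t))
```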

In both cases I found that the most significant differences occur around the midstate, so I weight the transition toward it with: for itr, i in enumerate(np.linspace(0., 1., STEPS_IN_OUT)**(1/2))
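Taking the square root of an even grid on [0, 1] keeps the endpoints fixed but makes the steps shrink as the values approach 1, so more frames land near the midstate if 1 means "at the midstate". The sketch below shows that spacing; the value of STEPS_IN_OUT and the mirrored schedule for the second leg (midstate to end) are assumptions for illustration.

```python
import numpy as np

STEPS_IN_OUT = 5  # frames per leg; value chosen only for illustration

# Schedule from the comment: square root of an even grid on [0, 1].
# Values: 0, 0.5, ~0.707, ~0.866, 1 (steps get smaller toward 1).
in_schedule = np.linspace(0.0, 1.0, STEPS_IN_OUT) ** (1 / 2)

# Assumed mirror for the second leg: starts at the midstate (0) with small
# steps and speeds up toward the end image (1).
out_schedule = 1.0 - in_schedule[::-1]

for itr, i in enumerate(in_schedule):
    # i is the interpolation weight toward the midstate for frame itr
    print(itr, round(float(i), 3))

print("step sizes:", np.round(np.diff(in_schedule), 3))
```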

Has this been implemented as a user script for A1111 (similar to the "interpolate" one by DiceOwl)?

Ehplodor, Dec 30 '22 21:12