stable-diffusion-webui

CLIP Guidance PoC

Open space-nuko opened this issue 3 years ago • 8 comments

Related: #2738

Basic implementation for now; it only supports batch size 1.

All credit goes to Birch-san's original code at https://github.com/Birch-san/stable-diffusion/commits/clip-attempt-6

space-nuko avatar Dec 21 '22 07:12 space-nuko

Thank you for this, we've been waiting :)

Just FYI, I'm getting 2.22 it/s with it enabled and 25 it/s with it disabled. I run clip guidance in other repos, and the speed decrease shouldn't be that dramatic.

Also - I'm not quite sure why clip_guidance_scale defaults to 500. It is an absurdly large number :)

hithereai avatar Dec 21 '22 18:12 hithereai

Setting the clip guidance scale to ~200 with NovelAI produces some lovely results.

1girl, solo, pose, smile, long hair, ponytail, detailed background, in New York City, highres, HD, 4k, high quality, trending, 8k
Negative prompt: blurry, lowres, low quality, cropped, text, bad anatomy, scar, mutation, ugly, deformed
Steps: 69, Sampler: Euler a, CFG scale: 8, Seed: 1234474942, Size: 512x512, Model hash: 925997e9, Model: final-pruned, Clip skip: 2, ENSD: 4444

With clip guidance off: [image 08710, seed 1234474942]

With clip guidance on (Clip guidance model: ViT-B-32|laion2b_s34b_b79k, Clip guidance scale: 200): [image 08709, seed 1234474942]

There's also less need to focus on weighting; notice below how "wide hips" completely took over with clip guidance turned off.

1girl, at night, half-rim glasses, angry, frown, detailed background, in front of the Eiffel Tower, wide hips, highres, HD, 4k, high quality, trending, 8k
Negative prompt: blurry, lowres, low quality, cropped, text, bad anatomy, scar, mutation, ugly, deformed
Steps: 69, Sampler: Euler a, CFG scale: 8, Seed: 1234474942, Size: 512x512, Model hash: 925997e9, Model: final-pruned, Clip skip: 2, ENSD: 4444

With clip guidance off: [image 08711, seed 1234474942]

With clip guidance on (Clip guidance model: ViT-B-32|laion2b_s34b_b79k, Clip guidance scale: 200): [image 08712, seed 1234474942]

Now just needs to work with highres fix lol

dothere avatar Dec 23 '22 19:12 dothere

It's very cool, but does it only work with >16 GB of VRAM?

Nyaster avatar Dec 23 '22 20:12 Nyaster

would prefer if this was an extension

AUTOMATIC1111 avatar Dec 24 '22 05:12 AUTOMATIC1111

how long do these renders take?

TingTingin avatar Dec 25 '22 09:12 TingTingin

@TingTingin 2-2.5 it/s with clip guidance on, compared to 25-29 it/s with clip guidance off, on a 4090 on Windows.

It's very slow. Should be faster like other repos with clip guidance, but it's not at this stage 🔢
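To put those rates in terms of render time, here is a rough back-of-the-envelope sketch. It assumes the 69-step settings from the examples posted earlier in the thread, a constant iteration rate, and one iteration per sampling step; it ignores model loading and VAE decode time. The helper name `render_seconds` is made up for illustration and is not part of the PR.

```python
def render_seconds(steps: int, its_per_second: float) -> float:
    """Estimated sampling time, assuming a constant iteration rate."""
    return steps / its_per_second


STEPS = 69  # step count used in the example prompts earlier in the thread

# (slowest, fastest) it/s figures reported above for a 4090 on Windows
RATES = {
    "clip guidance on": (2.0, 2.5),
    "clip guidance off": (25.0, 29.0),
}

for label, (slow_rate, fast_rate) in RATES.items():
    worst = render_seconds(STEPS, slow_rate)
    best = render_seconds(STEPS, fast_rate)
    print(f"{label}: roughly {best:.1f}-{worst:.1f} s for {STEPS} steps")

# Slowdown factor range: best case compares the closest pair of rates,
# worst case compares the most distant pair.
min_slowdown = RATES["clip guidance off"][0] / RATES["clip guidance on"][1]
max_slowdown = RATES["clip guidance off"][1] / RATES["clip guidance on"][0]
print(f"slowdown: roughly {min_slowdown:.1f}x to {max_slowdown:.1f}x")
```

Under those assumptions, a 69-step image goes from a few seconds to around half a minute, which is a slowdown on the order of 10x or more.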

hithereai avatar Dec 25 '22 10:12 hithereai

Is there any update on this? This massively increases photo quality. I'm surprised we don't have it already

outpoints avatar Jan 25 '23 02:01 outpoints

@outpoints I wouldn't wait for this. Even if it made it into an extension, the speed sacrifice is not worth it. The implementation must be lacking in some way, because I run SD with clip guidance in other SD versions and, while it is slower there too, the slowdown is nowhere near what I get with this PR, unfortunately.

hithereai avatar Jan 27 '23 19:01 hithereai

This PR can be closed...

hithereai avatar Mar 25 '23 09:03 hithereai