Interesting idea worth implementing: use CLIP guidance to enhance the quality and coherency of images

Open Centurion-Rome opened this issue 3 years ago • 4 comments

Is your feature request related to a problem? Please describe. Sometimes larger images are not coherent.

Describe the solution you'd like See the idea behind this post: https://www.reddit.com/r/StableDiffusion/comments/y4fekg/dreamstudio_will_now_use_clip_guidance_to_enhance/

Centurion-Rome avatar Oct 15 '22 14:10 Centurion-Rome

It's not about larger images. You can fix those by enabling the high-resolution checkbox.

CLIP guidance is a slower process that runs CLIP at every sampling step, and it is more about helping the model follow the prompt in detail than about maintaining coherence at large sizes. It gets Stable Diffusion a lot closer to DALL-E 2 in terms of correctly understanding prompts (still not perfect, but better).
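Conceptually, the guidance step looks something like the sketch below. This is only a rough illustration with open_clip, not the DreamStudio implementation; the function name and scale value are made up, and it glosses over the resizing and normalization CLIP expects.

```python
import torch
import open_clip

# Load an OpenCLIP model of the kind typically used for guidance.
clip_model, _, _ = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

def clip_guidance_grad(denoised_image, prompt, guidance_scale=200.0):
    """Gradient of the CLIP image/text similarity w.r.t. the current denoised estimate.
    Adding this gradient at every sampling step nudges the image toward the prompt."""
    text_features = clip_model.encode_text(tokenizer([prompt]))
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)

    with torch.enable_grad():
        image = denoised_image.detach().requires_grad_(True)
        # Assumes the image has already been resized/normalized for CLIP.
        image_features = clip_model.encode_image(image)
        image_features = image_features / image_features.norm(dim=-1, keepdim=True)
        similarity = (image_features * text_features).sum()
        (grad,) = torch.autograd.grad(similarity * guidance_scale, image)
    return grad
```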

lendrick avatar Oct 15 '22 16:10 lendrick

Someone has apparently already implemented it:

https://github.com/Birch-san/stable-diffusion/compare/34556bc45211e0a1d3554ee0cd6795793889fbfb...1f13594371a31c417c8ce5f296fead63e43be337

https://twitter.com/Birchlabs/status/1578141960249876482

kybercore avatar Oct 15 '22 21:10 kybercore

So how do you integrate this with the interface code? The filenames don't quite line up, so I cannot simply copy and paste the code changes mentioned.

ASilver avatar Oct 24 '22 14:10 ASilver

Was CLIP Guidance ever implemented into Automatic1111?

slymeasy avatar Nov 13 '22 15:11 slymeasy

Worth noting that implementing native CLIP guidance would allow for dramatic improvements to outpainting; see https://www.reddit.com/r/StableDiffusion/comments/ysv5lk/outpainting_mk3_demo_gallery/

aaronsantiago avatar Nov 15 '22 16:11 aaronsantiago

Any new developments on this by chance?

tchesket avatar Dec 16 '22 22:12 tchesket

Hi, I implemented the code from Birch-san's repository into webui. I don't know anything about the underlying math, but it seems to work okay.

https://github.com/space-nuko/stable-diffusion-webui/tree/feature/clip-guidance

Note that it is very slow even for a single image, and some people recommend >50 steps for best results. Also note that this implementation only works when batch_size=1.

Also, for the record, I think this would be very difficult to turn into an extension, since it requires modifications to how the Stable Diffusion samplers work.
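To give a rough idea of why: the guidance has to wrap the denoiser that the samplers call at every step and shift its output by the CLIP gradient. A simplified sketch of that wrapper is below (not the exact code in the branch; CLIPGuidedDenoiser and cond_fn are illustrative names, and sigma is the per-sample noise level tensor used by the k-diffusion samplers).

```python
import torch

class CLIPGuidedDenoiser(torch.nn.Module):
    """Wraps the denoiser the sampler calls at each step and shifts its output by a
    CLIP gradient. The real code also has to decode latents to pixels for CLIP,
    which is where most of the extra time and VRAM goes."""

    def __init__(self, inner_model, cond_fn):
        super().__init__()
        self.inner_model = inner_model
        self.cond_fn = cond_fn  # returns the CLIP-similarity gradient for the current step

    def forward(self, x, sigma, **kwargs):
        denoised = self.inner_model(x, sigma, **kwargs)
        grad = self.cond_fn(x, sigma, denoised, **kwargs)
        # Nudge the denoised estimate toward higher CLIP similarity.
        return denoised + grad * (sigma ** 2).reshape(-1, 1, 1, 1)
```

Because this sits inside the sampling loop rather than behind one of webui's script callbacks, it is hard to bolt on from the outside.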

Some examples I made with the Euler a sampler, 50 steps:

[Example images omitted; all share seed 588878215 and the prompt "photograph, realistic, hyper detail, masterpiece, best quality, painting, rendered in blender, 4k, ((tracing)), (subsurface scat…"]

- No CLIP guidance
- ViT-B-16-plus-240, pretrained=laion400m_e32, CLIP guidance scale=200
- roberta-ViT-B-32, pretrained=laion2b_s12b_b32k, CLIP guidance scale=250
- ViT-B-32, pretrained=laion2b_s34b_b79k, CLIP guidance scale=200
- ViT-B-32, pretrained=laion2b_s34b_b79k, CLIP guidance scale=300
- ViT-B-32, pretrained=laion2b_s34b_b79k, CLIP guidance scale=400

space-nuko avatar Dec 20 '22 23:12 space-nuko

I installed it, but I am getting constant CUDA out-of-memory errors. I reduced the CLIP guidance scale to 50, just in case, but it made no difference. Example:

RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 8.00 GiB total capacity; 7.22 GiB already allocated; 0 bytes free; 7.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

This is using an RTX 2080 Super with 8GB VRAM.
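(For reference, the max_split_size_mb suggestion from the error message is passed through PyTorch's PYTORCH_CUDA_ALLOC_CONF environment variable before anything touches CUDA; 128 below is just an example value.)

```python
import os

# Must be set before the CUDA allocator is initialized, i.e. before any CUDA tensor exists.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # importing torch after setting the variable keeps the order safe
```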

ASilver avatar Dec 21 '22 18:12 ASilver

Yeah, I think the VRAM requirements are just really high; I don't remember it taking less than 16GB for me with xformers enabled.

Part of the reason is that I had to turn off gradient checkpointing for it to work; that's a feature that saves VRAM but apparently can't be used with some torch features (torch.autograd.grad() in this case). I don't know if it just has to be implemented like that, or if there's another way that an actual ML whiz could figure out.
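The problematic pattern is roughly the sketch below, where decode_fn and clip_score_fn stand in for the VAE decode and the CLIP scoring: gradients have to flow back through both, so all of their activations stay alive, and with checkpointing off that is the full graph.

```python
import torch

def clip_loss_grad(latents, decode_fn, clip_score_fn):
    """Sketch of the VRAM-hungry part: backprop through the VAE decode and CLIP encoder."""
    with torch.enable_grad():
        latents = latents.detach().requires_grad_(True)
        image = decode_fn(latents)        # differentiable VAE decode to pixel space
        loss = -clip_score_fn(image)      # negative CLIP similarity to the prompt
        (grad,) = torch.autograd.grad(loss, latents)
    return -grad
```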

space-nuko avatar Dec 22 '22 02:12 space-nuko

Is there any way to run this with 8GB of VRAM?

Nyaster avatar Dec 23 '22 04:12 Nyaster

Is this available as an extension or is it a full fork?

azureprophet avatar Dec 24 '22 05:12 azureprophet

It's a fork for now. I had to make some changes to the original code to get it to work correctly, and I'm still trying to figure out how to improve the performance.

space-nuko avatar Dec 30 '22 18:12 space-nuko

As soon as you have a possible way to make this work with 8GB of VRAM, drop a note here and I will gladly help test.

ASilver avatar Dec 30 '22 19:12 ASilver

I hope this can be installed as an extension so we don't have to use a fork, or be implemented directly in A1111.

andupotorac avatar Jun 07 '23 12:06 andupotorac