stable-diffusion-webui

Add Consistency Decoder to VAE options.

Open KohakuBlueleaf opened this issue 8 months ago • 7 comments

Description

Consistency Decoder: https://github.com/openai/consistencydecoder I added it just like TAESD.

It requires a lot of resources to "decode" the image (it is actually a latent-guided consistency model operating directly in pixel space), so we may want to implement some tiling method for it. But tiling it directly may not be a good idea, and a mathematically identical tiling algorithm requires a PyTorch implementation (I'm testing one, but it has not been successful so far).

This may require more development, but since it is working as it should for now, I'm opening this PR.
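To make the tiling idea concrete, here is a minimal sketch of how overlapping tile windows over a latent could be laid out. It is not the code in this PR and not the mathematically identical algorithm mentioned above; it only shows the naive "direct tile" layout. The function names, the tile size of 64, and the overlap of 16 are assumptions chosen for illustration. The 128x128 latent corresponds to a 1024x1024 image under the usual 8x downscale of the SD VAE.

```python
from typing import List, Tuple


def tile_starts(length: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of windows of width `tile` that cover [0, length).

    Neighbouring windows share at least `overlap` cells. Hypothetical helper.
    """
    if tile >= length:
        return [0]  # a single window already covers the whole axis
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last window sits flush with the edge
    return starts


def tile_windows(height: int, width: int, tile: int, overlap: int) -> List[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) boxes in latent coordinates."""
    return [
        (y, x, min(y + tile, height), min(x + tile, width))
        for y in tile_starts(height, tile, overlap)
        for x in tile_starts(width, tile, overlap)
    ]


# Illustrative values only: a 128x128 latent split into 64x64 tiles.
windows = tile_windows(128, 128, tile=64, overlap=16)
print(len(windows), windows[0], windows[-1])
```

Each window would then be decoded on its own and the overlapping pixels blended. Since every tile is decoded independently, the result is not guaranteed to match a full-image decode, which is presumably part of why direct tiling may not be a good idea here.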


KohakuBlueleaf avatar Nov 07 '23 05:11 KohakuBlueleaf

It looks pretty much useless. Decoding 1024x1024 consumes 26 GB of VRAM, and it's 10+ times slower than the regular VAE. In the ClosedAI examples they even use 256x256... At least for anime at 1024x1024, it produces an image that is even slightly worse than 840k.

wcde avatar Nov 07 '23 07:11 wcde

For anime models it seems like a clear downgrade:

VAE, took 8.9s to generate: grid-0001

Consistency decoder, took 22.6s: grid-0000

(although the bow on the girl in 4th pic seems more consistent)

Here's for a normal photo generation:

VAE, 13.9 sec. A: 5.60 GB, R: 6.90 GB, Sys: 9.4/24 GB (39.2%): grid-0002

Consistency decoder, 46.5 sec. A: 13.79 GB, R: 22.14 GB, Sys: 24.0/24 GB (100.0%): grid-0003

Photo with a skyscraper:

VAE, 13.8 sec. A: 5.60 GB, R: 6.90 GB, Sys: 7.4/24 GB (30.7%): grid-0004

Consistency decoder, 34.8 sec. A: 13.80 GB, R: 22.14 GB, Sys: 24.0/24 GB (100.0%) grid-0005
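For reference, the slowdown implied by these timings can be worked out with a few lines of Python. This is only a sketch over the numbers reported in this comment; the dictionary keys are labels made up for illustration, and the times are whole-generation times rather than decode-only times.

```python
# End-to-end generation times in seconds, copied from the comparisons above
# as (regular VAE, consistency decoder). Keys are illustrative labels only.
timings = {
    "anime": (8.9, 22.6),
    "photo": (13.9, 46.5),
    "skyscraper": (13.8, 34.8),
}


def slowdown(vae_seconds: float, consistency_seconds: float) -> float:
    """How many times longer the run with the consistency decoder took."""
    return consistency_seconds / vae_seconds


ratios = {label: slowdown(*pair) for label, pair in timings.items()}
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f}x")

# Reserved memory ("R") for the two photo runs: 6.90 GB vs 22.14 GB.
reserved_ratio = 22.14 / 6.90
print(f"reserved memory: {reserved_ratio:.2f}x")
```

On these three runs the total generation time grows by roughly 2.5x to 3.3x and reserved memory by about 3.2x. Because sampling time is included in both figures, the slowdown of the decode step alone is presumably larger.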

AUTOMATIC1111 avatar Nov 07 '23 08:11 AUTOMATIC1111

Do you have any examples of more distant photorealistic faces with the consistency decoder? I'd be willing to wait the extra time to finally get some decent mid-range faces without adetailer. I also have to say that, in the anime example, the colors look way better with the consistency decoder on my OLED screen.

311-code avatar Nov 08 '23 06:11 311-code

VAE: grid-0001

Consistency decoder: grid-0003

This is on one of the new cool kids' models; it works better at larger resolutions, but I generated at 768x512 to provoke weird faces.

As for anime colors, this is a NAI VAE thing. Here's the same picture using vae-ft-mse-840000-ema-pruned.ckpt without consistency decoder:

grid-0004

Also you can check out this PR and make your own for testing purposes.

AUTOMATIC1111 avatar Nov 08 '23 10:11 AUTOMATIC1111

Anime is going to look terrible with anything other than animevae because NAI finetuned the VAE. Eyes in particular turn out worse with any other VAE, as the examples posted show. And the washed out color is a simple postprocessing fix.

As for this VAE's intended purpose of replacing the stock SD VAE: it's certainly better than the one included in the base SD checkpoints (from CompVis), but whether it's better than the ones StabilityAI later finetuned is situational, I'd say. The example posted here of the cake might be the only one that looks nicer with Consistency imo. You have to A/B them to make out any difference, but the details look more organic, whereas with a normal GAN VAE they look more like noisy predictions.

I've tested out this PR myself the last couple days and the examples shown here align with my tests as well. Don't feel the need to post any more comparisons personally.

catboxanon avatar Nov 08 '23 14:11 catboxanon

Thanks for those examples. Honestly, it kinda looks a bit worse compared to just the VAE. I was pretty hyped about the consistency decoder, but I guess my excitement has cooled off a bit now.

311-code avatar Nov 08 '23 17:11 311-code

Some updates here:

This PR will never be merged, but I will leave it open until we finish the changes to the latent-related code.

The plan is that I will make a base class for latent decode/encode/process, which lets extensions add their own latent processing easily. Then I will make an example extension that uses the Consistency Decoder for decoding. Once that is done, I will close this PR.

KohakuBlueleaf avatar Nov 09 '23 09:11 KohakuBlueleaf

closing; reopen if needed

AUTOMATIC1111 avatar Jan 01 '24 13:01 AUTOMATIC1111