[Feature Request]: Latent diffusion upscaler for the Stable Diffusion autoencoder

Open specblades opened this issue 3 years ago • 52 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues and checked the recent builds/commits

What would your feature do ?

Can we implement it?

Q from Twitter @RiversHaveWings: I've trained a latent diffusion upscaler for the Stable Diffusion autoencoder (and anything you feel like feeding into it if you can tolerate a little artifacts) in collaboration with @stabilityai . Try the Colab written by @nshepperd1 https://colab.research.google.com/drive/1o1qYJcFeywzCIdkfKJy7cTpgZTCM2EI4

image image

Proposed workflow

See in colab

Additional information

No response

specblades avatar Nov 07 '22 21:11 specblades

Is it similar to the "scale latents" option for A1111's highres fix ? (see https://github.com/AUTOMATIC1111/stable-diffusion-webui/issues/2613 for example for some info)

Ehplodor avatar Nov 07 '22 21:11 Ehplodor

Nope

specblades avatar Nov 07 '22 22:11 specblades

I tested this out, wasn't hard to convert it to a stand-alone script. Note that there is a stray config file used from here - https://huggingface.co/spaces/multimodalart/latentdiffusion/blob/main/latent-diffusion/models/first_stage_models/kl-f8/config.yaml (not the CompVis repo), but just putting that file there did work.

Also added .half() in a bunch of places to use less VRAM, etc. sdu.txt
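To put a rough number on the VRAM saving from .half(): half precision stores each value in 2 bytes instead of 4, so the same tensor takes about half the space. A back-of-the-envelope sketch follows; the helper name `tensor_bytes` is made up for illustration and the shape is only an example latent, not something taken from sdu.txt.

```python
from math import prod

# Bytes per element for the two floating-point formats discussed here.
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2}

def tensor_bytes(shape, dtype):
    """Hypothetical helper: raw storage needed for a dense tensor."""
    return prod(shape) * BYTES_PER_ELEMENT[dtype]

latent_shape = (1, 4, 128, 128)  # example: batch x channels x height x width
full = tensor_bytes(latent_shape, "float32")
half = tensor_bytes(latent_shape, "float16")
print(f"float32: {full} bytes, float16: {half} bytes")
```

The same factor of two applies to model weights, which is usually where most of the memory goes.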

pbaylies avatar Nov 08 '22 04:11 pbaylies

Can you explain for a newbie how to use it? I can't get past this: image

specblades avatar Nov 08 '22 11:11 specblades

@specblades probably better to start with the notebook, then, it will be friendlier to use.

pbaylies avatar Nov 08 '22 16:11 pbaylies

The goal is to use it in the webui. Never mind, maybe somebody will add it.

specblades avatar Nov 08 '22 17:11 specblades

A small comparison of 2x upscaling: native / LDSR (50 steps) / autoencoder upscaler (48 steps)

The last one is much sharper, but has artifacts. I like it more, and it's ~10-20x faster. We really need it in A1111!

01698-Euler a, 1406838393, thin_horror_young_girl_demoness,_breasts,_busty,_wearing_intricate_closed_dress,painting_by_artist(james_jean_1 2)and(jean_d image image

specblades avatar Nov 08 '22 17:11 specblades

Emad retweeted it. Some hero, please make it work in A1111.

image

specblades avatar Nov 10 '22 14:11 specblades

Okay, I was able to convert @pbaylies's sdu.txt into a script for automatic1111:

https://gist.github.com/nagolinc/3993e7329cafab5d5bd4698977ebebcc

Before you can run it, you will need to download these two files into your {automatic}/models/LDSR/ folder:

https://models.rivershavewings.workers.dev/config_laion_text_cond_latent_upscaler_2.json
https://models.rivershavewings.workers.dev/laion_text_cond_latent_upscaler_2_1_00470000_slim.pth

nagolinc avatar Nov 10 '22 22:11 nagolinc

Could you provide some kind of tutorial on how to use it? There is no UI to change the upscale amount or the number of steps.

specblades avatar Nov 11 '22 00:11 specblades

I get many OOMs, "RuntimeError: Input type (torch.cuda.HalfTensor) and weight type (torch.HalfTensor) should be the same", and other errors. It also does not save the result automatically.

specblades avatar Nov 11 '22 01:11 specblades

Exception: bad file inside ./models/LDSR/laion_text_cond_latent_upscaler_2_1_00470000_slim.pth: laion_text_cond_latent_upscaler_2_1_00470000_slim/data.pkl

The file may be malicious, so the program is not going to read it. You can skip this check with --disable-safe-unpickle commandline argument.
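For context on why such a check exists: a .pth checkpoint is a pickle, and unpickling can import and call arbitrary code. The sketch below shows the general technique of restricting which globals a pickle may reference, adapted from the pattern in the Python pickle documentation. It is not the webui's actual check, and the allowlist is an assumption chosen purely for illustration.

```python
import io
import pickle

# Assumption: a tiny allowlist, for illustration only.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}

class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream references a global by name.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")

def restricted_loads(data: bytes):
    """Like pickle.loads, but refuses globals outside the allowlist."""
    return RestrictedUnpickler(io.BytesIO(data)).load()

# Plain containers reference no globals, so they load fine.
print(restricted_loads(pickle.dumps({"weights": [1.0, 2.0]})))

# A pickle that references a callable outside the allowlist is refused.
try:
    restricted_loads(pickle.dumps(print))
except pickle.UnpicklingError as err:
    print("refused:", err)
```

Skipping a check like this with --disable-safe-unpickle means trusting the source of the file completely.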

arpitest avatar Nov 11 '22 19:11 arpitest

it starts with --disable-safe-unpickle, but then works only once; a restart is required for it to work again once :( but it's good for testing, i like the result very much! (the second run gives the "Input type (torch.cuda.HalfTensor) and weight type (torch.HalfTensor) should be the same" error)

could it be implemented as an upscaler, to be usable in the Extras tab too, instead of being a script? if it's a latent diffusion upscaler (as the tweet says), maybe it's enough to add an LDSR model selector to settings, to use this version of the model?

arpitest avatar Nov 11 '22 20:11 arpitest

hmm this is also interesting: "[Stability AI] Nov 10: That means that if you want to use it with Stable Diffusion, you take the generated "latent" and pass into the upscaler before decoding with your standard VAE."

arpitest avatar Nov 12 '22 07:11 arpitest

i've done some cleanup on the script, removed unused imports and functions, and moved a few model.to() around, and now it works fine, no crash, feel free to test: http://thot.banki.hu/arpi/sdu_upscale.py

also added some test code to main() so it can be used standalone from cli. next step will be adding it to extras as upscaler :) no idea how to do that yet...

arpitest avatar Nov 12 '22 20:11 arpitest

i've replaced the bilinear interpolate function in processing.py (for highres fix latent scaling) with a call to this new model (which operates in latent space anyway), and IT WORKS! if you enable highres fix & scale latent, and set denoising to zero, then it will immediately upscale the generated image by 2x! if you set denoising to some low value <0.3 then it will work further on the upscaled version and fix some artifacts too!

!!!!!!!!!!!!!!!!!! UPSCALING LATENT !!!!!!!!!!!!!!!!!!!!!
['Portrait photo of Goddess']
before: torch.Size([1, 4, 64, 64])
42it [00:06, 6.75it/s]
after: torch.Size([1, 4, 128, 128])
!!!!!!!!!!!!!!!!!! UPSCALING DONE !!!!!!!!!!!!!!!!!!!!!

@AUTOMATIC1111 please look at run_sdu_latent_upscale() in http://thot.banki.hu/arpi/sdu_upscale_mod.py it can be used as replacement in processing.py instead of torch.nn.functional.interpolate(samples, ..., mode="bilinear")
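To relate the logged latent shapes to image sizes: the kl-f8 autoencoder mentioned earlier in the thread maps each latent cell to an 8x8 pixel block, so doubling the latent doubles the decoded image. A small sketch of that arithmetic follows; the function names are made up for illustration and the factor of 8 is the stated assumption.

```python
VAE_SCALE = 8  # assumption: kl-f8 autoencoder, 8 pixels per latent cell per side

def pixel_size(latent_shape):
    """Pixel (width, height) a VAE would decode from a [B, C, H, W] latent."""
    _, _, h, w = latent_shape
    return (w * VAE_SCALE, h * VAE_SCALE)

def upscaled_shape(latent_shape, factor=2):
    """Latent shape after an upscaler that multiplies H and W by `factor`."""
    b, c, h, w = latent_shape
    return (b, c, h * factor, w * factor)

before = (1, 4, 64, 64)         # shape logged before upscaling
after = upscaled_shape(before)  # shape logged after upscaling
print(before, "->", after)
print(pixel_size(before), "->", pixel_size(after))
```

Under that assumption the 64x64 latent decodes to 512x512 pixels and the 128x128 latent to 1024x1024.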

arpitest avatar Nov 12 '22 22:11 arpitest

You can re-create the discussion so you have the authorship

specblades avatar Nov 12 '22 22:11 specblades

hmm, it even works for 4x upscaling (calling the model twice on latents), the 4x version reminds me of the old Midjourney style... unfortunately 2048x2048 resulted in a CUDA out-of-memory error, so the example here is 512x384: original, 2x and 4x upscaled in latent space:

05537-1234567-A Detailed hyper-realistic Sinister and dark colored, Nouveau Architecture Horror House Ruined by Lovecraftian Eldritch Creature-before-highres-fix 05540-1234567-A Detailed hyper-realistic Sinister and dark colored, Nouveau Architecture Horror House Ruined by Lovecraftian Eldritch Creature 05538-1234567-A Detailed hyper-realistic Sinister and dark colored, Nouveau Architecture Horror House Ruined by Lovecraftian Eldritch Creature

i really love this upscaler! it has a unique style at 4x, but it's still better than other methods: tmpgapc3xcw
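The "call the model twice" idea can be sketched as a loop over a 2x step. In this sketch `stand_in_upscale_2x` is only a placeholder that repeats cells; it stands in for the learned upscaler so that the shape bookkeeping can run on its own, and both function names are hypothetical.

```python
def stand_in_upscale_2x(grid):
    """Placeholder for the learned upscaler: doubles H and W by repeating cells."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]  # repeat each value horizontally
        out.append(wide)
        out.append(list(wide))                     # repeat the row vertically
    return out

def upscale_repeated(grid, upscale_2x, passes):
    """Apply a 2x upscaler `passes` times, giving a 2**passes scale factor."""
    for _ in range(passes):
        grid = upscale_2x(grid)
    return grid

# A 512x384 image corresponds to a 64-wide, 48-high latent (assuming 8 px per cell).
latent = [[0.0] * 64 for _ in range(48)]
result = upscale_repeated(latent, stand_in_upscale_2x, passes=2)
print(len(result), "x", len(result[0]))
```

Each pass quadruples the number of latent cells, so two passes give 16 times as many, which is consistent with larger inputs running out of memory.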

arpitest avatar Nov 12 '22 23:11 arpitest

@arpitest i dunno, a1111 won't load it in t2i/i2i

COMMANDLINE_ARGS= --disable-safe-unpickle --xformers --allow-code

image

specblades avatar Nov 12 '22 23:11 specblades

@specblades the _mod version is not a script, it should be in modules/ and the function called from the hires-fix part of processing.py, which needs some small changes to the code... i hope @AUTOMATIC1111 will do it soon in a proper way, instead of my ugly hack :)

arpitest avatar Nov 12 '22 23:11 arpitest

@arpitest i understand. can you modify processing.py to load it? or will it work if i put it in the modules folder?

i think we need both - upscale in extras and hr-fix

specblades avatar Nov 12 '22 23:11 specblades

here is my modified version: http://thot.banki.hu/arpi/processing.py copy it (replacing the original) and sdu_upscale_mod.py to modules/, then restart, enable scale latent in settings and hires fix in txt2img, set denoising=0, then generate something :) (it has 4x scaling now; if you want 2x, remove one of the 2 calls to run_sdu_latent_upscale)

sorry this is a proof of concept only, not intended for wide use this ugly way :)

arpitest avatar Nov 12 '22 23:11 arpitest

@arpitest i want to try but image image

nvm, do ur best, please! You are awesome!

specblades avatar Nov 12 '22 23:11 specblades

even the 4x is looking pretty good for some images:

05593-1223568812-a portrait of a character in a scenic environment by sandra chevrier, hyperdetailed, trending on artstation-before-highres-fix 05594-1223568812-a portrait of a character in a scenic environment by sandra chevrier, hyperdetailed, trending on artstation

arpitest avatar Nov 13 '22 07:11 arpitest

Out of curiosity, does anybody know of a comparison between this new latent upscaler and the "scale latent hires-fix" already implemented in A1111 (that performs bilinear interpolation in latent space) ?

Ehplodor avatar Nov 15 '22 11:11 Ehplodor

(that performs bilinear interpolation in latent space) ?

using bilinear interpolation on vectors (latents are not pixels!) is a bad idea anyway... there are methods for vector interpolation, like euler, quaternion etc. it's a very different math...

arpitest avatar Nov 15 '22 12:11 arpitest

I see a way for improving highres-fix's scale latents option here. TY

Ehplodor avatar Nov 15 '22 12:11 Ehplodor

FYI : feature request for new latent vector interpolation methods here

Ehplodor avatar Nov 15 '22 12:11 Ehplodor

Out of curiosity, does anybody know of a comparison

just did one. it's a bit difficult to do a 1:1 comparison, because at <0.3 denoise level the bilinear version just creates a blurry mess, and at >=0.7 (default) both produce a similar sharp image but it does not look good at all. So testing at 0.5:

A Detailed hyper-realistic Sinister and dark colored, Nouveau Architecture Horror House Ruined by Lovecraftian Eldritch Creatures, Unreal Engine 5, horror, high resolution, detailed digital art
Steps: 30, Sampler: Euler a, CFG scale: 7, Seed: 1234, Size: 1536x1024, Model hash: 81761151, Denoising strength: 0.5, First pass size: 768x512

First pass image: 06525-1234-A Detailed hyper-realistic Sinister and dark colored Nouveau Architecture Horror House Ruined by Lovecraftian Eldritch Creature-before-highres-fixuysr4b87

Bilinear latent upscaler: 06526-1234-A Detailed hyper-realistic Sinister and dark colored Nouveau Architecture Horror House Ruined by Lovecraftian Eldritch Creatureb7oyqe1s

Old latent upscaler, mode changed from 'bilinear' to 'nearest': 06528-1234-A Detailed hyper-realistic Sinister and dark colored Nouveau Architecture Horror House Ruined by Lovecraftian Eldritch Creatureqvjn0710

New NN latent upscaler: 06524-1234-A Detailed hyper-realistic Sinister and dark colored Nouveau Architecture Horror House Ruined by Lovecraftian Eldritch Creaturee_82t5k7

New NN latent upscaler at denoise level 0.25: 06520-1234-A Detailed hyper-realistic Sinister and dark colored Nouveau Architecture Horror House Ruined by Lovecraftian Eldritch Creaturezv78r6xi

And for comparison, with "Upscale latent space image when doing hires. fix" disabled in settings: 06530-1234-A Detailed hyper-realistic Sinister and dark colored Nouveau Architecture Horror House Ruined by Lovecraftian Eldritch Creaturecysk_rdk
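For readers wondering what the 'bilinear' versus 'nearest' switch changes numerically, here is a simplified pure-Python stand-in. It is not torch.nn.functional.interpolate itself: it assumes a common half-pixel sampling convention, and all function names are made up for illustration. Nearest copies existing values, while bilinear blends neighbours, which softens edges in whatever space it is applied to.

```python
import math

def nearest_1d(row, scale=2):
    """Each output cell copies the source cell it falls inside."""
    return [row[i // scale] for i in range(len(row) * scale)]

def linear_1d(row, scale=2):
    """Each output cell blends its two nearest source cells."""
    n = len(row)
    out = []
    for i in range(n * scale):
        # Half-pixel mapping from output index to source position, clamped to range.
        src = min(max((i + 0.5) / scale - 0.5, 0.0), n - 1.0)
        lo = math.floor(src)
        hi = min(lo + 1, n - 1)
        t = src - lo
        out.append(row[lo] * (1 - t) + row[hi] * t)
    return out

def upsample_2d(grid, fn_1d, scale=2):
    """Separable 2D upsampling: stretch every row, then every column."""
    rows = [fn_1d(r, scale) for r in grid]
    cols = [fn_1d(list(c), scale) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to row-major

edge = [[0.0, 10.0], [0.0, 10.0]]  # a hard vertical edge
print("nearest :", upsample_2d(edge, nearest_1d)[0])
print("bilinear:", upsample_2d(edge, linear_1d)[0])
```

On this hard edge the nearest row keeps the jump from 0 to 10 intact, while the bilinear row introduces the intermediate values 2.5 and 7.5.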

arpitest avatar Nov 15 '22 12:11 arpitest

TY @arpitest This is very interesting in itself. Now I think I understand why a dedicated model would be preferable to "dumb" latent vector interpolation (specifically, bilinear). Unsure about "nearest", though. And you said there are other methods as well. I'm too curious. NB: TY also for the comparison with hires fix with "scale latent" disabled. Enlightening.

Ehplodor avatar Nov 15 '22 13:11 Ehplodor