[Community] Move the number "0.18215" from the image2image process to VAE config

Open wangyu-ustc opened this issue 2 years ago • 10 comments

There is a magic number "0.18215" in the repository

In the file src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py, the number "0.18215" appears on line 220 and line 342, which is strange since it does not occur in the original repo. Can someone clarify why it is there and where this number comes from?

wangyu-ustc avatar Oct 04 '22 23:10 wangyu-ustc

It's a constant used to scale the latents so they can be decoded back into an image (src)

# scale and decode the image latents with vae
latents = 1 / 0.18215 * latents
image = vae.decode(latents).sample
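
For context, the division in the snippet above presumably undoes a multiplication by the same constant applied after encoding. Here is a minimal sketch of that round trip in plain Python. `ToyVAE` and the two helper functions are hypothetical stand-ins for illustration, not the diffusers API, and the claim about the latents' variance is an assumption rather than something stated in this thread.

```python
SCALING_FACTOR = 0.18215  # the value quoted in this issue


class ToyVAE:
    """Hypothetical stand-in for a real VAE: encode/decode are a fixed linear map."""

    def encode(self, pixels):
        return [p * 4.0 for p in pixels]

    def decode(self, latents):
        return [z / 4.0 for z in latents]


def encode_to_scaled_latents(vae, pixels):
    # Multiply by the constant after encoding (assumed purpose: keep the
    # latents the diffusion model sees at roughly unit variance).
    return [z * SCALING_FACTOR for z in vae.encode(pixels)]


def decode_from_scaled_latents(vae, latents):
    # Divide by the constant before decoding, mirroring `1 / 0.18215 * latents`.
    return vae.decode([z / SCALING_FACTOR for z in latents])


vae = ToyVAE()
pixels = [0.0, 0.25, -0.5, 1.0]
restored = decode_from_scaled_latents(vae, encode_to_scaled_latents(vae, pixels))
```

If the two steps use the same constant, `restored` matches `pixels` up to floating-point error, which is why the value has to stay in sync between the encode and decode paths.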

WASasquatch avatar Oct 05 '22 04:10 WASasquatch

I think the constant is defined in the model config file from CompVis/stable-diffusion.

guaneec avatar Oct 05 '22 04:10 guaneec

There's more explanation about it in #437.

pcuenca avatar Oct 05 '22 05:10 pcuenca

Maybe let's put it directly in the VAE config then? cc @patil-suraj

patrickvonplaten avatar Oct 05 '22 10:10 patrickvonplaten

Maybe this can be a method for a VAE that is overridable? For supporting more complex squashing functions 😉

neverix avatar Oct 05 '22 15:10 neverix

I think we can make this an overridable config parameter typed as Union[int, str], with the string describing a more complex squashing function that can be implemented down the road.

Marking this as a community feature for now, since it seems no one has found the time to open a PR here. In case you're interested @neverix, we'd be more than happy to review a PR :-)
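
To make the idea concrete, here is a rough sketch of what an overridable scaling hook driven by a config value could look like. Every name here (`VAEConfig`, `LatentScaler`, `TanhScaler`, `scaling_factor`) is hypothetical and chosen only for illustration; this is not what any PR in this thread implements.

```python
import math
from dataclasses import dataclass


@dataclass
class VAEConfig:
    # Hypothetical config class; the default is the value discussed in this issue.
    scaling_factor: float = 0.18215


class LatentScaler:
    """Hypothetical helper: plain linear scaling that subclasses may override."""

    def __init__(self, config: VAEConfig):
        self.config = config

    def scale(self, z: float) -> float:
        return z * self.config.scaling_factor

    def unscale(self, z: float) -> float:
        return z / self.config.scaling_factor


class TanhScaler(LatentScaler):
    """Example override: a non-linear squashing function and its inverse."""

    def scale(self, z: float) -> float:
        return math.tanh(z * self.config.scaling_factor)

    def unscale(self, z: float) -> float:
        return math.atanh(z) / self.config.scaling_factor


linear = LatentScaler(VAEConfig())
squashed = TanhScaler(VAEConfig(scaling_factor=0.5))
linear_roundtrip = linear.unscale(linear.scale(2.0))
squashed_roundtrip = squashed.unscale(squashed.scale(2.0))
```

Keeping the number in the config means a checkpoint with a different VAE can ship its own value, while a subclass covers the "more complex squashing function" case without changing the pipelines.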

patrickvonplaten avatar Nov 07 '22 19:11 patrickvonplaten

Should be solved by: https://github.com/huggingface/diffusers/issues/1460

@williamberman could you maybe tackle this?

patrickvonplaten avatar Dec 01 '22 16:12 patrickvonplaten

Put up a draft PR here: https://github.com/huggingface/diffusers/pull/1515. I still need to think about a few things before finishing.

williamberman avatar Dec 01 '22 22:12 williamberman

For reference, here's some code to estimate the magic value: https://github.com/huggingface/diffusers/issues/437#issuecomment-1356945792.
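
As a rough illustration of the idea (not the linked code itself), the sketch below assumes the factor is the reciprocal of the standard deviation of the encoder's latents over a sample of images. The latents here are synthetic Gaussian values whose standard deviation (5.49) was picked only so the result lands near 0.18215; they are not real VAE outputs, and `estimate_scaling_factor` is a hypothetical helper name.

```python
import random
import statistics


def estimate_scaling_factor(latent_values):
    # Assumption: the factor is 1 / std of the flattened latent values.
    return 1.0 / statistics.stdev(latent_values)


# Synthetic stand-in for flattened VAE latents (NOT real encoder outputs).
rng = random.Random(0)
fake_latents = [rng.gauss(0.0, 5.49) for _ in range(20000)]

estimate = estimate_scaling_factor(fake_latents)
print(f"estimated scaling factor: {estimate:.4f}")
```

With a real model you would replace `fake_latents` with latents encoded from a batch of training images.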

fepegar avatar Dec 19 '22 01:12 fepegar

Thanks a lot @fepegar !

patrickvonplaten avatar Dec 19 '22 23:12 patrickvonplaten

> Put up draft PR here: #1515 still need to think about a few things before finishing

For people following this: the new PR is #1860

hervenivon avatar Jan 16 '23 11:01 hervenivon

https://github.com/huggingface/diffusers/pull/1860 is now merged, closing the issue.

patil-suraj avatar Jan 26 '23 13:01 patil-suraj