diffusers
[Community] Move the number "0.18215" from the image2image process to VAE config
There is a magic number "0.18215" in the repository
In the file src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py, there is a number "0.18215" on lines 220 and 342, which is strange since it does not occur in the original repo. Can someone clarify why it is there and where this number comes from?
It's a constant used to scale the latents so they can be decoded back into an image:
# scale and decode the image latents with vae
latents = 1 / 0.18215 * latents
image = vae.decode(latents).sample
I think the constant is defined in the model config file from CompVis/stable-diffusion.
There's more explanation about it in #437.
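For anyone following along, here is a minimal sketch of what the constant does, assuming it is applied in both directions: multiplied in after encoding and divided out before decoding, which would account for the two occurrences mentioned above. The function names and the use of plain Python lists are illustrative stand-ins, not the pipeline's actual code.

```python
# Hypothetical name; the pipeline in this thread hard-codes the literal 0.18215.
SCALING_FACTOR = 0.18215


def scale_latents(latents, scaling_factor=SCALING_FACTOR):
    """Scale raw VAE latents before they enter the diffusion process."""
    return [x * scaling_factor for x in latents]


def unscale_latents(latents, scaling_factor=SCALING_FACTOR):
    """Undo the scaling right before decoding.

    Mirrors `latents = 1 / 0.18215 * latents` from the snippet above.
    """
    return [x / scaling_factor for x in latents]


# Toy values standing in for a latent tensor.
raw = [1.0, -2.5, 0.0]
restored = unscale_latents(scale_latents(raw))
```

The two functions are exact inverses up to floating-point error, so the decoder sees latents in the range the VAE was trained to produce.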
Let's maybe put it directly in the VAE config then? cc @patil-suraj
Maybe this can be a method for a VAE that is overridable? For supporting more complex squashing functions 😉
I think we can make this a config parameter that is overridable and typed as Union[int, str], with the string describing a more complex squashing function that can be implemented down the road.
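A rough sketch of how such a parameter could look, under a few assumptions: the class, field, and registry names are hypothetical, the numeric case is typed as float rather than int since the value is fractional, and "tanh" is just an example of a named squashing function. None of this is taken from the diffusers codebase.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

# Hypothetical registry: name -> (forward, inverse) squashing functions.
SQUASHERS: Dict[str, Tuple[Callable[[float], float], Callable[[float], float]]] = {
    "tanh": (math.tanh, math.atanh),
}


@dataclass
class ToyVAEConfig:
    # Hypothetical field: a number means linear scaling,
    # a string names an entry in SQUASHERS.
    scaling: Union[float, str] = 0.18215


def to_diffusion_space(latents: List[float], config: ToyVAEConfig) -> List[float]:
    """Map raw VAE latents into the range the diffusion model works in."""
    if isinstance(config.scaling, str):
        forward, _ = SQUASHERS[config.scaling]
        return [forward(x) for x in latents]
    return [x * config.scaling for x in latents]


def from_diffusion_space(latents: List[float], config: ToyVAEConfig) -> List[float]:
    """Inverse mapping, applied before decoding."""
    if isinstance(config.scaling, str):
        _, inverse = SQUASHERS[config.scaling]
        return [inverse(x) for x in latents]
    return [x / config.scaling for x in latents]


linear_cfg = ToyVAEConfig()                # default linear factor
squash_cfg = ToyVAEConfig(scaling="tanh")  # named squashing function
```

Keeping the forward and inverse functions paired in one registry entry means a pipeline never has to know which kind of scaling a given VAE uses.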
Marking this for now as a community feature as it seems like no one finds the time to open a PR here - in case you're interested @neverix - we'd be more than happy to review a PR :-)
Should be solved by: https://github.com/huggingface/diffusers/issues/1460
@williamberman could you maybe tackle this?
Put up a draft PR here: https://github.com/huggingface/diffusers/pull/1515. Still need to think about a few things before finishing.
For reference, here's some code to estimate the magic value: https://github.com/huggingface/diffusers/issues/437#issuecomment-1356945792.
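The linked code is not reproduced here. As a rough illustration of the idea only, and assuming the factor is chosen as the reciprocal of the standard deviation of the encoded latents (so that scaled latents have roughly unit variance), an estimate could look like the sketch below. The synthetic Gaussian values stand in for latents that would really come from encoding a batch of images with the VAE.

```python
import random
import statistics


def estimate_scaling_factor(latent_values):
    """Return 1 / std of the flattened latent values.

    Assumes the scaling factor is defined so that scaled latents
    have approximately unit standard deviation.
    """
    return 1.0 / statistics.pstdev(latent_values)


# Synthetic stand-in for real latents: zero-mean Gaussian noise whose
# standard deviation is 1 / 0.18215, so the estimate should land near 0.18215.
rng = random.Random(0)
true_std = 1 / 0.18215
fake_latents = [rng.gauss(0.0, true_std) for _ in range(20000)]

estimate = estimate_scaling_factor(fake_latents)
```

With real data the result depends on which images and which VAE weights are used, so a value that only roughly matches 0.18215 would not be surprising.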
Thanks a lot @fepegar !
For people following this: the new PR is #1860
https://github.com/huggingface/diffusers/pull/1860 is now merged, closing the issue.