ControlNet
Latent vector input
The model performs really well, and I would like to further explore its potential benefits in other domains.
When the input latents are not provided, ControlNet automatically fills them in with random Gaussian noise. But I want to generate samples from latents I provide to the model myself.
Should I encode the input before passing it into the network (maybe using the VAE, as in Stable Diffusion), or provide it as a raw image?
I tried to find this in the code, but it seems no encoding is performed on the input (maybe the Stable Diffusion API does it). I'm confused because I get low-quality images when I provide the encoded vector of an image as the latents.
Any thoughts?
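For reference, this is roughly what I mean (a sketch with diffusers; the model ids and the blank conditioning image are placeholders, not my actual setup):

```python
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet
).to("cuda")

# Dummy conditioning image just to keep the snippet self-contained.
cond_image = Image.fromarray(np.zeros((512, 512, 3), dtype=np.uint8))

# If `latents` is omitted, the pipeline samples random Gaussian noise of
# shape (batch, 4, H/8, W/8). I want to pass my own tensor here instead.
my_latents = torch.randn(1, pipe.unet.config.in_channels, 64, 64, device="cuda")

image = pipe(
    "a photo of a cat",
    image=cond_image,
    latents=my_latents,
).images[0]
```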
Hi @jh27kim, are you training a custom ControlNet, or do you mean for inference?
I'm currently using it at inference time. As a matter of fact, I also wonder how it should be provided during training.
So my question is: should the input (latent vector) be encoded using Stable Diffusion's encoder (VAE) prior to ControlNet inference?
I'm currently encoding an image using Stable Diffusion's VAE. Am I using it in the intended way?
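Concretely, this is roughly how I encode the image (a sketch assuming the SD 1.5 VAE; the file path and resolution are just placeholders):

```python
import numpy as np
import torch
from PIL import Image
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
).to("cuda")

# Load the image and normalize to [-1, 1], shape (1, 3, 512, 512).
img = Image.open("input.png").convert("RGB").resize((512, 512))
x = torch.from_numpy(np.array(img)).float() / 127.5 - 1.0
x = x.permute(2, 0, 1).unsqueeze(0).to("cuda")

with torch.no_grad():
    # Sample from the latent distribution and apply the VAE scaling factor.
    latents = vae.encode(x).latent_dist.sample() * vae.config.scaling_factor
```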
Hi @jh27kim, I feel what you describe is related to "inversion" (for example, DDIM inversion). I am also curious about it.
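In case it helps, here is a rough, untested sketch of what DDIM inversion looks like with diffusers (it uses the base SD 1.5 UNet for the noise predictions and runs the deterministic DDIM update backwards, turning a clean VAE latent into a noise-like latent that could then be passed back in as `latents`):

```python
import torch
from diffusers import DDIMScheduler, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")
scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
scheduler.set_timesteps(50)

@torch.no_grad()
def ddim_invert(latents, prompt=""):
    # Text embedding for the (possibly empty) prompt.
    tokens = pipe.tokenizer(
        prompt, padding="max_length",
        max_length=pipe.tokenizer.model_max_length, return_tensors="pt",
    )
    text_emb = pipe.text_encoder(tokens.input_ids.to(latents.device))[0]

    # Walk the DDIM timesteps from clean (small t) to noisy (large t),
    # re-deriving x_{t_next} from the model's noise prediction at t.
    timesteps = scheduler.timesteps.flip(0)
    x = latents.clone()
    for i in range(len(timesteps) - 1):
        t, t_next = timesteps[i], timesteps[i + 1]
        eps = pipe.unet(x, t, encoder_hidden_states=text_emb).sample
        a_t = scheduler.alphas_cumprod[t].to(x.device)
        a_next = scheduler.alphas_cumprod[t_next].to(x.device)
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    return x  # approximately Gaussian; could be used as the `latents` input
```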