ControlNet

Latent vector input

Open jh27kim opened this issue 1 year ago • 3 comments

The model performs really well, and I would like to further explore its potential benefits in other domains.

When the input (latents) is not provided, ControlNet automatically fills in the latents with random Gaussian noise. However, I want to generate samples from an input latent that I provide to the model.

Should I encode the input (latents) before feeding it into the network (perhaps with the VAE, as in Stable Diffusion), or should I pass it as a raw image?

I looked for this in the code, but it seems no encoding is performed on the input (maybe the Stable Diffusion API does it). I'm confused because I get low-quality images when I provide the encoded vector of an image as the latent.

Any thoughts?

jh27kim avatar Apr 13 '23 04:04 jh27kim

Hi @jh27kim, are you training a custom ControlNet, or do you mean for inference?

alelordelo avatar Apr 17 '23 15:04 alelordelo

I'm currently using it at inference time. As a matter of fact, I also wonder how the input should be given at training time.

So my question is: should the input (latent vector) be encoded using Stable Diffusion's encoder (VAE) prior to ControlNet inference?

I'm currently encoding an image using Stable Diffusion's VAE. Am I using this in the intended way?

jh27kim avatar Apr 19 '23 01:04 jh27kim
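
For reference, the conventions that an encoded latent usually has to satisfy can be sketched as below. This assumes ControlNet follows the standard Stable Diffusion v1 setup (images mapped to [-1, 1] before the VAE, 4 latent channels at 1/8 of the image resolution, and a latent scale factor of 0.18215); none of this is confirmed in the thread, and the helper names are hypothetical. Skipping the range mapping or the scale factor is one plausible cause of low-quality outputs.

```python
# Assumed Stable Diffusion v1 conventions; check them against the model config.
SCALE_FACTOR = 0.18215   # multiplies the raw VAE encoder output
LATENT_CHANNELS = 4      # channels of the latent tensor
DOWNSAMPLE = 8           # spatial reduction from image to latent


def to_vae_range(pixel):
    """Map an 8-bit pixel value in [0, 255] to the [-1, 1] range."""
    return pixel / 127.5 - 1.0


def expected_latent_shape(height, width):
    """Return the (C, H, W) latent shape for an image of the given size."""
    if height % DOWNSAMPLE or width % DOWNSAMPLE:
        raise ValueError("height and width must be multiples of 8")
    return (LATENT_CHANNELS, height // DOWNSAMPLE, width // DOWNSAMPLE)


def scale_latent(values):
    """Apply the scale factor to raw encoder output (flat list of floats)."""
    return [v * SCALE_FACTOR for v in values]
```

With these assumptions, a 512x512 image should yield a latent of shape (4, 64, 64), which is a quick sanity check on whatever the encoder returns.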

Hi @jh27kim, I feel what you describe is related to "inversion" (for example, DDIM inversion). I am also curious about it.

XiaoyuShi97 avatar Apr 20 '23 09:04 XiaoyuShi97
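
One way to see why a clean encoded latent may sample poorly: the sampler expects its starting latent to look like noise at the chosen timestep, whereas a VAE encoding of an image has no noise at all. DDIM inversion recovers a matching noisy latent deterministically. The sketch below is not DDIM inversion but the simpler stochastic forward noising used by img2img-style pipelines. It assumes the "scaled linear" beta schedule with Stable Diffusion v1 defaults (0.00085 to 0.012 over 1000 steps); the function names are hypothetical and the latent is a flat list of floats for brevity.

```python
import math
import random


def alphas_cumprod(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Cumulative product of (1 - beta_t) for the 'scaled linear' schedule."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    out, running = [], 1.0
    for i in range(num_steps):
        # Betas are linear in sqrt space, then squared.
        beta = (lo + (hi - lo) * i / (num_steps - 1)) ** 2
        running *= 1.0 - beta
        out.append(running)
    return out


def noise_latent(z0, t, abar, rng=random):
    """Sample z_t ~ N(sqrt(abar_t) * z0, (1 - abar_t) * I)."""
    signal = math.sqrt(abar[t])
    sigma = math.sqrt(1.0 - abar[t])
    return [signal * z + sigma * rng.gauss(0.0, 1.0) for z in z0]


abar = alphas_cumprod()
z0 = [0.5] * 16                      # stand-in for a scaled VAE latent
z_mid = noise_latent(z0, 499, abar, random.Random(0))
```

Starting the sampler from `z_mid` at the matching timestep keeps part of the image content, while starting from `z0` at the final timestep gives the denoiser an input it was not trained to see.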