Role of ref_emb_scale

YeolJ00 opened this issue 2 years ago.

I am wondering whether ref_emb_scale, which is ultimately passed as scale to this function,

https://github.com/drboog/ProFusion/blob/31f03ff904b7af72aa3054f314854a084eb8c42e/diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_promptnet.py#L278

corresponds to gamma in Algorithm 1 of your paper.

Could you provide more details on how this works? The scale does not seem to have any effect on the reference prompt; rather, it seems to act on the original prompt. The code comments suggest that a higher refine_emb_scale leads to using more information from the input image. Why would scaling down the second-to-last hidden state of the CLIP encoder output (the default value is 0.8) lead to using more of the input (reference) image?
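
To make my reading concrete, the operation I am asking about amounts to something like the minimal sketch below. It is not ProFusion's code: scale_penultimate is a hypothetical helper, and hidden_states is assumed to be a sequence of per-layer outputs (tokens × hidden dim), similar in layout to what Hugging Face's CLIPTextModel returns with output_hidden_states=True.

```python
from typing import List, Sequence

Vector = List[float]
Layer = List[Vector]  # one hidden state: [num_tokens][hidden_dim]


def scale_penultimate(hidden_states: Sequence[Layer], scale: float) -> Layer:
    """Return the second-to-last hidden state multiplied element-wise by `scale`.

    Hypothetical helper illustrating the question's reading of ref_emb_scale;
    it does not reproduce ProFusion's actual implementation.
    """
    if len(hidden_states) < 2:
        raise ValueError("need at least two hidden states")
    penultimate = hidden_states[-2]
    # Build a new nested list so the input hidden states are left unchanged.
    return [[scale * x for x in token] for token in penultimate]


# Three "layers", each with one token of dimension 2; scale only the penultimate.
layers = [[[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]]]
print(scale_penultimate(layers, 0.5))  # [[1.5, 2.0]]
```

Under this reading, the default of 0.8 simply shrinks every feature of that layer by 20%, which is why it is unclear to me how it would increase reliance on the reference image.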

Jun 27 '23 12:06 YeolJ00