
How do you zero-initialize, and how good is this technique?

Open lucasgblu opened this issue 1 year ago • 4 comments

Hi! Congrats on this wonderful work. After reading your paper, I'm really curious about one technique that you use.

In the paper, you said:

To minimize the interference with the original model architecture, we also zero-initialize the output of the newly inserted reference attention module. This initialization ensures a smooth transition and minimal interference with the existing model’s performance.

How do you zero-initialize? Is it a zero-conv like ControlNet, or do you just force the output to be zeros? Or do you set the weights of Q, K, or V to zero so that the outcome is zero? Once you do this, will the model still learn to absorb knowledge from the condition, or does it stay unlearned, since zeros provide only minor gradients?

Finally, how good is this technique? Does it greatly or visibly improve the quality?

Congrats again

lucasgblu avatar Oct 06 '24 07:10 lucasgblu

Thank you for your attention to our work! We implement zero-init by setting the weight and bias of the final to_out linear layer of the attention module to 0 (so most of the original knowledge is retained). This technique is just a trick for smoothing training and does not have a big impact on the performance of the model itself.
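A minimal PyTorch sketch of this idea (not the authors' actual code; the `ReferenceAttention` module and its internals are illustrative assumptions). Only the final `to_out` projection is zeroed, so the block outputs exactly 0 at the start of training and, when added residually, leaves the pretrained model's behavior untouched:

```python
import torch
import torch.nn as nn

class ReferenceAttention(nn.Module):
    """Hypothetical reference attention block inserted into a pretrained model."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.to_out = nn.Linear(dim, dim)
        # Zero-init only the output projection: the block's output is exactly 0
        # at initialization, so the residual addition is a no-op at first.
        nn.init.zeros_(self.to_out.weight)
        nn.init.zeros_(self.to_out.bias)

    def forward(self, hidden: torch.Tensor, reference: torch.Tensor) -> torch.Tensor:
        # Cross-attend from the model's hidden states to the reference features.
        out, _ = self.attn(hidden, reference, reference)
        return self.to_out(out)

block = ReferenceAttention(dim=64)
hidden = torch.randn(2, 16, 64)
reference = torch.randn(2, 16, 64)
out = block(hidden, reference)
assert torch.all(out == 0)  # the new branch contributes nothing initially
```

Note that this does not stall learning: the gradient with respect to `to_out.weight` depends on the (nonzero) attention output, so `to_out` moves away from zero after the first updates, after which the Q/K/V weights receive gradients as well. Zeroing Q, K, or V directly would not have this property.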

hrz2000 avatar Oct 06 '24 12:10 hrz2000

Thanks for your reply! I thought at first you were doing zero gating like LLaMA-Adapter does.
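For contrast, a sketch of the zero-gating alternative mentioned here: instead of zeroing the projection weights, a learnable scalar gate initialized to 0 multiplies the new branch, so the branch's weights keep their normal initialization (class and names are illustrative, not LLaMA-Adapter's actual code):

```python
import torch
import torch.nn as nn

class GatedAdapterBranch(nn.Module):
    """Zero-gated branch: output is 0 at init via a learnable scalar gate."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)            # normally initialized
        self.gate = nn.Parameter(torch.zeros(1))   # gate starts at 0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The gate scales the whole branch, so the model learns how much
        # of the new pathway to let through, starting from none.
        return self.gate * self.proj(x)

branch = GatedAdapterBranch(32)
x = torch.randn(4, 32)
assert torch.all(branch(x) == 0)  # branch is silent at initialization
```

Both approaches make the inserted module a no-op at initialization; they differ in which parameters start at zero and therefore in how gradients first flow into the branch.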

lucasgblu avatar Oct 07 '24 01:10 lucasgblu

By the way, the arXiv link to your paper on the homepage mistakenly points to another of your papers @hrz2000

lucasgblu avatar Oct 07 '24 07:10 lucasgblu

> by the way, the arxiv link of your paper in the homepage misleadingly directs to your another paper @hrz2000

Thanks for your reminder~~ I will correct it now hahaha

hrz2000 avatar Oct 07 '24 07:10 hrz2000