Inconsistency of classifier-free guidance between training and testing.

Open TianpengBu opened this issue 1 year ago • 2 comments

Hi authors, great work! My question about the implementation is as follows:

During training, I found that you randomly set 20% of CLIP's inputs to zero tensors.

However, during testing, you concatenate the CLIP output embedding with zero tensors, like this:

As far as I understand, to align training and testing, should we randomly set 20% of CLIP's outputs to zero tensors, rather than the inputs to the CLIP model?

Mar 28 '24 12:03 TianpengBu
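
For readers following the question above, here is a minimal sketch of the alignment being proposed: dropping the condition on the encoder's output during training, so that it matches the zero embedding used at test time. It is not the repository's code. The function names `drop_condition` and `cfg_inference_batch` are hypothetical, plain Python lists stand in for tensors, and placing the unconditional half first in the batch is an assumption.

```python
import random

def drop_condition(embeds, drop_prob, rng):
    """Training side: replace each sample's embedding with zeros
    with probability drop_prob (e.g. 0.2 for the 20% in the question)."""
    out = []
    for e in embeds:
        if rng.random() < drop_prob:
            out.append([0.0] * len(e))  # unconditional sample
        else:
            out.append(list(e))         # conditional sample, unchanged
    return out

def cfg_inference_batch(embeds):
    """Test side: build the batch for classifier-free guidance by
    concatenating zero embeddings with the real embeddings.
    The uncond-first ordering is an assumption of this sketch."""
    uncond = [[0.0] * len(e) for e in embeds]
    return uncond + [list(e) for e in embeds]

rng = random.Random(0)
embeds = [[1.0, 2.0], [3.0, 4.0]]  # stand-in for encoder outputs
train_embeds = drop_condition(embeds, 0.2, rng)
test_embeds = cfg_inference_batch(embeds)
```

With this placement, the "unconditional" input the model sees during training is exactly the zero embedding it receives at test time, independent of what the encoder does with a zero input.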

@TianpengBu I totally agree with you.

Mar 28 '24 16:03 jiangzhengkai

If the input is zero, the output should also be zero, right?

Aug 07 '24 02:08 abcdvzz
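
On the last question, a zero input gives a zero output only for layers without additive terms. As a toy illustration (not CLIP itself), a single affine layer y = Wx + b maps the zero vector to its bias b. The `affine` helper and the weight and bias values below are made up for this sketch.

```python
def affine(x, weight, bias):
    """Compute y = W x + b for a single input vector x."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weight, bias)
    ]

# Hypothetical parameters, chosen only for illustration.
weight = [[0.5, -1.0], [2.0, 0.25]]
bias = [0.1, -0.3]

# A zero input leaves only the bias term.
zero_output = affine([0.0, 0.0], weight, bias)
```

Whether the same applies to the CLIP encoder used here depends on its architecture, but encoders with bias terms or learned positional embeddings generally do not map a zero input to a zero embedding, which is why zeroing the input and zeroing the output can differ.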