
Question about tT_loss

Open ccchobits opened this issue 2 years ago • 2 comments

I am still confused about issue #17. The content of that issue is duplicated as follows:

There is a tT_loss term in the final loss: DiffuSeq/diffuseq/gaussian_diffusion.py Lines 629 to 630 in 901f860

out_mean, _, _ = self.q_mean_variance(x_start, th.LongTensor([self.num_timesteps - 1]).to(x_start.device))
tT_loss = mean_flat(out_mean ** 2)

What is this? I cannot find it in the paper. According to the code, out_mean looks like the mean of the forward-process distribution $q(x_T \mid x_0)$, i.e. $\sqrt{\bar\alpha_T}\,x_0$, so out_mean ** 2 should then be $\bar\alpha_T x_0^2$. Also, there seem to be no learnable params in the compute graph of tT_loss. I wonder what this term is for, what it means, and where it comes from?

As for the comment from @yjc4 in #17, I don't think that explanation resolves these issues, because that term has obviously been dropped between step 1 and step 2 of equation (17) in the paper. Please give me some hints. Thanks.

ccchobits avatar Jan 08 '23 10:01 ccchobits

Hi, yes, tT_loss does not pass through the transformer layers, but there are still learnable params, namely the word embedding parameters (via x_start). We can regard it as a kind of regularization, so in Eq. 17 we move it into the $R(||x_0||^2)$ term.

summmeer avatar Jan 13 '23 03:01 summmeer
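To make the regularization reading concrete, here is a minimal standalone sketch (not the DiffuSeq code itself; the linear beta schedule and tensor shapes are assumptions). In a standard DDPM forward process the mean of $q(x_T \mid x_0)$ is $\sqrt{\bar\alpha_T}\,x_0$, so mean_flat(out_mean ** 2) collapses to $\bar\alpha_T \cdot \mathrm{mean}(x_0^2)$: a scaled L2 penalty on the embeddings, with no transformer weights in its compute graph.

```python
import numpy as np

def tT_loss(x_start, betas):
    """Sketch of the tT_loss term: a scaled L2 penalty on the embeddings x_start.

    alpha_bar[-1] is the cumulative product of (1 - beta_t) at the final step T,
    so out_mean is the mean of q(x_T | x_0) and the loss equals
    alpha_bar[-1] * mean(x_start ** 2).
    """
    alpha_bar = np.cumprod(1.0 - betas)          # \bar\alpha_t for t = 1..T
    out_mean = np.sqrt(alpha_bar[-1]) * x_start  # mean of q(x_T | x_0)
    return np.mean(out_mean ** 2)                # = alpha_bar[-1] * mean(x_0^2)

betas = np.linspace(1e-4, 0.02, 2000)  # assumed linear schedule, T = 2000
x0 = np.random.randn(8, 16)            # stand-in word embeddings
loss = tT_loss(x0, betas)
```

Because $\bar\alpha_T$ is a constant of the noise schedule, the only gradient path is through x_start, i.e. the embedding table, which is exactly the "learnable params" the reply refers to.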


Just by the way: it's issue #16.

Dawn-LX avatar Feb 21 '23 09:02 Dawn-LX