
Some doubts about the total loss

Open · wangyanhao0517 opened this issue on Apr 29, 2022 · 2 comments

Sorry to bother you. I found that your code differs from your paper in the loss calculation. Why is the final loss NLL + beta * KL? Your paper says the KL term is part of the noise.

wangyanhao0517 · Apr 29, 2022

[screenshot: the total loss equation from the paper]

The gradient of the KL term with respect to \theta (the parameters of BERT) is zero, since we do not backpropagate it to BERT (`torch.no_grad()` in [line 42](https://github.com/seanie12/SWEP/blob/main/models.py#L43)).

So the implementation is consistent with our equation.
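To make the point concrete, here is a minimal, self-contained PyTorch sketch (with hypothetical stand-in modules, not the SWEP code) showing that a KL term whose inputs are computed under `torch.no_grad()` contributes no gradient to \theta, while still training the noise parameters \phi:

```python
import torch
import torch.nn as nn

# Stand-in modules, NOT the SWEP implementation: if the hidden states
# feeding the KL term are computed under torch.no_grad(), the KL loss
# contributes no gradient to theta (BERT); it only trains the
# noise-generator parameters phi.

torch.manual_seed(0)
theta = nn.Linear(4, 4)   # stand-in for BERT (parameters theta)
phi = nn.Linear(4, 8)     # stand-in for the perturbation network (parameters phi)

x = torch.randn(2, 4)

# Task (NLL) path: gradients flow back to theta as usual.
h = theta(x)
nll = h.pow(2).mean()     # placeholder for the actual NLL loss

# KL path: the hidden states are detached from theta via torch.no_grad().
with torch.no_grad():
    h_detached = theta(x)
mu, logvar = phi(h_detached).chunk(2, dim=-1)
# KL( N(mu, diag(exp(logvar))) || N(0, I) ) for a diagonal Gaussian
kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1).sum(-1).mean()

beta = 1.0                # illustrative value
loss = nll + beta * kl    # the total loss from the question: NLL + beta * KL
loss.backward()

print(theta.weight.grad.norm())  # nonzero, but only from the NLL term
print(phi.weight.grad.norm())    # nonzero: the KL term still trains phi
```

Running this, the gradient on `theta` comes entirely from the NLL term, so adding `beta * kl` to the total loss does not change how \theta is trained, which is why the code matches the equation in the paper.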

seanie12 · Apr 29, 2022

Thanks for your reply!

wangyanhao0517 · May 6, 2022