SWEP Some doubts for total loss?

Open wangyanhao0517 opened this issue 2 years ago • 2 comments

Sorry to bother you. I found that the loss computed in your code differs from the one described in your paper. Why is the final loss NLL + beta * KL? Your paper says the KL term is part of the noise.

Apr 29 '22 03:04 wangyanhao0517

The gradient of the KL term with respect to \theta (the parameters of BERT) is zero, since we do not backpropagate it to BERT (see torch.no_grad() at [line 42](https://github.com/seanie12/SWEP/blob/main/models.py#L43)).

So the implementation is consistent with our equation.

Apr 29 '22 04:04 seanie12

> Gradient of KL with respect to \theta (the parameter of BERT) is zero since we do not backpropagate it to BERT (torch.no_grad() in line 42). So the implementation is consistent with our equation.

Thanks for your reply!

May 06 '22 09:05 wangyanhao0517
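
The point made in the reply can be checked with a small PyTorch sketch. This is not the SWEP code: `encoder`, `noise_net`, `head`, the tensor shapes, the multiplicative perturbation, and the value of `beta` are all hypothetical stand-ins chosen for illustration. It assumes only what the thread states, namely that the features feeding the KL term are computed under `torch.no_grad()`. Under that assumption the KL term has no path back to the encoder, so adding `beta * KL` to the loss leaves the encoder gradient unchanged.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins: `encoder` plays the role of BERT (theta),
# `noise_net` plays the role of the noise/perturbation network.
encoder = nn.Linear(4, 4)
noise_net = nn.Linear(4, 8)  # outputs mean and log-variance, 4 dims each
head = nn.Linear(4, 2)

x = torch.randn(3, 4)
y = torch.tensor([0, 1, 0])
beta = 0.1  # illustrative value only

# Features for the noise network are computed without gradient tracking,
# so nothing derived from them can backpropagate into `encoder`.
with torch.no_grad():
    h_const = encoder(x)

mu, logvar = noise_net(h_const).chunk(2, dim=-1)

# KL( N(mu, sigma^2) || N(0, I) ), summed over dims, averaged over the batch.
kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=-1).mean()

# Reparameterised noise sample, applied multiplicatively (an assumption).
z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
logits = head(encoder(x) * z)
nll = F.cross_entropy(logits, y)
loss = nll + beta * kl

params = list(encoder.parameters())

# Gradient of the KL term alone w.r.t. the encoder: no path exists,
# so autograd reports None for every encoder parameter.
kl_grads = torch.autograd.grad(kl, params, allow_unused=True, retain_graph=True)
kl_reaches_encoder = any(g is not None for g in kl_grads)

# Encoder gradient of the full loss vs. the NLL alone.
g_total = torch.autograd.grad(loss, params, retain_graph=True)
g_nll = torch.autograd.grad(nll, params, retain_graph=True)
same_encoder_grad = all(torch.allclose(a, b) for a, b in zip(g_total, g_nll))

# The KL term still trains the noise network.
kl_grads_noise = torch.autograd.grad(kl, list(noise_net.parameters()), retain_graph=True)
```

In this sketch `kl_reaches_encoder` is `False` and `same_encoder_grad` is `True`, while `kl_grads_noise` is populated. In other words, writing the objective as NLL + beta * KL only affects the noise network's update, which is consistent with the maintainer's statement.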
