WS_DAN

Different claims in the paper and the code on attention regularization

Open tao0420 opened this issue 5 years ago • 2 comments

Hi there,

Thanks for the contribution! After reading the code, I am a bit confused about the attention regularization part. Please correct me if I have misunderstood something.

From the code, what I understand of the center loss part is that for every class (label) you keep a single center for the pooled feature, and that same feature, multiplied by a scale of 100, is also fed to the softmax classifier. However, what you claim in the paper is that the center loss is used for attention regularization, which assigns each attention feature in the feature matrix its own center. The equation you use in the paper for the center loss is the sum, over the M attention parts, of the distance between each attention feature and its center (with M appearing explicitly as the summation index in the equation).
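To make the mismatch concrete, here is a minimal sketch of the two formulations as I read them. The function names, tensor shapes, and the `beta` update rate are my own placeholders, not the repo's actual code:

```python
import torch

def paper_attention_regularization(features, part_centers, beta=0.05):
    # features:     (B, M, C) -- M attention-part features per image
    # part_centers: (M, C)    -- one running center per attention part
    # Paper's regularizer, as I read it: L_A = sum_k ||f_k - c_k||_2^2
    diff = features - part_centers.unsqueeze(0)      # (B, M, C)
    loss = diff.pow(2).sum(dim=2).sum(dim=1).mean()  # sum over C and M, mean over B
    # Centers are updated by a moving average, not by gradient descent.
    part_centers += beta * diff.detach().mean(dim=0)
    return loss

def code_style_center_loss(pooled, class_centers, labels):
    # pooled:        (B, D) -- the flattened feature matrix that (per my reading
    #                          of the code) is also scaled by 100 for the softmax
    # class_centers: (num_classes, D) -- one center per class, not per part
    # labels:        (B,)
    centers = class_centers[labels]                  # (B, D)
    return (pooled - centers).pow(2).sum(dim=1).mean()
```

If the code really does the second version, the centers are per class rather than per attention part, which is exactly the discrepancy with the equation in the paper.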

Is there an explanation for doing it this way?

tao0420 avatar Dec 18 '19 19:12 tao0420

I have the same question. Can anyone help explain this? Thanks in advance to future helpers~

LawrenceXia2008 avatar Jun 07 '20 04:06 LawrenceXia2008

same question!

17314796423 avatar Apr 25 '24 12:04 17314796423