WS_DAN
The paper and the code make different claims about attention regularization
Hi there,
Thanks for the contribution! After reading the code, I am a bit confused about the attention regularization part. Please correct me if I have misunderstood something.
From the code, my understanding of the center loss part is that for every class (label) you keep one center for the features, and those same features are also used for softmax classification after being multiplied by a scale of 100. However, what the paper claims is that the center loss is used for attention regularization, which assigns each attention feature in the feature matrix its own center. The center-loss equation in the paper sums the distances between those attention features and their centers (note the distinct M in the equation).
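To make the mismatch concrete, here is a minimal numpy sketch (not the authors' code; all names, shapes, and the 0.5 scaling are illustrative assumptions) contrasting the two variants: one center per class for a pooled feature vector, versus one center per attention part as the paper's equation with M suggests.

```python
import numpy as np

def center_loss_per_class(features, labels, centers):
    """Variant I see in the code: one center per class for the whole
    pooled feature vector. features: (B, D), centers: (num_classes, D)."""
    diffs = features - centers[labels]             # (B, D), gather each sample's class center
    return 0.5 * np.sum(diffs ** 2) / len(labels)  # mean squared distance over the batch

def attention_center_loss(feature_matrix, labels, part_centers):
    """Variant described in the paper: the feature matrix holds M
    attention-part features, each pulled toward its own per-part center.
    feature_matrix: (B, M, D), part_centers: (num_classes, M, D)."""
    diffs = feature_matrix - part_centers[labels]  # (B, M, D), one center per part
    return 0.5 * np.sum(diffs ** 2) / len(labels)  # summed over the M parts, averaged over batch

# Tiny usage example with random data (shapes are hypothetical)
rng = np.random.default_rng(0)
B, M, D, num_classes = 4, 3, 8, 5
labels = rng.integers(0, num_classes, size=B)
feats = rng.normal(size=(B, D))
centers = rng.normal(size=(num_classes, D))
fmat = rng.normal(size=(B, M, D))
pcenters = rng.normal(size=(num_classes, M, D))
print(center_loss_per_class(feats, labels, centers))
print(attention_center_loss(fmat, labels, pcenters))
```

The key difference is the extra M axis: the per-class variant regularizes one vector per sample, while the paper's version regularizes M part features independently, which is what makes it an attention regularizer rather than a plain center loss.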
Is there any explanation of doing this?
I have the same question; can anyone help explain this? Thanks in advance to any future helpers~
same question!