
Setting of β

Open tangbohu opened this issue 6 years ago • 1 comments

Hi.

In the paper, the authors said "As for parameter β in eq. 2, it usually varies about 0.1, as we set it to 10^3 divided by number of elements in attention map and batch size for each layer. "

But I am still confused. What does 10^3 mean, and how was the value 0.1 obtained?

tangbohu avatar Aug 09 '18 15:08 tangbohu

@tangbohu I assume that β is 10^3 / batch_size / (feature_map_size)^2 — this division happens in practice inside the averaging function here in the code. The batch size is 128 by default, and the feature map size varies over 32x32, 16x16, and 8x8, so the aforementioned expression varies around 0.1. Just my own conjecture from the code.
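A quick sketch of that conjecture: if β = 10^3 / (batch_size × number of elements in the attention map), then plugging in the assumed defaults (batch size 128, map sizes 32x32, 16x16, 8x8) gives per-layer values on the order of 0.1. These numbers come from the reasoning above, not from the paper itself.

```python
# Conjectured per-layer beta: beta = 1e3 / (batch_size * H * W).
# batch_size = 128 and the three map sizes are assumptions taken from
# the comment above, not values stated explicitly in the paper.
batch_size = 128

for side in (32, 16, 8):
    num_elements = side * side  # elements in one spatial attention map
    beta = 1e3 / (batch_size * num_elements)
    print(f"{side}x{side}: beta = {beta:.4f}")
```

For the 8x8 maps this gives roughly 0.12, and for the larger maps it is smaller, which is consistent with the paper's statement that β "usually varies about 0.1".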

d-li14 avatar Aug 25 '18 09:08 d-li14