hierarchical-attention-networks
Mask for attention weight
Hi ematvey,
Thanks for sharing the code!
I noticed that the attention weights at both the sentence and word level are not masked according to the actual sequence lengths, so the model can "pay attention" to padding positions that carry no real input. Is there a reason you didn't use a mask in this project?
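To illustrate the kind of masking I mean, here is a minimal sketch in plain NumPy. It is not taken from the project's code. `masked_softmax` is a hypothetical helper. It assumes the attention logits come as a padded `(batch, max_len)` array plus a `lengths` vector, and that every sequence has at least one real token. The same idea would apply at the word level (tokens within a sentence) and at the sentence level (sentences within a document).

```python
import numpy as np

def masked_softmax(scores, lengths):
    """Softmax over the last axis that gives zero weight to padded positions.

    scores:  (batch, max_len) raw attention logits
    lengths: (batch,) true length of each sequence; assumed >= 1
    """
    scores = np.asarray(scores, dtype=np.float64)
    lengths = np.asarray(lengths)
    max_len = scores.shape[-1]
    # mask[b, t] is True for real positions (t < lengths[b]), False for padding
    mask = np.arange(max_len)[None, :] < lengths[:, None]
    # Padded logits become -inf, so exp() maps them to exactly 0
    masked = np.where(mask, scores, -np.inf)
    # Subtract the row max (taken over real positions only) for numerical stability
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)

# Two padded sequences: true lengths 2 and 3, padded to length 4
scores = np.array([[1.0, 2.0, 3.0, 0.5],
                   [0.0, 0.0, 5.0, 5.0]])
lengths = np.array([2, 3])

w = masked_softmax(scores, lengths)

# For comparison: a plain softmax that ignores lengths spreads weight onto padding
unmasked = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
```

With the mask, each row still sums to 1 but only over its real positions. The plain softmax assigns non-zero (here even the largest) weight to padded slots.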
Please correct me if I am wrong. Thanks! Xianlonb