maaf
Visualize attention
Thank you for your work. Can you share the code for visualizing attention?
Thanks for your interest in our work! Unfortunately the visualization code would not be easy to share. Briefly, what we did is: (1) modify the Transformer code to save the attention maps to disk; (2) run the model with this modification on each of several inputs, loading the attention map from disk after running each example; (3) for the attention map for "sleeves", for example, subtract a "baseline" attention map obtained by running with many random words in place of "sleeves".
Sorry I can't be of more help with this, at least on short notice.
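For anyone trying to reproduce this, here is a minimal sketch of those three steps, assuming a PyTorch model whose attention modules return their weights (e.g. torch.nn.MultiheadAttention called with need_weights=True). The function names and file paths are hypothetical, not taken from the maaf codebase.

```python
# Hypothetical sketch of the steps described above; names and paths are illustrative only.
import numpy as np

def save_attention_hook(save_path):
    """Step 1: forward hook that writes a module's attention weights to disk.

    Assumes the hooked module (e.g. torch.nn.MultiheadAttention with
    need_weights=True) returns a (output, attention_weights) tuple.
    """
    def hook(module, inputs, output):
        attn_weights = output[1]
        if attn_weights is not None:
            np.save(save_path, attn_weights.detach().cpu().numpy())
    return hook

def baseline_subtracted_map(word_map_path, random_map_paths):
    """Step 3: subtract a baseline built from random-word runs from the
    attention map obtained for the word of interest (e.g. "sleeves")."""
    word_map = np.load(word_map_path)
    baseline = np.mean([np.load(p) for p in random_map_paths], axis=0)
    return word_map - baseline
```

Step 2 is then just running the model once per example (and once per random substitution) with the hook registered, e.g. attn_module.register_forward_hook(save_attention_hook("attn_sleeves.npy")).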
Dear author,
Do you provide options to run the model based on each method in this table?
Thank you for your answer.
Yes, you can try these variations using the --attn_2stream_mode parameter with --model set to attention. The default xxx_xmm_xff corresponds to the second line of this table. In general you can pass in a string that will be parsed into a sequence of attention operations here.
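As an illustration of what passing such a string could look like, here is a small hedged sketch. The assumption that the string splits on underscores into one operation code per attention block is mine, not confirmed by the repository, and the entry-point name in the comment is hypothetical.

```python
# Illustrative only: assumes the mode string splits on underscores into one
# operation code per attention block; this is an assumption, not the repo's
# documented behavior.
def parse_attn_2stream_mode(mode_string):
    """Turn a value like "xxx_xmm_xff" into a list of attention-op codes."""
    return mode_string.split("_")

# parse_attn_2stream_mode("xxx_xmm_xff") -> ['xxx', 'xmm', 'xff']
# Hypothetical invocation (script name assumed):
#   python main.py --model attention --attn_2stream_mode xxx_xmm_xff
```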