Attentive-Neural-Process
Some clarifications about attention used
Thank you for sharing the code. According to Appendix A, second paragraph, of the ANP paper, dropout is not used in the attention modules.
In line 205, the residual and the attention result are concatenated, but I believe they should be added elementwise and then passed through a layer norm (Figure 8 of the ANP paper). Is there a reason for this modification?
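A minimal PyTorch sketch of the difference, assuming the residual and the attention output share the same hidden dimension (shapes here are illustrative, not taken from the repo):

```python
import torch
import torch.nn as nn

# Illustrative shapes: batch 2, 5 points, hidden dim 8.
residual = torch.randn(2, 5, 8)  # query representation before attention
result = torch.randn(2, 5, 8)    # attention output

# Figure 8 of the ANP paper: elementwise add, then layer normalization.
# The feature dimension is preserved.
layer_norm = nn.LayerNorm(8)
out = layer_norm(residual + result)
print(out.shape)  # torch.Size([2, 5, 8])

# Concatenating along the feature axis instead doubles the hidden dim,
# so every downstream layer must expect 16 features rather than 8.
cat = torch.cat([residual, result], dim=-1)
print(cat.shape)  # torch.Size([2, 5, 16])
```

Besides matching the paper, the add-and-normalize form keeps the encoder width constant across stacked attention blocks, whereas concatenation changes the interface of the layers that follow.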
Thanks, Deep Pandey