Attentive-Neural-Process
Some clarifications about attention used
Thank you for sharing the code. According to Appendix A, second paragraph, of the ANP paper, dropout is not used in the attention modules.
In line 205, the residual and the attention result are concatenated, but I believe they should be added elementwise and then passed through a layer norm (Figure 8 of the ANP paper). Is there a reason for this modification?
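A minimal PyTorch sketch of the difference, assuming the residual and the attention output share the same hidden dimension (shapes here are illustrative, not taken from the repo):

```python
import torch
import torch.nn as nn

# Illustrative shapes: batch 2, 5 points, hidden dim 8.
residual = torch.randn(2, 5, 8)  # query representation before attention
result = torch.randn(2, 5, 8)    # attention output

# Figure 8 of the ANP paper: elementwise add, then layer normalization.
# The feature dimension is preserved.
layer_norm = nn.LayerNorm(8)
out = layer_norm(residual + result)
print(out.shape)  # torch.Size([2, 5, 8])

# Concatenating along the feature axis instead doubles the hidden dim,
# so every downstream layer must expect 16 features rather than 8.
cat = torch.cat([residual, result], dim=-1)
print(cat.shape)  # torch.Size([2, 5, 16])
```

Besides matching the paper, the add-and-normalize form keeps the encoder width constant across stacked attention blocks, whereas concatenation changes the interface of the layers that follow.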
Thanks, Deep Pandey