rlpd
The effect of the LayerNorm?
I have a question about LayerNorm. In the paper, you mention that if LayerNorm is applied in the network, the Q-values are bounded by the norm of the weight layer. Even with the formula explained, I am still puzzled about why the inequality holds for the last and second-to-last terms. For the inequality to hold, I think the norm of the LayerNorm output must be kept below 1, but this cannot be guaranteed. Could you please elaborate on this conclusion?