hiersumm

Question about residual connections

Open zhiqihuang opened this issue 3 years ago • 0 comments

Hi

Thanks for sharing the code. After implementing Hierarchical Transformers, I found a difference in the residual connections between Figure 2 of the paper and the code.

Here, the input to PositionwiseFeedForward is already inputs + block_vec, which matches Figure 2 in the paper.

However, why is output + x also computed here?

It looks like two residual connections.
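
To make the question concrete, here is a minimal sketch of the pattern as I understand it. It is not the repository's code: the sublayers are hypothetical elementwise stand-ins (`attention_sublayer`, `feed_forward_core`), and I am assuming the feed-forward module returns output + x internally. Only the names inputs, block_vec, output and x come from the code in question.

```python
# Hypothetical stand-ins, not the repository's code: each "sublayer" is a
# simple elementwise function so the two additions are easy to trace.

def attention_sublayer(inputs):
    """Stand-in for the attention block that produces block_vec."""
    return [0.5 * v for v in inputs]


def feed_forward_core(x):
    """Stand-in for the linear layers inside the feed-forward module."""
    return [2.0 * v for v in x]


def positionwise_feed_forward(x):
    """Feed-forward that adds its own input back (assumed: output + x)."""
    output = feed_forward_core(x)
    return [o + v for o, v in zip(output, x)]  # second addition


def layer(inputs):
    block_vec = attention_sublayer(inputs)
    x = [i + b for i, b in zip(inputs, block_vec)]  # first addition
    return positionwise_feed_forward(x)


inputs = [1.0, 2.0]
result = layer(inputs)

# Expanded form: the layer output is F(inputs + block_vec) + inputs + block_vec.
block_vec = attention_sublayer(inputs)
summed = [i + b for i, b in zip(inputs, block_vec)]
expanded = [f + s for f, s in zip(feed_forward_core(summed), summed)]
print(result, expanded)
```

In this sketch the sum inputs + block_vec enters the result twice: once as the argument of the feed-forward function and once through output + x.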

zhiqihuang, Feb 07 '22 19:02