DPTNet
[Question] Did you evaluate the performance gain of using the improved Transformer layer instead of the standard Transformer layer?
Hi, I am curious about how important the proposed improved Transformer layer is compared to the standard one (without positional encoding), but I couldn't find related information in the paper.
I think I have the answer now. I tried replacing the RNN with a feedforward layer, and it seems to converge very slowly.
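For concreteness, here is a minimal PyTorch sketch of the idea behind the improved layer: an RNN replaces the first linear of the feed-forward block, which injects order information and removes the need for positional encoding. All dimensions and hyperparameters below are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ImprovedTransformerLayer(nn.Module):
    """Sketch of an improved Transformer encoder layer in the DPTNet style:
    the first linear of the feed-forward block is swapped for a bidirectional
    LSTM. Hyperparameters are placeholders for illustration."""

    def __init__(self, d_model=64, nhead=4, hidden=128):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        # RNN in place of the usual Linear -> ReLU; it models sequence order,
        # so no positional encoding is added to the input.
        self.rnn = nn.LSTM(d_model, hidden, batch_first=True, bidirectional=True)
        self.ff = nn.Linear(2 * hidden, d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):  # x: (batch, time, d_model)
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)          # residual + layer norm after attention
        h, _ = self.rnn(x)
        x = self.norm2(x + self.ff(torch.relu(h)))  # residual feed-forward path
        return x

layer = ImprovedTransformerLayer()
out = layer(torch.randn(2, 100, 64))
print(out.shape)  # torch.Size([2, 100, 64])
```

Replacing `self.rnn` with a plain `nn.Linear(d_model, 2 * hidden)` reduces this to a standard Transformer layer without positional encoding, which is the slow-converging variant described above.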