Transformer_Time_Series

Convolutional self-attention

[Open] Yanruoqin opened this issue 4 years ago · 3 comments

Dear mlpotter, your code is great! However, I found that you only process the initial input with causal convolutions; the K and Q are still computed by torch.nn.TransformerEncoderLayer, i.e. by ordinary linear projections. Thus, this attention is identical to the canonical Transformer architecture rather than convolutional self-attention.

Yanruoqin avatar Oct 27 '20 01:10 Yanruoqin
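To see why the comment above holds, one can inspect the layer directly: nn.TransformerEncoderLayer delegates attention to nn.MultiheadAttention, whose Q/K/V projections come from a single stacked linear weight, regardless of any convolution applied to the input beforehand. A quick check (the d_model=512, nhead=8 values here are arbitrary, not taken from this repository):

```python
import torch.nn as nn

# nn.TransformerEncoderLayer wraps nn.MultiheadAttention, which derives
# Q, K and V from one linear projection of its input.
layer = nn.TransformerEncoderLayer(d_model=512, nhead=8)
print(type(layer.self_attn))                 # nn.MultiheadAttention
print(layer.self_attn.in_proj_weight.shape)  # torch.Size([1536, 512]) -> stacked linear Q/K/V weights
```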

You are right, mlpotter's convolution method is wrong.

ddz16 avatar Apr 09 '21 07:04 ddz16

I agree with you.

Ralph-Liuyuhang avatar Mar 23 '22 03:03 Ralph-Liuyuhang

Dear mlpotter, your code is great! However, I found that you only process the initial input with causal convolutions; the K and Q are still computed by torch.nn.TransformerEncoderLayer. Thus, this attention is identical to the canonical Transformer architecture.

What is the appropriate way to compute Q and K?

hriamli avatar Feb 06 '23 00:02 hriamli
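One possible answer, following the convolutional self-attention idea discussed in this thread (Li et al., 2019, "Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting"): compute Q and K with causal 1-D convolutions of kernel size k > 1 over the sequence instead of pointwise linear projections, and keep V as a pointwise projection. The module below is a minimal sketch under those assumptions, not mlpotter's code; the class name ConvSelfAttention, the kernel_size default, and the padding scheme are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvSelfAttention(nn.Module):
    """Sketch of convolutional self-attention: Q and K are produced by
    causal 1-D convolutions over the sequence; V stays pointwise."""

    def __init__(self, d_model, n_heads, kernel_size=3):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.kernel_size = kernel_size
        # Causal convolutions for Q and K; V keeps the usual linear projection.
        self.q_conv = nn.Conv1d(d_model, d_model, kernel_size)
        self.k_conv = nn.Conv1d(d_model, d_model, kernel_size)
        self.v_proj = nn.Linear(d_model, d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x, attn_mask=None):
        # x: (batch, seq_len, d_model)
        B, L, D = x.shape
        xc = x.transpose(1, 2)                       # (B, D, L) for Conv1d
        # Left-pad by kernel_size - 1 so position t only sees inputs <= t.
        xc = F.pad(xc, (self.kernel_size - 1, 0))
        q = self.q_conv(xc).transpose(1, 2)          # (B, L, D)
        k = self.k_conv(xc).transpose(1, 2)          # (B, L, D)
        v = self.v_proj(x)                           # (B, L, D)

        # Split heads: (B, n_heads, L, d_head)
        def split(t):
            return t.view(B, L, self.n_heads, self.d_head).transpose(1, 2)

        q, k, v = split(q), split(k), split(v)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5   # (B, h, L, L)
        if attn_mask is not None:                    # boolean mask, True = blocked
            scores = scores.masked_fill(attn_mask, float('-inf'))
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, L, D)
        return self.out_proj(out)
```

Used with a causal mask such as torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1), position t attends only to positions <= t, matching the left-only padding of the convolutions.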