Transformer_Time_Series
Convolutional self-attention
Dear mlpotter, your code is great! However, I noticed that you only apply causal convolutions to the initial input, while K and Q are still computed by `torch.nn.TransformerEncoderLayer`. So this attention is identical to the canonical Transformer attention rather than convolutional self-attention.
You are right; mlpotter's convolution method is wrong.
I agree with you.
What is the appropriate way to compute Q and K?
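Since convolutional self-attention computes Q and K with causal convolutions of kernel size k > 1 (while V keeps a pointwise projection), the in-projection has to be customized; `torch.nn.TransformerEncoderLayer` always uses linear projections for Q, K, and V. Below is a minimal single-head sketch of that idea, not code from mlpotter's repo; the class name `CausalConvAttention`, the kernel size, and the masking convention are my own illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConvAttention(nn.Module):
    """Single-head self-attention where Q and K come from causal 1D
    convolutions (kernel_size > 1) instead of pointwise linear projections;
    V keeps the usual pointwise projection."""

    def __init__(self, d_model, kernel_size=3):
        super().__init__()
        self.kernel_size = kernel_size
        # Left-padding is applied manually in forward(), so the features at
        # position t are built only from inputs at positions <= t (causal).
        self.q_conv = nn.Conv1d(d_model, d_model, kernel_size)
        self.k_conv = nn.Conv1d(d_model, d_model, kernel_size)
        self.v_proj = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x, attn_mask=None):
        # x: (batch, seq_len, d_model)
        x_c = x.transpose(1, 2)                    # (batch, d_model, seq_len)
        pad = (self.kernel_size - 1, 0)            # pad on the left only
        q = self.q_conv(F.pad(x_c, pad)).transpose(1, 2)
        k = self.k_conv(F.pad(x_c, pad)).transpose(1, 2)
        v = self.v_proj(x)
        scores = torch.matmul(q, k.transpose(1, 2)) * self.scale
        if attn_mask is not None:                  # boolean mask, True = blocked
            scores = scores.masked_fill(attn_mask, float('-inf'))
        return torch.matmul(F.softmax(scores, dim=-1), v)

# Example: attn = CausalConvAttention(d_model=64, kernel_size=3)
#          out = attn(torch.randn(8, 96, 64))     # -> (8, 96, 64)
```

A multi-head version would split `d_model` into heads after the convolutions; alternatively, one could keep the rest of the encoder layer and only swap the Q/K in-projections for these causal convolutions.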