darongliu
Results: 3 comments by darongliu
In addition, I think the calculation of the adjoint matrix (line 99) is also wrong. It should be `adjoint = pTwp[[[1, 1], [0, 1]], [[1, 0], [0, 0]], :, :]`.
Whether a position is masked is decided by whether the sum of the keys or queries along the last dimension is zero. However, the padding part, which is originally embedded...
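For readers unfamiliar with the zero-sum masking test described in the comment above, here is a minimal NumPy sketch. The function name `padding_mask`, the `(batch, time, features)` layout, and the sample values are assumptions made for illustration; this is not the code under discussion.

```python
import numpy as np

def padding_mask(x):
    """Return 1.0 where a position's feature sum is non-zero, else 0.0.

    x is assumed to have shape (batch, time, features).
    """
    # Sum over the last (feature) dimension, then map non-zero sums to 1.0.
    return np.sign(np.abs(x.sum(axis=-1)))

# Hypothetical keys: two real positions followed by one all-zero padding position.
keys = np.array([[[0.5, -0.2], [0.1, 0.3], [0.0, 0.0]]])
mask = padding_mask(keys)
```

Note that this test only recognizes padding when the padded vectors sum to exactly zero along the feature dimension.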
No, I don't think it will cause a significant performance drop. It is still a good implementation.