darongliu

Results 3 comments of darongliu

Besides, I think the calculation of the adjoint matrix (line 99) is also wrong. It should be `adjoint = pTwp[[[1, 1], [0, 1]], [[1, 0], [0, 0]], :, :]`.
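For anyone unfamiliar with paired index arrays, here is a small NumPy sketch of which entries an expression of this form reads. The shape `(2, 2, H, W)` and the fill values are assumptions for illustration only; the real `pTwp` in the code under discussion may have a different shape or live in another framework.

```python
import numpy as np

# Hypothetical stand-in for pTwp: one 2x2 matrix per spatial position.
H, W = 1, 1
pTwp = np.arange(4, dtype=float).reshape(2, 2, H, W)  # entries 0, 1, 2, 3

# Paired index arrays: output[i, j] = pTwp[rows[i, j], cols[i, j]].
rows = np.array([[1, 1], [0, 1]])
cols = np.array([[1, 0], [0, 0]])
picked = pTwp[rows, cols, :, :]

print(picked.shape)        # leading (2, 2) comes from the index arrays
print(picked[:, :, 0, 0])  # the gathered entries at position (0, 0)
```

Each output element is gathered from the row and column given at the same position in the two index arrays, and the trailing `:, :` keeps the remaining dimensions untouched.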

Whether a position is masked is decided by checking whether the keys or queries sum to zero along the last dimension. However, the padding part, which is originally embedded...
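The zero-sum check described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the repository's actual code: the function name `padding_mask` and the example values are hypothetical, and it assumes the mask is simply the sign of the absolute sum over the last dimension.

```python
import numpy as np

def padding_mask(x):
    """Return 1.0 where a position's vector has a non-zero sum, else 0.0.

    Hypothetical helper: assumes padding positions are all-zero vectors.
    """
    return np.sign(np.abs(x.sum(axis=-1)))

# One sequence of three positions; the last one is all-zero padding.
keys = np.array([[[0.5, -0.2],
                  [0.1,  0.3],
                  [0.0,  0.0]]])

mask = padding_mask(keys)
print(mask)  # 1.0 marks positions that are kept, 0.0 marks masked ones
```

The check only works as intended while padded positions really do sum to zero along the last dimension.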

No, I don't think it will cause a large performance drop. It is still a good implementation.