PIE
About the Edit-factorized BERT Architecture
For the replace operation, when we calculate the attention scores for position i, we do not attend to the token w(i).
At the first layer I think this is fine, but at the second and higher layers don't we end up using the information of w(i) indirectly, since the other positions attended to it?
Is that OK?
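To make the concern concrete, here is a minimal NumPy sketch (my own simplification, not the actual PIE code): queries come from positional embeddings, keys/values from token embeddings, and position i is masked out of its own attention. At layer 1 the output at position i is independent of w(i), but at layer 2 it is not, because it attends to other positions that already mixed in w(i).

```python
import numpy as np

def attention(q, k, v, mask):
    """Scaled dot-product attention; mask[i, j] = False blocks token j for query i."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -1e9)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
n, d = 4, 8
x = rng.normal(size=(n, d))        # token embeddings w(1..n)
p = rng.normal(size=(n, d))        # positional embeddings (used as queries here)
mask = ~np.eye(n, dtype=bool)      # each position attends to every token except its own

def two_layers(tok):
    h1 = attention(p, tok, tok, mask)   # layer 1: position i never sees w(i) directly
    h2 = attention(p, h1, h1, mask)     # layer 2: attends to h1[j], which already mix in w(i)
    return h1, h2

h1, h2 = two_layers(x)

# Perturb w(i) and check which outputs at position i change.
i = 2
x_pert = x.copy()
x_pert[i] += 1.0
h1_p, h2_p = two_layers(x_pert)

print("layer-1 output at i depends on w(i):", not np.allclose(h1[i], h1_p[i]))  # False
print("layer-2 output at i depends on w(i):", not np.allclose(h2[i], h2_p[i]))  # True
```

This is just a generic two-layer illustration of the indirect information flow I am asking about, not a claim about how the real implementation handles it.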