FasterTransformer
Possible bug in decoder cross attention
Hi,
If I take the same encoder input and pad it to two different maximum lengths, I get noticeably different encoder memory key/value tensors in decoder cross attention, and for some inputs this produces slightly different result tokens. After reviewing the code, I found the `is_batch_major_cache_` argument, which is `true` by default. When I ran attention with `is_batch_major_cache_ = false`, the key/value and result-token mismatch was gone. My guess is that the default implementation of decoder cross attention does not handle the memory lengths correctly.
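For context, here is a minimal PyTorch sketch (not FasterTransformer code; the layer, shapes, and lengths below are arbitrary) of the property I would expect to hold: as long as the padded positions of the encoder memory are masked out, the cross-attention output for the real positions should be identical no matter how far the memory is padded.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model, n_heads = 64, 4
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

real_len = 5                                  # actual encoder sequence length
query = torch.randn(1, 3, d_model)            # decoder-side queries
memory = torch.randn(1, real_len, d_model)    # encoder output, unpadded

def cross_attend(memory, padded_len):
    """Pad the encoder memory to `padded_len` and mask the padding."""
    pad = torch.zeros(1, padded_len - real_len, d_model)
    padded = torch.cat([memory, pad], dim=1)
    mask = torch.zeros(1, padded_len, dtype=torch.bool)
    mask[:, real_len:] = True                 # True = ignore this position
    out, _ = attn(query, padded, padded, key_padding_mask=mask)
    return out

out_a = cross_attend(memory, padded_len=8)
out_b = cross_attend(memory, padded_len=16)
print(torch.allclose(out_a, out_b, atol=1e-6))  # expected: True
```

This reference check passes regardless of the padded length, whereas with the default `is_batch_major_cache_ = true` the FasterTransformer decoder gives me different cross-attention key/value tensors for the two padded lengths.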
Please provide reproduction steps as described in the bug template. Thank you.
Closing this bug because it is inactive. Feel free to re-open this issue if you still have any problems.