Flash Attention regurgitates repeated tokens - seq2seq
I'm having generation issues with NMT models trained with OpenNMT-py, including models trained on OpenNMT-py versions that predate flash attention and one I'm currently training on the most recent version, which does include flash attention. The models were converted using onmt_release_model with storage quantization set to int8.
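For reference, a minimal sketch of the conversion step, assuming CTranslate2's Python converter is equivalent to what onmt_release_model does for int8 storage quantization; the checkpoint path and output directory are placeholders:

```python
# Sketch of the conversion step, assuming the CTranslate2 Python converter
# mirrors onmt_release_model with int8 storage quantization.
# "model_step_100000.pt" and "ct2_model" are hypothetical paths.
from ctranslate2.converters import OpenNMTPyConverter

converter = OpenNMTPyConverter("model_step_100000.pt")
converter.convert("ct2_model", quantization="int8", force=True)
```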
The issue appears when setting flash_attention=True while creating the ctranslate2.Translator object. The GPU is an RTX 3090.
I don't know whether this is an architecture issue or something to do with the conversion process from OpenNMT-py.
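Here is a minimal repro sketch of how I create the translator; the model path and token sequence are placeholders, and the input is assumed to be tokenized with the model's SentencePiece vocabulary:

```python
# Minimal repro sketch; "ct2_model" and the token list are placeholders.
import ctranslate2

for flash in (False, True):
    translator = ctranslate2.Translator(
        "ct2_model",
        device="cuda",           # RTX 3090
        compute_type="int8",
        flash_attention=flash,   # toggling this on triggers the repeated tokens
    )
    # translate_batch expects pre-tokenized input (e.g. SentencePiece pieces).
    results = translator.translate_batch([["▁Hello", "▁world", "."]])
    print(f"flash_attention={flash}:", " ".join(results[0].hypotheses[0]))
    del translator
```

With flash_attention=False the output is normal; with flash_attention=True I get degenerate repetitions like the ones below.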
Examples of some outputs on the Flores200 benchmark:
```
sss of of of of of of of of of of of
sss in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in patients patients patients in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in in patients in in in in in in in in in in in in in patients patients patients patients patients patients patients patients patients patients patients patients patients in in patients patients patients patients patients in countries countries in in in in in in in in in in in in in in in in in in in in in in in in
ssss
ssmmmmmmmm
ss
__opt_src_en__opt_src_en__opt_src_en
sss
sss of of of of of of of of of of of
sss tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax tax
sssmmmmmmmmmmmmmmmmmmm
```