fairseq
Does torch 1.12 improve fairseq's TransformerLayer?
According to this link, torch 1.12.0 improves the inference speed of TransformerEncoder, TransformerEncoderLayer, and MultiheadAttention (MHA) under specific conditions (when the input contains many padding tokens) by fusing CUDA kernels, among other optimizations.
However, fairseq uses its own TransformerLayer. Despite this, is there any improvement in fairseq too? Or is it better to use PyTorch's Transformer layers?
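
For context, here is a minimal sketch of the kind of setup that link describes (assuming torch >= 1.12; the sizes and the padding mask are made up): in eval/inference mode, with batch_first=True and a key padding mask, nn.TransformerEncoder may take the fused fast path.

```python
import torch
import torch.nn as nn

# Minimal sketch (assumes torch >= 1.12): in eval/inference mode with
# batch_first=True and a key padding mask, nn.TransformerEncoder may take
# the fused "fast path". Sizes and the mask below are arbitrary.
layer = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2).eval()

x = torch.randn(4, 32, 256)                         # (batch, seq, d_model)
padding_mask = torch.zeros(4, 32, dtype=torch.bool)
padding_mask[:, 20:] = True                         # last 12 positions are padding

with torch.inference_mode():
    out = encoder(x, src_key_padding_mask=padding_mask)
print(out.shape)                                    # torch.Size([4, 32, 256])
```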

Not for now, I believe. The new "fast path" feature is taken only when why_not_fast_path is falsy (the empty string '' evaluates to False), in which case torch._native_multi_head_attention, a native C++ implementation, is used. Fairseq uses F.multi_head_attention_forward, which is the method called when why_not_fast_path is truthy (a non-empty string evaluates to True).
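
To make the truthiness check concrete, here is a toy paraphrase of that dispatch. This is not the actual PyTorch source; the two conditions shown are just examples of the many checks PyTorch performs before taking the fast path.

```python
import torch
import torch.nn as nn

def pick_attention_path(mha: nn.MultiheadAttention) -> str:
    # Toy paraphrase of the torch 1.12 dispatch described above -- not the
    # real source. PyTorch builds a string explaining why the fast path
    # cannot be used; an empty string means "no objection".
    why_not_fast_path = ""
    if mha.training:
        why_not_fast_path = "training mode is enabled"
    elif not mha.batch_first:
        why_not_fast_path = "batch_first is False"
    # ... (the real check covers many more conditions)

    if not why_not_fast_path:   # '' is falsy -> fused native (C++) kernel
        return "torch._native_multi_head_attention"
    # non-empty string is truthy -> slow path; this is also the call
    # fairseq's own MultiheadAttention makes directly
    return "F.multi_head_attention_forward"

mha = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True).eval()
print(pick_attention_path(mha))          # torch._native_multi_head_attention
print(pick_attention_path(mha.train()))  # F.multi_head_attention_forward
```

In other words, because fairseq's own attention module goes straight to F.multi_head_attention_forward, it sits on the slow path and does not pick up the 1.12 fused kernel automatically.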