
Does torch 1.12 improve fairseq's TransformerLayer?

Open · SeunghyunSEO opened this issue 1 year ago · 1 comment

According to this link, torch 1.12.0 improves inference speed for TransformerEncoder, TransformerEncoderLayer, and MultiheadAttention (MHA) under specific conditions (e.g. when the input contains lots of padding tokens) by using fused CUDA kernels, among other optimizations.

However, fairseq uses its own TransformerLayer. Despite this, does fairseq see any of this improvement too, or is it better to use PyTorch's Transformer layers?
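For context, here is a minimal sketch (assuming PyTorch >= 1.12) of how the built-in fast path is typically exercised: eval mode, inference mode, batch_first layers, and a key padding mask. The sizes below are made up for illustration only.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
d_model, nhead, batch, seq = 512, 8, 16, 128

layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=6)
encoder.eval()  # the fast path is an inference-only feature

src = torch.rand(batch, seq, d_model)
# True marks padding positions; heavy padding is where the speedup shows.
padding_mask = torch.zeros(batch, seq, dtype=torch.bool)
padding_mask[:, seq // 2:] = True

with torch.inference_mode():
    out = encoder(src, src_key_padding_mask=padding_mask)
```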

[Image: benchmark figure from the PyTorch blog post "A Better Transformer for Fast Transformer Encoder Inference" (2022-07-12)]

SeunghyunSEO · Jul 15 '22

Not for now, I believe.

The new "fast path" feature is taken only when why_not_fast_path is falsy (an empty string '' evaluates to False); in that case torch._native_multi_head_attention, which is implemented natively in C++, is used.

Fairseq uses F.multi_head_attention_forward, which is the path taken when why_not_fast_path is truthy (a non-empty string evaluates to True).
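A toy illustration of that empty-string convention (a simplified stand-in, not the actual PyTorch source): the layer accumulates a reason string, and only an empty string, i.e. "no reason to skip", lets the fused kernel run. The helper below re-implements a few of the real checks by hand for demonstration.

```python
import torch.nn as nn

def why_not_fast_path(mha: nn.MultiheadAttention) -> str:
    """Return '' if the (simplified) fast-path conditions hold, else a reason.

    Only a few example checks are shown; the real dispatch has more.
    """
    if mha.training:
        return "training is enabled"
    if not mha.batch_first:
        return "batch_first was not True"
    if not mha._qkv_same_embed_dim:
        return "query/key/value embedding dims differ"
    return ""

mha = nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
reason = why_not_fast_path(mha)
if not reason:   # '' is falsy -> fused torch._native_multi_head_attention
    print("fast path")
else:            # non-empty string is truthy -> F.multi_head_attention_forward,
    print("slow path:", reason)  # the same composite path fairseq calls
```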

gmryu · Jul 18 '22