Is there any plain PyTorch-based Differential Transformer code?
Hi,
I'm looking into the Differential Transformer paper and code, and I found that the GitHub version is based on FlashAttention and rotary embeddings.
Is there any plan to upload a simple example of a transformer using Diff attention, together with example arguments (e.g., how to adjust num_heads relative to the original transformer's, or how to use other positional embeddings)?
Thanks
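
For reference, here is a minimal sketch of differential attention in plain PyTorch, written from Eq. (1)-(2) of the paper, with no FlashAttention and no rotary embeddings. The module layout, the 0-based `layer_idx` convention, and the non-learnable per-head RMS normalization are my simplifications, not the official implementation (which uses GroupNorm with a learnable scale):

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiffAttention(nn.Module):
    """Single differential-attention layer in plain PyTorch.

    Sketch written from Eq. (1)-(2) of the Diff Transformer paper;
    names and defaults are illustrative, not the official code.
    """

    def __init__(self, d_model: int, num_heads: int, layer_idx: int = 0):
        super().__init__()
        assert d_model % (2 * num_heads) == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads // 2  # Q/K split into two groups
        self.q_proj = nn.Linear(d_model, d_model, bias=False)
        self.k_proj = nn.Linear(d_model, d_model, bias=False)
        self.v_proj = nn.Linear(d_model, d_model, bias=False)
        self.out_proj = nn.Linear(d_model, d_model, bias=False)
        # lambda_init = 0.8 - 0.6 * exp(-0.3 * l), using a 0-based layer index
        self.lambda_init = 0.8 - 0.6 * math.exp(-0.3 * layer_idx)
        # Reparameterized lambda (Eq. 2): exp(lq1.lk1) - exp(lq2.lk2) + lambda_init
        self.lambda_q1 = nn.Parameter(torch.randn(self.head_dim) * 0.1)
        self.lambda_k1 = nn.Parameter(torch.randn(self.head_dim) * 0.1)
        self.lambda_q2 = nn.Parameter(torch.randn(self.head_dim) * 0.1)
        self.lambda_k2 = nn.Parameter(torch.randn(self.head_dim) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        h, d = self.num_heads, self.head_dim
        q = self.q_proj(x).view(b, t, h, 2, d)
        k = self.k_proj(x).view(b, t, h, 2, d)
        v = self.v_proj(x).view(b, t, h, 2 * d)
        q1, q2 = q.unbind(dim=3)
        k1, k2 = k.unbind(dim=3)

        causal = torch.triu(torch.ones(t, t, dtype=torch.bool, device=x.device), 1)

        def softmax_attn(qh, kh):
            scores = torch.einsum("bthd,bshd->bhts", qh, kh) / math.sqrt(d)
            return F.softmax(scores.masked_fill(causal, float("-inf")), dim=-1)

        lam = (torch.exp(self.lambda_q1 @ self.lambda_k1)
               - torch.exp(self.lambda_q2 @ self.lambda_k2)
               + self.lambda_init)
        # Differential attention map: difference of two softmax maps (Eq. 1)
        a = softmax_attn(q1, k1) - lam * softmax_attn(q2, k2)
        out = torch.einsum("bhts,bshe->bthe", a, v)
        # Per-head normalization; plain RMS stands in for the paper's GroupNorm
        out = out * torch.rsqrt(out.pow(2).mean(-1, keepdim=True) + 1e-5)
        out = out * (1.0 - self.lambda_init)
        return self.out_proj(out.reshape(b, t, h * 2 * d))
```

For example, `DiffAttention(d_model=512, num_heads=4)(torch.randn(2, 16, 512))` returns a `(2, 16, 512)` tensor.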
I found several implementations on GitHub by searching for Differential Transformer, and I'm looking for an implementation with a static kv_cache and torch.compile for faster inference.
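
In case it helps, a "static kv_cache" in plain PyTorch usually means pre-allocating fixed-size key/value buffers and writing new entries in place at the current position, so tensor shapes stay constant across decode steps and torch.compile does not recompile. A minimal sketch under those assumptions (single layer; the class and method names are hypothetical):

```python
import torch

class StaticKVCache:
    """Pre-allocated KV buffers with fixed shapes, friendly to torch.compile.

    Illustrative sketch only: one layer, layout (batch, heads, max_len, head_dim).
    """

    def __init__(self, batch, heads, max_len, head_dim,
                 device="cpu", dtype=torch.float32):
        self.k = torch.zeros(batch, heads, max_len, head_dim,
                             device=device, dtype=dtype)
        self.v = torch.zeros(batch, heads, max_len, head_dim,
                             device=device, dtype=dtype)

    def update(self, pos: torch.Tensor, k_new: torch.Tensor, v_new: torch.Tensor):
        # In-place scatter along the sequence dim keeps buffer shapes static.
        self.k.index_copy_(2, pos, k_new)
        self.v.index_copy_(2, pos, v_new)
        return self.k, self.v

cache = StaticKVCache(batch=1, heads=8, max_len=2048, head_dim=64)
pos = torch.tensor([0])  # write position for the current token
k, v = cache.update(pos, torch.randn(1, 8, 1, 64), torch.randn(1, 8, 1, 64))
```

A decode step that reads from this cache can then be wrapped with `torch.compile(step, mode="reduce-overhead")`, since all tensor shapes stay fixed across steps.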
Hi @AnticPan,
Could you share your findings?
Thanks.
Hi @DevKiHyun, you can refer to Section 3.1 and Appendix D in our paper for detailed configurations of our models. You can also directly use the configs of open-sourced LLMs and change their model code to turn them into the Diff architecture.
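
For anyone converting an existing config, here is my reading of Section 3.1 (an assumption on my part, not an official recipe): each Diff head consumes two Q/K sub-projections, so the paper keeps the per-head dimension matched to the baseline Transformer and halves the number of heads, keeping the parameter count comparable:

```python
# Hypothetical conversion of a baseline decoder config to the Diff arch,
# following Section 3.1: halve num_heads, keep head_dim, so
# d_model == num_heads * 2 * head_dim still holds.
baseline = {"d_model": 2048, "num_heads": 16, "head_dim": 128}
diff_cfg = dict(baseline, num_heads=baseline["num_heads"] // 2)
assert diff_cfg["d_model"] == diff_cfg["num_heads"] * 2 * diff_cfg["head_dim"]
```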