Megatron-LM
Speed up the creation of attention mask
Prefer the in-place variants `triu_`/`tril_`: since torch 2.3.0 they are faster than the out-of-place `triu`/`tril` (https://github.com/pytorch/pytorch/pull/115013).
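A minimal sketch of the idea, assuming a Megatron-style causal mask of shape `[1, 1, seq, seq]` where `True` marks positions to be masked out; `build_causal_mask` is a hypothetical helper name, not an actual Megatron-LM function:

```python
import torch


def build_causal_mask(seq_length: int, device=None) -> torch.Tensor:
    # Hypothetical helper illustrating the change: allocate once, then use
    # the in-place tril_() instead of the out-of-place torch.tril(), which
    # is faster since torch 2.3.0 (pytorch/pytorch#115013).
    mask = torch.ones((1, 1, seq_length, seq_length),
                      dtype=torch.bool, device=device)
    mask.tril_()  # in-place: zero out entries above the diagonal
    # Invert so that True marks the (future) positions to mask out.
    return ~mask
```

The in-place call avoids allocating a second `[seq, seq]` buffer, which matters when the mask is rebuilt per batch at long sequence lengths.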
Generally, the mask will be created inside Transformer Engine when `--use-mcore-models` is set.
Marking as stale. No activity in 60 days.