composer
Is FlashAttention really used with HuggingFaceModel, one of the supported ComposerModel types?
Since PyTorch 2.0, dispatch to FlashAttention happens dynamically when the required conditions are met, but I cannot find a way to verify whether FlashAttention is actually being used by default. Moreover, the general GPT recipes depend on Hugging Face model implementations, which do not appear to call PyTorch's F.scaled_dot_product_attention, so I am wondering whether FlashAttention is really used when training with Composer. Any ideas on how to easily enable FlashAttention when using an HF model with Composer?
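For context, here is a minimal sketch of the PyTorch 2.0 dispatch behavior the question refers to. The shapes are made up for illustration; F.scaled_dot_product_attention picks a backend (flash, memory-efficient, or math) automatically, and the torch.backends.cuda.sdp_kernel context manager can restrict it to the flash kernel only, which raises an error if FlashAttention cannot actually be used (e.g. wrong dtype or no supported GPU) and is one way to verify dispatch:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim).
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# Automatic dispatch: PyTorch chooses flash / mem-efficient / math.
out = F.scaled_dot_product_attention(q, k, v)

# To *force* FlashAttention, restrict the allowed backends. FlashAttention
# needs fp16/bf16 inputs on a supported GPU, so this branch runs only on CUDA
# and raises RuntimeError if the flash kernel still cannot be selected.
if torch.cuda.is_available():
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=False, enable_mem_efficient=False
    ):
        qh, kh, vh = (t.cuda().half() for t in (q, k, v))
        out_flash = F.scaled_dot_product_attention(qh, kh, vh)

# Sanity check against the reference formula softmax(q k^T / sqrt(d)) v.
ref = torch.softmax(q @ k.transpose(-2, -1) / 64**0.5, dim=-1) @ v
print(torch.allclose(out, ref, atol=1e-5))
```

Note this only verifies the SDPA path itself; if the HF model's attention module never calls F.scaled_dot_product_attention, none of this applies, which is exactly the concern above.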