composer
Is FlashAttention really used with HuggingFaceModel, one of the supported ComposerModel types?
Since PyTorch 2.0, dispatch to FlashAttention happens dynamically when the required conditions are met, but I cannot find a way to verify whether FlashAttention is actually being used by default. Moreover, the general GPT recipes depend on Hugging Face model implementations, which do not appear to call PyTorch's F.scaled_dot_product_attention, so I am wondering whether FlashAttention is really used when training with Composer. Any ideas on how to easily enable FlashAttention when using an HF model with Composer?
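For context, here is a minimal sketch of the PyTorch 2.0 dispatch behavior the question refers to. The shapes are made up for illustration; F.scaled_dot_product_attention picks a backend (flash, memory-efficient, or math) automatically, and the torch.backends.cuda.sdp_kernel context manager can restrict it to the flash kernel only, which raises an error if FlashAttention cannot actually be used (e.g. wrong dtype or no supported GPU) and is one way to verify dispatch:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, seq_len, head_dim).
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

# Automatic dispatch: PyTorch chooses flash / mem-efficient / math.
out = F.scaled_dot_product_attention(q, k, v)

# To *force* FlashAttention, restrict the allowed backends. FlashAttention
# needs fp16/bf16 inputs on a supported GPU, so this branch runs only on CUDA
# and raises RuntimeError if the flash kernel still cannot be selected.
if torch.cuda.is_available():
    with torch.backends.cuda.sdp_kernel(
        enable_flash=True, enable_math=False, enable_mem_efficient=False
    ):
        qh, kh, vh = (t.cuda().half() for t in (q, k, v))
        out_flash = F.scaled_dot_product_attention(qh, kh, vh)

# Sanity check against the reference formula softmax(q k^T / sqrt(d)) v.
ref = torch.softmax(q @ k.transpose(-2, -1) / 64**0.5, dim=-1) @ v
print(torch.allclose(out, ref, atol=1e-5))
```

Note this only verifies the SDPA path itself; if the HF model's attention module never calls F.scaled_dot_product_attention, none of this applies, which is exactly the concern above.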