flash-attention
How can FlashAttention-2 be used with a prefix decoder?
My attention_mask is a dynamic mask matrix for a prefix decoder, similar to UniLM and GLM. How can this type of attention_mask be applied with FlashAttention?
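For context, here is a minimal sketch of the kind of mask being described: a prefix-decoder (prefix-LM) mask where prefix tokens attend bidirectionally to the whole prefix, while subsequent tokens attend causally. The function name and NumPy construction are illustrative assumptions, not part of the FlashAttention API:

```python
import numpy as np

def prefix_lm_mask(seq_len: int, prefix_len: int) -> np.ndarray:
    """Build a UniLM/GLM-style prefix-decoder attention mask.

    Returns a boolean matrix where mask[i, j] == True means
    position i may attend to position j.
    """
    # Start from a standard causal (lower-triangular) mask.
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    # Make attention within the prefix fully bidirectional.
    mask[:prefix_len, :prefix_len] = True
    return mask
```

A mask like this can be passed to a standard scaled-dot-product attention implementation, but it is not expressible through FlashAttention's `causal=True` flag, which is why arbitrary per-batch mask matrices are the sticking point here.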
That kind of mask is not currently supported.