flash-attention
How can FlashAttention-2 be used with a prefix decoder?
My attention_mask is a dynamic mask matrix for a prefix decoder, similar to UniLM and GLM. How can this type of attention_mask be applied with FlashAttention?
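For context, here is a minimal sketch of the kind of mask being described: a prefix-decoder (prefix-LM) mask where prefix tokens attend bidirectionally to the whole prefix, while subsequent tokens attend causally. The function name and NumPy construction are illustrative assumptions, not part of the FlashAttention API:

```python
import numpy as np

def prefix_lm_mask(seq_len: int, prefix_len: int) -> np.ndarray:
    """Build a UniLM/GLM-style prefix-decoder attention mask.

    Returns a boolean matrix where mask[i, j] == True means
    position i may attend to position j.
    """
    # Start from a standard causal (lower-triangular) mask.
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    # Make attention within the prefix fully bidirectional.
    mask[:prefix_len, :prefix_len] = True
    return mask
```

A mask like this can be passed to a standard scaled-dot-product attention implementation, but it is not expressible through FlashAttention's `causal=True` flag, which is why arbitrary per-batch mask matrices are the sticking point here.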
That kind of mask is not currently supported.