ring-flash-attention
Does ring-attn not support dropout?
In the backward function of ring-attn, rng_state is not taken from the forward pass; None is passed in directly. Does this mean that ring-attn does not support dropout?
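To make the concern concrete, here is a minimal sketch of why I would expect the RNG state to be carried from forward to backward when dropout is active. It is a toy example using only Python's standard `random` module, not the flash-attn kernels or ring-attn's actual code; the function names `dropout_forward` and `dropout_backward` are hypothetical and exist only for illustration.

```python
import random


def dropout_forward(x, p, rng):
    """Apply dropout to a list of floats; also return the RNG state captured before sampling."""
    rng_state = rng.getstate()  # saved so backward can regenerate the same mask
    keep = [rng.random() >= p for _ in x]
    scale = 1.0 / (1.0 - p)
    out = [xi * scale if k else 0.0 for xi, k in zip(x, keep)]
    return out, rng_state


def dropout_backward(grad_out, p, rng_state):
    """Recompute the forward mask from the saved RNG state and apply it to the gradient."""
    rng = random.Random()
    rng.setstate(rng_state)  # without this state, the mask cannot be reproduced
    keep = [rng.random() >= p for _ in grad_out]
    scale = 1.0 / (1.0 - p)
    return [g * scale if k else 0.0 for g, k in zip(grad_out, keep)]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [1.0] * 8
    out, state = dropout_forward(x, p=0.5, rng=rng)
    grad_in = dropout_backward([1.0] * 8, p=0.5, rng_state=state)
    # The zeroed positions in grad_in line up with the zeroed positions in out.
    print(out)
    print(grad_in)
```

With a dropout probability of 0 every element is kept, so no state would be needed, which is why passing None looks to me like dropout is simply assumed to be off.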