flash-attention Make alibi slopes a trainable parameter?

Open penguinshin opened this issue 3 weeks ago • 1 comment

I'm trying to pass a trainable tensor as the alibi_slopes argument so that I can have trainable relative biases. However, when I do this, I still see zero gradients for the slopes. Is there a way to enable trainable slopes?

Oct 31 '25 03:10 penguinshin

No, that's not implemented. One would have to change the backward pass code to compute the gradient with respect to the slopes.

Oct 31 '25 13:10 tridao
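
To make concrete what "compute the gradient of the slopes" would involve, here is a minimal pure-Python sketch for a single head. It is not flash-attention code: the function names `attention_with_alibi` and `slope_grad` are hypothetical, and it assumes equal query and key lengths, no causal mask, and a bias of -slope * |i - j| added to the scaled scores. Under those assumptions the slope gradient is the sum over (i, j) of dS[i][j] * (-|i - j|), where dS is the usual softmax-backward gradient of the scores. The sketch checks that formula against a finite difference.

```python
import math

def attention_with_alibi(q, k, v, slope):
    """Single-head attention with bias -slope * |i - j| (assumed form).

    q, k, v are lists of rows. Returns (output, probs).
    """
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    out, probs = [], []
    for i in range(n):
        scores = [
            scale * sum(a * b for a, b in zip(q[i], k[j])) - slope * abs(i - j)
            for j in range(n)
        ]
        m = max(scores)  # subtract the row max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        p = [e / z for e in exps]
        probs.append(p)
        out.append([sum(p[j] * v[j][c] for j in range(n))
                    for c in range(len(v[0]))])
    return out, probs

def slope_grad(probs, v, d_out):
    """Analytic dL/dslope given dL/dO (d_out) and the softmax probabilities."""
    n = len(probs)
    grad = 0.0
    for i in range(n):
        # dP[i][j] = <d_out[i], v[j]>
        dp = [sum(a * b for a, b in zip(d_out[i], v[j])) for j in range(n)]
        row = sum(probs[i][j] * dp[j] for j in range(n))
        for j in range(n):
            ds = probs[i][j] * (dp[j] - row)  # gradient w.r.t. score S[i][j]
            grad += ds * (-abs(i - j))        # dS[i][j]/dslope = -|i - j|
    return grad

# Tiny made-up inputs, for illustration only.
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [0.0, -1.0], [3.0, 0.5]]
slope = 0.5

def loss(s):
    # Toy loss: sum of all output elements, so dL/dO is all ones.
    out, _ = attention_with_alibi(q, k, v, s)
    return sum(sum(r) for r in out)

_, probs = attention_with_alibi(q, k, v, slope)
d_out = [[1.0] * len(v[0]) for _ in q]
analytic = slope_grad(probs, v, d_out)

eps = 1e-6
numeric = (loss(slope + eps) - loss(slope - eps)) / (2 * eps)
print(analytic, numeric)
```

A fused kernel would presumably accumulate the same per-head sum inside the backward pass, where the score gradients are already formed for dQ and dK. Until something like that exists, one fallback is to compute attention with an explicit bias tensor in ordinary PyTorch ops so that autograd tracks the slopes, at the cost of losing the fused kernel's speed and memory savings.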
