flash-attention
Make ALiBi slopes a trainable parameter?
I'm trying to pass a trainable tensor as the alibi_slopes argument so that I can have trainable relative biases. However, when I do this, the slopes still receive zero gradients. Is there a way to enable trainable slopes?
No, that's not implemented. One would have to change the backward pass code to also compute the gradient with respect to the slopes.
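To make "compute the gradient of the slopes" concrete, here is a minimal NumPy reference sketch. It is not flash-attention code and not a fused kernel. It assumes the bias is -slope * |i + seqlen_k - seqlen_q - j| for query i and key j (the non-causal convention described in the flash-attn docs, as far as I know), one slope per head, a single batch element, and no causal mask; the function names are hypothetical. The slope gradient turns out to be the gradient of the pre-softmax scores, dS, reduced against the negated distance matrix, and the script checks that against central finite differences.

```python
import numpy as np

def alibi_distance(seqlen_q, seqlen_k):
    # Assumed ALiBi distance |i + seqlen_k - seqlen_q - j| (non-causal, bottom-right aligned)
    i = np.arange(seqlen_q)[:, None]
    j = np.arange(seqlen_k)[None, :]
    return np.abs(i + seqlen_k - seqlen_q - j).astype(np.float64)

def attention_forward(q, k, v, slopes):
    # q: (H, Sq, D), k and v: (H, Sk, D), slopes: (H,)
    scale = 1.0 / np.sqrt(q.shape[-1])
    dist = alibi_distance(q.shape[1], k.shape[1])
    # Scores with the ALiBi bias -slope_h * dist added per head
    s = np.einsum("hqd,hkd->hqk", q, k) * scale - slopes[:, None, None] * dist
    p = np.exp(s - s.max(axis=-1, keepdims=True))  # numerically stable softmax
    p /= p.sum(axis=-1, keepdims=True)
    out = np.einsum("hqk,hkd->hqd", p, v)
    return out, p, dist

def slope_grad(dout, v, p, dist):
    # dP = dO V^T
    dp = np.einsum("hqd,hkd->hqk", dout, v)
    # Softmax backward: dS = P * (dP - rowsum(P * dP))
    ds = p * (dp - (p * dp).sum(axis=-1, keepdims=True))
    # S depends on slope_h only through -slope_h * dist, so
    # dL/dslope_h = sum_{i,j} dS[h, i, j] * (-dist[i, j])
    return -(ds * dist).sum(axis=(1, 2))

rng = np.random.default_rng(0)
H, Sq, Sk, D = 2, 4, 6, 8
q = rng.standard_normal((H, Sq, D))
k = rng.standard_normal((H, Sk, D))
v = rng.standard_normal((H, Sk, D))
slopes = np.array([0.5, 0.25])
dout = rng.standard_normal((H, Sq, D))  # upstream gradient for loss = sum(out * dout)

out, p, dist = attention_forward(q, k, v, slopes)
analytic = slope_grad(dout, v, p, dist)

# Central finite differences on the same scalar loss
eps = 1e-6
numeric = np.zeros(H)
for h in range(H):
    sp, sm = slopes.copy(), slopes.copy()
    sp[h] += eps
    sm[h] -= eps
    lp = (attention_forward(q, k, v, sp)[0] * dout).sum()
    lm = (attention_forward(q, k, v, sm)[0] * dout).sum()
    numeric[h] = (lp - lm) / (2 * eps)

print(analytic, numeric, np.allclose(analytic, numeric, atol=1e-6))
```

A FlashAttention-style backward pass already forms dS tile by tile, so presumably the extra work would be accumulating this per-head reduction across tiles (for example with atomic adds into a per-head buffer). That is a guess at a design, not how the library currently works.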