Torch was not compiled with flash attention warning
This is printed when I call functional.scaled_dot_product_attention:
[W914 13:25:36.000000000 sdp_utils.cpp:555] Warning: 1Torch was not compiled with flash attention. (function operator ())
I'm on Windows with TorchSharp-cuda-windows=0.103.0
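For reference, a minimal self-contained call that triggers this warning looks roughly like the sketch below (assuming TorchSharp-cuda-windows 0.103.0; the tensor shapes are illustrative). The warning generally means the bundled libtorch binaries were built without flash attention kernels, so scaled_dot_product_attention falls back to another backend; the result is still correct, only the fused kernel is unavailable.

using System;
using TorchSharp;
using static TorchSharp.torch;
using F = TorchSharp.torch.nn.functional;

// Query/key/value in the (batch, heads, seq_len, head_dim) layout
// expected by scaled_dot_product_attention.
var device = cuda.is_available() ? CUDA : CPU;
var q = randn(1, 8, 128, 64).to(device);
var k = randn(1, 8, 128, 64).to(device);
var v = randn(1, 8, 128, 64).to(device);

// On a libtorch build without flash attention kernels this call emits
// the warning and dispatches to a non-flash SDPA backend.
var y = F.scaled_dot_product_attention(q, k, v, is_casual: true);
Console.WriteLine(string.Join(", ", y.shape));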
Can you show the actual line of code you used? Are you getting the warning at runtime or at compile/interpret time?
I don't see this warning when using a CausalSelfAttention layer inside a transformer architecture.
This is the line of code I used:
// "Flash" attention (F aliases TorchSharp.torch.nn.functional;
// note the parameter is spelled "is_casual" in this TorchSharp version)
var y = F.scaled_dot_product_attention(q, k, v, is_casual: true);
where q, k, and v are the query, key, and value tensors produced by a causal attention linear layer.
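For context, the surrounding code looks roughly like the following sketch of a nanoGPT-style causal self-attention block; the names c_attn, n_head, and n_embd are illustrative, not from this thread:

using TorchSharp;
using static TorchSharp.torch;
using F = TorchSharp.torch.nn.functional;

long B = 1, T = 128, n_embd = 256, n_head = 8;

// One linear layer projects the input to concatenated q, k, v.
var c_attn = nn.Linear(n_embd, 3 * n_embd);

var x = randn(B, T, n_embd);              // (batch, seq, n_embd)
var qkv = c_attn.forward(x).chunk(3, -1); // split into q, k, v

// Reshape each piece to (batch, heads, seq, head_dim).
Tensor Heads(Tensor t) => t.reshape(B, T, n_head, n_embd / n_head).transpose(1, 2);
var q = Heads(qkv[0]);
var k = Heads(qkv[1]);
var v = Heads(qkv[2]);

var y = F.scaled_dot_product_attention(q, k, v, is_casual: true);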
Hey @lostmsu, is this still relevant? Can you try the latest version or share an example to reproduce?