onnxruntime
[CPU] SparseAttention op
Description
Add a CPU implementation of the SparseAttention operator. It depends on the CPU Flash Attention work in #20805.
This work is still in progress:
- [x] Refactor GQAAttentionBase
- [x] Add SparseAttention implementation
- [x] Add test cases
- [ ] Test performance
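For readers unfamiliar with the operator, the core idea of block-sparse attention can be sketched in plain Python. This is a hypothetical, single-head illustration and not the kernel in this PR.

It makes the following assumptions:
- The block mask is stored in CSR form. The names `block_row_offsets` and `block_col_indices` are illustrative.
- A causal constraint is applied on top of the block mask.
- Every block row lists its own diagonal block.

It deliberately omits several parts of the real kernel:
- grouped-query head mapping
- the KV cache
- rotary embedding
- the Flash Attention tiling from #20805

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def _attend(qi: List[float], k: Matrix, v: Matrix, keys: List[int]) -> List[float]:
    """softmax(q . k / sqrt(d))-weighted sum of v rows over the given key positions."""
    if not keys:
        # Assumption of this sketch: each block row includes its diagonal block,
        # so every query can at least see itself.
        raise ValueError("query has no visible keys; include the diagonal block")
    scale = 1.0 / math.sqrt(len(qi))
    scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j][d] for w, j in zip(weights, keys)) / total
            for d in range(len(v[0]))]


def block_sparse_causal_attention(
    q: Matrix,
    k: Matrix,
    v: Matrix,
    block_size: int,
    block_row_offsets: Sequence[int],   # hypothetical CSR row pointer, len = num_blocks + 1
    block_col_indices: Sequence[int],   # hypothetical CSR column indices of allowed key blocks
) -> Matrix:
    """Single head: query block row r may attend only to the key blocks listed in
    block_col_indices[block_row_offsets[r]:block_row_offsets[r + 1]], and only to
    key positions j <= query position i (causal)."""
    seq_len = len(q)
    out: Matrix = []
    for i in range(seq_len):
        r = i // block_size
        allowed_blocks = block_col_indices[block_row_offsets[r]:block_row_offsets[r + 1]]
        # Expand allowed blocks into key positions, then apply the causal cut.
        keys = [
            j
            for c in allowed_blocks
            for j in range(c * block_size, min((c + 1) * block_size, seq_len))
            if j <= i
        ]
        out.append(_attend(q[i], k, v, keys))
    return out


def dense_causal_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Reference: every query attends to all positions up to and including itself."""
    return [_attend(q[i], k, v, list(range(i + 1))) for i in range(len(q))]


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
    k = [[0.5, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    # Full lower-triangular block mask (block_size=2): row 0 -> {0}, row 1 -> {0, 1}.
    print(block_sparse_causal_attention(q, k, v, 2, [0, 1, 3], [0, 0, 1]))
    # "Local" mask: each block row sees only its diagonal block.
    print(block_sparse_causal_attention(q, k, v, 2, [0, 1, 2], [0, 1]))
```

With a full lower-triangular block mask the result matches dense causal attention. Dropping off-diagonal blocks skips their keys entirely. That skipping is where a block-sparse kernel saves work: the cost scales with the number of listed blocks rather than with the full sequence length.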