[CPU] SparseAttention op

Open tianleiwu opened this issue 1 year ago • 0 comments

Description

Add a CPU implementation of the SparseAttention operator. It depends on the CPU Flash Attention work in #20805.

This work is still in progress:

  • [x] Refactoring GQAAttentionBase
  • [x] Add SparseAttention implementation
  • [x] Add test cases
  • [ ] Test performance

Motivation and Context

tianleiwu, Jun 20 '24 03:06