Liger-Kernel
[Operator] Fused Neighborhood Attention
Summary
https://github.com/linkedin/Liger-Kernel/issues/733
Testing Done
Tested Attention Layer and Attention module implementation for FusedNeighborhoodAttention
- Hardware Type: 3090 & H100 SXM5
- [x] run `make test` to ensure correctness
- [x] run `make checkstyle` to ensure code style
- [x] run `make test-convergence` to ensure convergence
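For readers new to the operator, here is a minimal pure-Python sketch of what a 1D neighborhood attention reference computes. It is not the fused Triton kernel from this PR. The function names (`neighborhood_mask`, `naive_neighborhood_attention`) are hypothetical, and the window convention (centered on each query, radius `kernel_size // 2`, truncated at sequence edges) is an assumption that may differ from the actual implementation.

```python
import math


def neighborhood_mask(seq_len, kernel_size):
    """mask[i][j] is True when key j falls inside query i's window.

    Assumption: the window is centered on i with radius kernel_size // 2
    and is simply truncated at the sequence boundaries.
    """
    radius = kernel_size // 2
    return [[abs(i - j) <= radius for j in range(seq_len)] for i in range(seq_len)]


def naive_neighborhood_attention(q, k, v, kernel_size):
    """Single-head attention where each query only sees its neighborhood.

    q, k, v are lists of seq_len vectors (lists of floats) of equal dimension.
    """
    seq_len, dim = len(q), len(q[0])
    scale = 1.0 / math.sqrt(dim)
    mask = neighborhood_mask(seq_len, kernel_size)
    out = []
    for i in range(seq_len):
        # Keys visible to query i.
        idx = [j for j in range(seq_len) if mask[i][j]]
        # Scaled dot-product scores over the neighborhood only.
        scores = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in idx]
        # Numerically stable softmax.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Weighted sum of the visible values.
        out.append(
            [sum(w * v[j][d] for w, j in zip(weights, idx)) / total for d in range(dim)]
        )
    return out
```

With `kernel_size=1` each query attends only to itself, so the output equals `v`. With a window wide enough to cover the whole sequence, the result reduces to ordinary full attention.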
@Tcc0403 @lancerts @qingquansong @shivam15s
Eventual goal is to compare sparsity performance with sparse MTA.
I will try to get H100 benchmark numbers tonight.
RTX 3090 numbers (Fwd, Bwd, and memory plots; images not preserved)
H100 SXM5 numbers (Fwd, Bwd, and memory plots; images not preserved)
@AndreSlavescu Great work! Do you happen to have a sense of how the triton kernel impl compares to the reported numbers for their cutlass kernel implementation?
I can do a more insightful benchmark in terms of FLOPs achieved and plot arithmetic intensity to compare. They report numbers relative to naive NA, but I will need to review the paper for exact FLOPs details.
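As a rough sketch of how such a comparison could be set up, the helpers below estimate forward-pass FLOPs, ideal memory traffic, and arithmetic intensity for 1D neighborhood attention. All function names are hypothetical. The FLOP model is an assumption, not the paper's accounting: it counts only the two matmul-like steps (QK^T and PV) at 2 FLOPs per multiply-add, ignores softmax, and assumes every query sees exactly `kernel_size` keys. The byte model assumes a perfectly fused kernel that reads Q, K, V once and writes O once.

```python
def na_forward_flops(batch, heads, seq_len, head_dim, kernel_size):
    """Approximate forward FLOPs: QK^T and PV each cost 2 * kernel_size * head_dim per query."""
    return 4 * batch * heads * seq_len * kernel_size * head_dim


def ideal_bytes_moved(batch, heads, seq_len, head_dim, bytes_per_elem=2):
    """Lower bound on traffic: read Q, K, V and write O exactly once (fp16/bf16 by default)."""
    return 4 * batch * heads * seq_len * head_dim * bytes_per_elem


def arithmetic_intensity(batch, heads, seq_len, head_dim, kernel_size, bytes_per_elem=2):
    """FLOPs per byte under the two models above; simplifies to kernel_size / bytes_per_elem."""
    flops = na_forward_flops(batch, heads, seq_len, head_dim, kernel_size)
    traffic = ideal_bytes_moved(batch, heads, seq_len, head_dim, bytes_per_elem)
    return flops / traffic


def achieved_tflops(flops, runtime_ms):
    """Convert a FLOP count and a measured runtime in milliseconds to TFLOP/s."""
    return flops / (runtime_ms * 1e-3) / 1e12
```

Under these assumptions the intensity depends only on the window size and dtype, which suggests small kernel sizes would sit in the memory-bound region of a roofline plot. Measured runtimes would still need to come from the actual benchmark runs.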
Yeah that would be great to see and put in the PR desc/release notes