[RFC] Fuse shortconv and output norm/gate into the kernel
Proposal
Fuse the short convolution (shortconv) and the output norm/gate into the attention kernels, similar to how Mamba1 and Mamba2 fuse them into their scan kernels (see the sketch below).
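A minimal PyTorch sketch of the semantics such a fused kernel would implement, not the actual fla API or Triton code. `linear_attn_core`, the per-projection conv weights, and the (batch, seqlen, dim) layout are assumptions for illustration; the point is that the conv'd q/k/v and the pre-norm output never need to be materialized in global memory.

```python
import torch
import torch.nn.functional as F

def short_conv(x, weight):
    # Causal depthwise conv1d over the sequence dim.
    # x: (batch, seqlen, dim), weight: (dim, kernel_size) -- illustrative shapes.
    b, t, d = x.shape
    k = weight.shape[-1]
    y = F.conv1d(x.transpose(1, 2), weight.unsqueeze(1), padding=k - 1, groups=d)
    return y[..., :t].transpose(1, 2)

def fused_forward(q, k, v, w_q, w_k, w_v, g, norm_weight, linear_attn_core):
    # Prologue: shortconv on q/k/v computed on the fly instead of being saved
    # as three extra activations.
    q, k, v = (short_conv(x, w) for x, w in ((q, w_q), (k, w_k), (v, w_v)))
    # Core: placeholder for the existing chunked linear-attention kernel.
    o = linear_attn_core(q, k, v)
    # Epilogue: gated RMSNorm applied before writing the output back,
    # analogous to Mamba2's fused norm + gate.
    o = o * torch.rsqrt(o.pow(2).mean(-1, keepdim=True) + 1e-6) * norm_weight
    return o * F.silu(g)
```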
Rationale
Applying ShortConv to Q, K, and V introduces three additional activations that must be saved for the backward pass, resulting in non-negligible memory overhead.
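For an illustrative (assumed) configuration: with batch size 8, sequence length 4096, hidden size 4096, and bf16 activations, each saved tensor is 8 * 4096 * 4096 * 2 B = 256 MiB, so the three conv outputs add roughly 768 MiB per layer. Fusing the conv into the kernel lets the conv'd q/k/v be recomputed in the backward pass instead of stored.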