axlearn
Support SSD/Mamba2
Support Mamba2, which uses the SSD (state space duality) kernel, rather than attention, as its token-mixing mechanism.
Reference:
- https://arxiv.org/abs/2405.21060 (mamba2 paper)
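
For readers unfamiliar with SSD, the following is a minimal pure-Python sketch of the idea behind the kernel, assuming a single input channel and a scalar per-step decay. It is not the axlearn implementation or its API: the function names (`ssd_recurrent`, `ssd_quadratic`) and the toy inputs are hypothetical, chosen only for illustration. The sketch shows that the linear-time recurrent form and the quadratic, attention-like form of the same state space model produce the same outputs.

```python
# Illustrative sketch only: hypothetical names, not the axlearn API.
# Shapes: a[t] is a scalar decay, B[t] and C[t] are length-N lists,
# x[t] is a scalar input for one channel.


def ssd_recurrent(a, B, C, x):
    """Linear-time form: h_t = a_t * h_{t-1} + B_t * x_t, y_t = <C_t, h_t>."""
    h = [0.0] * len(B[0])
    ys = []
    for a_t, B_t, C_t, x_t in zip(a, B, C, x):
        # Decay the previous state, then write the current input into it.
        h = [a_t * h_i + b_i * x_t for h_i, b_i in zip(h, B_t)]
        # Read the state out with C_t.
        ys.append(sum(c_i * h_i for c_i, h_i in zip(C_t, h)))
    return ys


def ssd_quadratic(a, B, C, x):
    """Attention-like form: y_t = sum_{s<=t} <C_t, B_s> * (a_{s+1}...a_t) * x_s."""
    ys = []
    for t in range(len(x)):
        y_t = 0.0
        decay = 1.0  # product a_{s+1} * ... * a_t; empty product when s == t
        for s in range(t, -1, -1):
            # <C_t, B_s> plays the role of a query-key score; decay is a causal mask.
            score = sum(c * b for c, b in zip(C[t], B[s]))
            y_t += score * decay * x[s]
            decay *= a[s]
        ys.append(y_t)
    return ys


# Toy inputs: sequence length 3, state size 2.
a = [0.9, 0.5, 0.8]
B = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
C = [[1.0, 1.0], [1.0, 0.0], [0.0, 2.0]]
x = [1.0, 2.0, 3.0]

y_rec = ssd_recurrent(a, B, C, x)
y_quad = ssd_quadratic(a, B, C, x)
assert all(abs(p - q) < 1e-9 for p, q in zip(y_rec, y_quad))
```

In the quadratic form, `<C_t, B_s>` acts like a query-key score and the running product of decays acts like a causal mask, which is why SSD can stand in for attention as a token mixer. The paper referenced above develops a chunked algorithm that combines the two forms; this sketch does not attempt to reproduce it.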