efficient-attention-architectures topic
MHA2MLA
200 stars, 21 forks, 200 watchers
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs