efficient-attention-architectures topic


MHA2MLA
200 stars · 21 forks · 200 watchers

Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
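For context, DeepSeek's Multi-Head Latent Attention (MLA) reduces the KV-cache cost of inference by caching a single low-rank latent per token and reconstructing per-head keys and values from it. The sketch below illustrates that compression idea only; all shapes and names are illustrative and are not taken from the MHA2MLA codebase:

```python
import numpy as np

# Illustrative dimensions (assumptions, not from the MHA2MLA repo):
d_model, d_latent, n_heads, d_head = 64, 16, 4, 16
rng = np.random.default_rng(0)

# One shared down-projection to a small latent, plus per-head
# up-projections that recover keys and values from it.
W_down = rng.standard_normal((d_model, d_latent)) / np.sqrt(d_model)
W_up_k = rng.standard_normal((n_heads, d_latent, d_head)) / np.sqrt(d_latent)
W_up_v = rng.standard_normal((n_heads, d_latent, d_head)) / np.sqrt(d_latent)

x = rng.standard_normal((8, d_model))       # hidden states for 8 tokens
c_kv = x @ W_down                           # cached latent: (8, d_latent)

# Keys/values are rebuilt from the latent at attention time, so the
# cache holds d_latent floats per token instead of 2 * n_heads * d_head.
k = np.einsum('tl,hld->htd', c_kv, W_up_k)  # (n_heads, 8, d_head)
v = np.einsum('tl,hld->htd', c_kv, W_up_v)  # (n_heads, 8, d_head)

print(c_kv.shape, k.shape, v.shape)
```

With these toy numbers the cache shrinks from 2 * 4 * 16 = 128 floats per token to 16, at the cost of the up-projection matmuls during decoding.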