DeepSeek-V2
Is there any analysis of the time complexity of MLA?
In particular, what do the down-projection and up-projection cost? Will they add a large number of extra operations, such as matrix multiplications?
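One way to frame the question is a back-of-envelope FLOP count for the per-token Q/K/V projections. The sketch below compares standard multi-head attention (three direct `d -> n_h*d_h` projections) with MLA's down-then-up path, including the decoupled RoPE projections. It rests on several assumptions that are not part of the original question. A vector-matrix product of size `m x n` is counted as `2*m*n` FLOPs. Attention scores, softmax and the output projection are ignored. No weight absorption is applied. The example dimensions are chosen to resemble the DeepSeek-V2 configuration but should be treated as illustrative. `AttnConfig`, `mha_qkv_flops` and `mla_qkv_flops` are hypothetical helper names.

```python
"""Back-of-envelope FLOP count for the Q/K/V projections of a single token.

Assumptions (illustrative, not from the original question):
- multiplying a length-m vector by an m x n matrix costs 2*m*n FLOPs;
- attention scores, softmax and the output projection are ignored;
- MLA is computed naively, i.e. without absorbing up-projections into other weights.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class AttnConfig:
    d_model: int     # hidden size d
    n_heads: int     # number of attention heads n_h
    d_head: int      # per-head dimension d_h
    d_c: int         # KV compression (latent) dimension
    d_cq: int        # query compression dimension
    d_rope: int = 0  # per-head decoupled RoPE dimension (0 disables it)


def matmul_flops(m: int, n: int) -> int:
    """FLOPs for multiplying a length-m vector by an m x n matrix."""
    return 2 * m * n


def mha_qkv_flops(c: AttnConfig) -> int:
    """Standard MHA: three direct d -> n_h*d_h projections (Q, K, V)."""
    return 3 * matmul_flops(c.d_model, c.n_heads * c.d_head)


def mla_qkv_flops(c: AttnConfig) -> int:
    """Naive MLA: down-project to small latents, then up-project per head."""
    hd = c.n_heads * c.d_head
    kv = (
        matmul_flops(c.d_model, c.d_c)  # KV down-projection d -> d_c
        + matmul_flops(c.d_c, hd)       # K up-projection d_c -> n_h*d_h
        + matmul_flops(c.d_c, hd)       # V up-projection d_c -> n_h*d_h
    )
    q = (
        matmul_flops(c.d_model, c.d_cq)  # Q down-projection d -> d_cq
        + matmul_flops(c.d_cq, hd)       # Q up-projection d_cq -> n_h*d_h
    )
    rope = (
        matmul_flops(c.d_cq, c.n_heads * c.d_rope)  # per-head RoPE query part
        + matmul_flops(c.d_model, c.d_rope)         # single shared RoPE key
    )
    return kv + q + rope


if __name__ == "__main__":
    # Illustrative dimensions (assumed), roughly in the spirit of DeepSeek-V2.
    cfg = AttnConfig(d_model=5120, n_heads=128, d_head=128,
                     d_c=512, d_cq=1536, d_rope=64)
    mha, mla = mha_qkv_flops(cfg), mla_qkv_flops(cfg)
    print(f"MHA Q/K/V projections: {mha:,} FLOPs per token")
    print(f"MLA Q/K/V projections: {mla:,} FLOPs per token")
    print(f"ratio MLA/MHA: {mla / mha:.2f}")
```

Under these assumed dimensions the extra down-projection does not make MLA more expensive than MHA: the latent sizes `d_c` and `d_cq` are much smaller than `d_model`, so the two-step path costs roughly 131M FLOPs per token against roughly 503M for three direct projections. Every term is a fixed-size matmul per token, so projection cost stays linear in sequence length either way. A different configuration (for example a larger `d_c` or a smaller `d_model`) could change the comparison, so it is worth re-running with the real model's settings.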