xinhaoH
Thank you, thank you! ------------------ Original Message ------------------ From: "yijun0612"
> When you run a function for the first time, the GPU needs to initialize and load the necessary computational resources, which may result in longer execution times. However, subsequent...
> Thank you for your questions!
>
> 1. This repository does not currently include an implementation of GQA, but I believe it should be relatively straightforward to add...
I tried using only the pruned tokens when generating the first token, and the performance was extremely poor. I believe that's why SnapKV uses the full KV for the prefill attention.
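
To make the distinction concrete, here is a toy sketch in plain Python. It is not SnapKV's actual code: the single head, the single query, and the helper names `attend` and `select_kept_positions` are all assumptions made up for illustration. The last prompt query first attends over the full KV (which is what produces the first token), and only afterwards is the cache pruned to a small budget for later decoding steps.

```python
import math
import random


def attend(q, keys, values):
    """Single-query softmax attention; returns (output vector, attention weights)."""
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    out = [sum(w * v[i] for w, v in zip(weights, values))
           for i in range(len(values[0]))]
    return out, weights


def select_kept_positions(weights, budget):
    """Hypothetical helper: keep the `budget` highest-weight positions, in original order."""
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:budget])


random.seed(0)
seq_len, dim, budget = 16, 4, 4  # toy sizes, chosen arbitrarily
keys = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
values = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(seq_len)]
q_last = [random.gauss(0, 1) for _ in range(dim)]

# Prefill: the last prompt query attends over the FULL KV.
full_out, weights = attend(q_last, keys, values)

# Only after prefill is the cache pruned for subsequent decoding steps.
kept = select_kept_positions(weights, budget)
pruned_keys = [keys[i] for i in kept]
pruned_values = [values[i] for i in kept]

# For comparison: what the same query would see with only the pruned KV.
pruned_out, _ = attend(q_last, pruned_keys, pruned_values)
gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(full_out, pruned_out)))
print("kept positions:", kept)
print("L2 gap between full and pruned outputs:", gap)
```

The gap printed here is only a property of random toy data; it does not reproduce the quality drop I saw, it just shows where in the pipeline the two variants differ.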