SnapKV
Why does SnapKV compress the KV cache only during decoding?
@leeyeehoo @ctlllll @WendyH1108
I tried generating the first token from only the pruned tokens, and the performance was extremely poor. I believe that is why SnapKV computes prefill attention over the full KV cache and uses the compressed cache only for the tokens decoded afterward.
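To make the prefill/decode split concrete, here is a minimal NumPy sketch under simplifying assumptions: a single attention head, and no score pooling or per-head selection. Prefill runs causal attention over the full prompt KV, so the output that produces the first token sees every prompt position. Only then are prefix positions ranked by the attention they receive from the last `window` prompt queries (the observation window), and decoding attends over the kept positions plus the window. The names `prefill`, `decode_step`, `window`, and `budget` are hypothetical and are not SnapKV's actual API.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; -inf entries (masked positions) become 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def prefill(q, k, v, window, budget):
    # q, k, v: (n, d) for one head over the whole prompt (hypothetical helper).
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)  # causal mask
    weights = softmax(np.where(mask, -np.inf, scores))
    out = weights @ v  # full-KV attention: the first token uses all positions

    # Rank prefix positions by attention from the observation-window queries.
    prefix_len = n - window
    votes = weights[prefix_len:, :prefix_len].sum(axis=0)
    top = np.sort(np.argsort(votes)[-budget:])
    keep = np.concatenate([top, np.arange(prefix_len, n)])  # always keep the window
    return out, k[keep], v[keep], keep

def decode_step(q_new, k_new, v_new, k_cache, v_cache):
    # One decoding step attends over the compressed cache plus the new token.
    k_all = np.vstack([k_cache, k_new[None]])
    v_all = np.vstack([v_cache, v_new[None]])
    w = softmax(q_new @ k_all.T / np.sqrt(q_new.shape[-1]))
    return w @ v_all, k_all, v_all

rng = np.random.default_rng(0)
n, d = 16, 8
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out, k_c, v_c, keep = prefill(q, k, v, window=4, budget=6)
print(out.shape, k_c.shape)  # (16, 8) (10, 8)

x = rng.standard_normal((3, d))
y, k_next, v_next = decode_step(x[0], x[1], x[2], k_c, v_c)
```

The point of the ordering is that compression happens after the prefill attention has already been computed, so pruning can only affect tokens generated from the second one onward. Pruning before prefill, as in the experiment above, would instead change the output that determines the first token.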