SnapKV
Hello, could you clarify how you handle grouped-query attention (GQA)? For instance, Mistral 7B has 8 key-value heads and 32 query heads, so a given key-value pair...
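(For context, a minimal sketch of one plausible way to handle GQA, not necessarily SnapKV's actual implementation: aggregate the observation-window attention over the query heads that share a KV head, so each shared key-value pair gets a single importance score. Tensor names and shapes below are assumptions.)

```python
import torch

def score_kv_positions(attn, num_kv_heads):
    # attn: [batch, num_q_heads, window_len, prefix_len] attention weights
    # computed over the observation window (names are assumptions)
    bsz, num_q_heads, win, prefix = attn.shape
    group = num_q_heads // num_kv_heads  # 32 // 8 = 4 for Mistral 7B
    attn = attn.reshape(bsz, num_kv_heads, group, win, prefix)
    # Aggregate over the query heads sharing each KV head and over the
    # window, giving one score per KV head per prefix position.
    return attn.sum(dim=(2, 3))  # [batch, num_kv_heads, prefix_len]
```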
Here is my env. The version of `transformers` meets the requirements in `monkeypatch.py`: ``` torch==2.2.0 transformers==4.37.0 ``` The traceback is as follows: traceback >> python pred_snap.py --model llama2-7b-chat-4k --compress_args_path...
Thanks for your excellent work! As stated in Table 1 of the paper ("Performance comparison of SnapKV and H2O across various LLMs on LongBench"), could you provide the scripts/code for reproducing...
Say there is a long document, and two users ask two different questions based on it. The two questions are in no way similar, targeting different parts of the...
Could you provide the code for visualizing the Hit Rate, as in Figures 2 & 3?
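(While waiting for the official code, a hedged sketch of one way to compute such a plot, under the assumption that "hit rate" means the overlap between the top-k prefix positions scored by the observation window and the top-k positions actually attended during generation; the exact definition should follow the paper.)

```python
import torch
import matplotlib.pyplot as plt

def hit_rate(window_scores, gen_scores, k):
    # fraction of the top-k positions attended during generation that are
    # also in the top-k positions scored by the observation window
    sel = set(window_scores.topk(k).indices.tolist())
    ref = set(gen_scores.topk(k).indices.tolist())
    return len(sel & ref) / k

# demo with random scores; replace with real per-layer attention statistics
layers = [(torch.rand(4096), torch.rand(4096)) for _ in range(32)]
for k in (64, 256, 1024):
    plt.plot([hit_rate(w, g, k) for w, g in layers], label=f"k={k}")
plt.xlabel("layer"); plt.ylabel("hit rate"); plt.legend(); plt.show()
```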
In GQA, only one copy of the KV cache is saved for each group, but SnapKV saves the KV cache with `num_key_value_heads * num_key_value_groups` heads. Indeed, in KV cache eviction, the...
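(To make the suggestion concrete, a minimal sketch of eviction that keeps only `num_key_value_heads` copies, assuming per-KV-head scores of shape `[batch, num_kv_heads, prefix_len]` and a cache of shape `[batch, num_kv_heads, prefix_len, head_dim]`; this is an illustration, not the repo's code.)

```python
import torch

def evict(key_cache, value_cache, scores, budget):
    # key_cache / value_cache: [batch, num_kv_heads, prefix_len, head_dim]
    # scores: [batch, num_kv_heads, prefix_len] importance per position
    idx = scores.topk(budget, dim=-1).indices              # [B, H_kv, budget]
    idx = idx.unsqueeze(-1).expand(-1, -1, -1, key_cache.size(-1))
    # Gather along the sequence dim; the compressed cache stays at
    # num_kv_heads, with no num_key_value_groups expansion.
    return key_cache.gather(2, idx), value_cache.gather(2, idx)
```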
@leeyeehoo @ctlllll @WendyH1108
Hello :) Thank you for the excellent work and for sharing your code. I've learned a lot and have a few questions about the paper and settings: - In Figures...
https://github.com/FasterDecoding/SnapKV/blob/ea655b18061313e088879bd2b4a3e3c0c2dc2e21/snapkv_utils.py#L50 In the `update_kv` function, the `attention_mask` argument is overridden instead of being used.
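(For readers skimming the thread, a stripped-down illustration of the reported pattern, not the repo's code: the parameter is reassigned before it is ever read, so whatever the caller passes in is silently ignored.)

```python
def update_kv(scores, attention_mask):
    attention_mask = None  # overrides the argument; the caller's mask is lost
    if attention_mask is not None:
        scores = scores + attention_mask
    return scores
```

The fix is to drop the reassignment, or to construct a default mask only when the caller passes `None`.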
Just a guess: what would happen if **H2O** also used **Clustering via Pooling** in the comparison? It seems that Clustering via Pooling can improve the effectiveness of such token-dropping methods.
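(A minimal sketch of the suggested experiment, assuming "Clustering via Pooling" means smoothing per-position importance scores with a 1D max-pool before the top-k selection; the kernel size and score source are assumptions. For H2O, `scores` would be its accumulated attention scores.)

```python
import torch
import torch.nn.functional as F

def select_with_pooling(scores, budget, kernel_size=7):
    # scores: [batch, num_heads, seq_len] importance per cached position
    bsz, heads, seq = scores.shape
    # Max-pool along the sequence so isolated high scores promote their
    # neighbors, keeping contiguous clusters of tokens.
    pooled = F.max_pool1d(
        scores.reshape(bsz * heads, 1, seq),
        kernel_size=kernel_size, stride=1, padding=kernel_size // 2,
    ).reshape(bsz, heads, seq)
    return pooled.topk(budget, dim=-1).indices  # positions to keep
```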