PQCache issues

CUDA out of memory.

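Excellent work! In Section 4.1.4 of the paper you state that the experiments ran on a single RTX 4090 24GB GPU. I am testing LongBench with llama3.1 and hit an OOM error on the very first task, narrativeqa. The run_llama.sh you provide sets device to 0 and 1; I set device to 0 only and ran the experiment on a single 4090, keeping all other settings identical to the sh file. ``` Traceback (most recent call last): File "/root/PQCache/vq_pred.py", line 463, in get_pred(args, model, tokenizer, 0, world_size, data_all, max_length, max_gen, File "/root/PQCache/vq_pred.py", line 178, in get_pred output =...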

TreasureHunter

Question about clustering

1

![Image](https://github.com/user-attachments/assets/1ea01749-c936-4ff0-a66d-c3eb27dda531) Hello, I keep getting this warning. Is this expected? I would like to know.

JayLZhou

Question about latency test

1

I want to measure the latency, but the code doesn't seem to provide the nah_input.jsonl. So, how can I use the test_latency.py? Thanks!

strong-leaf

About 128k-length context

3

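Hello, have you tried longer inputs, such as 128k? When I try a 128k context, the run stays stuck in the kmeans step of the prefill stage. Does this exceed the CPU's capacity? I would be very grateful if you could suggest a solution!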

ydyhello

[BUG] MAX_CPU_IN_USE=48 will cause error

In my own environment this parameter causes errors at run time. Maybe instructions on how to set the number of CPUs correctly are needed? I haven't tested too many...

zwei2025

Inquiry about the gap between pq results on the InfiniteBench test set and the paper

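Hello, I ran into some problems while reproducing the experiments in your paper and would like to ask for your advice. I evaluated the pq method on the InfiniteBench test set, but the results differ considerably from the numbers reported in the paper. For example, on the Retr.PassKey task the paper reports a score of 100, whereas my reproduction reaches only 37.63. My main experiment parameters are: compression ratio: 0.1; recent ratio: 0.2; all other parameters use the default configuration. In addition, the full attention method gives normal results under the same configuration, so I suspect that the InfiniteBench-related code or configuration may differ. Would you be able to provide the code for the InfiniteBench experiments, or specific instructions for reproducing them? Thank you very much for your time and help!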

yuanjiayu1234

Missing Results on Mistral-7B-inst-v0.2

The paper mentions that the experiment results on Mistral-7B-inst-v0.2 are in Appendix A, but the appendix is not attached in the paper. Could you please update the results or show...

Miko2333

LLama2

1

I set up the environment following env.yml (changing every cu121 in the torch entries to 118), then ran bash run_mistral.sh and got: Traceback (most recent call last): File "/home/xuhan/KV_Cache/PQCache/vq_pred.py", line 11, in from vq_method.llama_patch import VQLlamaForCausalLM File "/home/xuhan/KV_Cache/PQCache/vq_method/llama_patch.py", line 22, in from .retrieval_based.pq_search import * File "/home/xuhan/KV_Cache/PQCache/vq_method/retrieval_based/pq_search.py", line 10, in from...

XuHan0920

GPU memory usage seems abnormal

2

In the paper, Section 4.1.4 (Hardware Environment and Hyperparameters) states: Unless otherwise specified, for each experiment, we use an NVIDIA GeForce RTX 4090 24GB card for GPU computation, two Intel(R) Xeon(R) Gold 6330...

wagnzi
