Huiqiang Jiang
Hi @XiaoFengbing, I believe it can be approximately achieved, although there might be some differences due to the context-induced conditional distribution affecting the question. However, I think the impact will...
Hi @deltawi, if you use the GPTQ 7B model, you will need less than 8GB of GPU memory. Additionally, if you need to use multiple GPUs, you can use the...
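A minimal sketch of that setup, assuming the `TheBloke/Llama-2-7b-Chat-GPTQ` checkpoint shown in the README and an llmlingua version whose `PromptCompressor` accepts `model_config` and `device_map` (`device_map="auto"` also requires `accelerate`):

```python
from llmlingua import PromptCompressor

# GPTQ-quantized 7B model: keeps GPU memory usage under ~8GB.
llm_lingua = PromptCompressor(
    model_name="TheBloke/Llama-2-7b-Chat-GPTQ",
    model_config={"revision": "main"},
    device_map="auto",  # shards the model across all visible GPUs
)
```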
Hi @acnagle, thank you for your support of LLMLingua. This is a great question, and I believe other users may have similar queries. The actual compression ratio indeed has a...
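For anyone comparing target and achieved ratios, a sketch assuming an existing `PromptCompressor` instance `llm_lingua`, hypothetical `contexts`/`instruction`/`question` variables, and the `origin_tokens`/`compressed_tokens` fields returned by recent llmlingua releases (the target parameter is `ratio` in early versions, `rate` in later ones):

```python
result = llm_lingua.compress_prompt(
    context=contexts,        # hypothetical list of context strings
    instruction=instruction,
    question=question,
    ratio=0.5,               # request: keep ~50% of the tokens
)
# The achieved ratio often deviates from the request, since token
# selection is threshold-based per segment rather than exact.
actual = result["compressed_tokens"] / result["origin_tokens"]
print(f"target=0.5, actual={actual:.3f}")
```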
Hi @acnagle, Yes, the purpose of iterative compression is to minimize the approximation loss in eq. 5. This approximation can be improved in two ways: 1. By explicitly learning forward...
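To unpack that for other readers, here is a sketch of the approximation gap being minimized, in my own notation rather than the paper's Eq. 5: single-pass scoring conditions on the original prefix, while the target model consumes a compressed prefix; iterative compression re-scores segment by segment to shrink that mismatch.

```latex
% x: original prompt, \tilde{x}: compressed prompt,
% s_1,\dots,s_m: prompt segments (notation mine).
% Single-pass selection scores tokens with p(x_i \mid x_{<i}),
% but the target model actually sees p(\tilde{x}_i \mid \tilde{x}_{<i}).
% Iterative compression conditions each segment on the already
% compressed prefix:
p(\tilde{x}) \approx \prod_{j=1}^{m}
  p\bigl(\tilde{s}_j \mid \tilde{s}_1, \dots, \tilde{s}_{j-1}\bigr)
```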
Hi @choprahetarth, thank you for your interest in and support of LLMLingua. This is a known issue, as seen in #4. We'll address it soon, as detailed in #51.
Hi @cws322, in the HF Demo, LLMLingua assigns different compression ratios to different parts of the prompt, such as the instruction, contexts, and question. Therefore, please place the content you wish to...
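A sketch of that structured call, assuming a `PromptCompressor` instance `llm_lingua`; the `demonstrations` variable is a hypothetical list of context strings:

```python
# Contexts are compressed most aggressively; the instruction and the
# question are treated as high-value parts and preserved at a higher ratio.
result = llm_lingua.compress_prompt(
    context=demonstrations,
    instruction="Answer the question using the context above.",
    question="When was the company founded?",
    target_token=200,  # overall budget for the compressed prompt
)
print(result["compressed_prompt"])
```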
Hi @TechnotechGit, Thank you for your support of LLMLingua. As mentioned in issue #40, I don't believe there are any significant obstacles to supporting exl2. However, I currently do not...
Hi @TechnotechGit, Thank you for your effort. To my recollection, the attention mask indeed hasn't been utilized, and I think it could be implemented later on. Once again, I appreciate...
Hi @TechnotechGit, I'm deeply sorry for missing your message. Thank you very much for your assistance. The first issue arises because LLMLingua utilizes a KV Cache to avoid recalculating segments...
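For context, the mechanism referred to here is the standard transformers KV cache; a toy illustration with `gpt2` standing in for the actual compressor model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Process the prompt segment by segment, reusing cached keys/values so
# earlier segments are never re-encoded.
past_key_values = None
for segment in ["First segment of the prompt.", " Second segment."]:
    ids = tokenizer(segment, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, past_key_values=past_key_values, use_cache=True)
    past_key_values = out.past_key_values  # cache grows instead of recomputing
```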
Hi @radcon00, thank you for your interest in LLMLingua. It seems there might be an issue with the transformers package. Could you please update it and try again?