Huiqiang Jiang

Results: 154 comments of Huiqiang Jiang

Hi @XiaoFengbing, I believe it can be approximately achieved, although there might be some differences due to the context-induced conditional distribution affecting the question. However, I think the impact will...

Hi @deltawi, if you use the GPTQ 7b model, you will need less than 8GB of GPU memory. Additionally, if you need to use multiple GPUs, you can use the...
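A rough back-of-envelope calculation makes the "less than 8GB" figure plausible: a 7B-parameter model quantized to 4 bits (as GPTQ typically does) needs only a few GiB for its weights. The numbers below are an illustrative estimate, not a measurement; real usage also includes activations, the KV cache, and CUDA overhead.

```python
# Approximate weight footprint of a quantized model.
# Illustrative arithmetic only; actual GPU memory use is higher
# because of activations, KV cache, and framework overhead.

def quantized_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB for a quantized model."""
    bytes_total = n_params * bits_per_weight / 8  # bits -> bytes
    return bytes_total / 2**30                    # bytes -> GiB

# A 7B model at 4 bits per weight:
weights = quantized_weight_gib(7e9, 4)
print(f"~{weights:.2f} GiB for weights alone")  # well under 8 GiB
```

Even with runtime overhead on top of the weights, this leaves comfortable headroom under an 8GB budget.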

Hi @acnagle, thank you for your support of LLMLingua. This is a great question, and I believe other users may have similar queries. The actual compression ratio indeed has a...

Hi @acnagle, Yes, the purpose of iterative compression is to minimize the approximation loss in eq. 5. This approximation can be improved in two ways: 1. By explicitly learning forward...
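The idea of iterative compression can be sketched as pruning segment by segment, where each segment is scored against the prefix that has already been compressed rather than against the raw prompt. The snippet below is a toy sketch of that loop; the `score` function is a stand-in for a per-token conditional importance measure (e.g. perplexity under a small LM), not LLMLingua's actual scorer.

```python
# Toy sketch of iterative, segment-by-segment token pruning.
# `score(prefix, token)` stands in for a conditional importance
# measure; in the real method each segment is scored conditioned
# on the already-compressed prefix, which is what reduces the
# approximation loss compared to one-shot compression.

from typing import Callable, List

def iterative_compress(
    tokens: List[str],
    score: Callable[[List[str], str], float],
    segment_size: int = 4,
    keep_ratio: float = 0.5,
) -> List[str]:
    compressed: List[str] = []
    for start in range(0, len(tokens), segment_size):
        segment = tokens[start:start + segment_size]
        # Rank tokens in this segment against the compressed prefix.
        ranked = sorted(segment, key=lambda t: score(compressed, t), reverse=True)
        keep = max(1, int(len(segment) * keep_ratio))
        kept = set(ranked[:keep])
        # Preserve the original order of the kept tokens.
        compressed.extend(t for t in segment if t in kept)
    return compressed

# Stand-in scorer for the demo: longer tokens pretend to be more informative.
demo = iterative_compress(
    "the quick brown fox jumps over the lazy dog".split(),
    score=lambda prefix, tok: len(tok),
)
print(demo)  # ['quick', 'brown', 'jumps', 'over', 'dog']
```

Because later segments see the compressed prefix, the pruning decisions adapt as compression proceeds, which is the point of doing it iteratively.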

Hi @choprahetarth, thank you for your interest in and support of LLMLingua. This is a known issue, as seen in #4. We'll address it soon as detailed in #51.

Hi @cws322, in the HF Demo, LLMLingua assigns different compression ratios to different parts of the prompt, such as the instruction, contexts, and question. Therefore, please place the content you wish to...
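One way to picture the per-part budgeting is the sketch below: the instruction and question are kept largely intact, and the remaining token budget is spent on the contexts. This is an illustration of the idea only; the part names and the allocation rule are made up here and are not LLMLingua's actual defaults or API.

```python
# Illustrative sketch of splitting a compressed-prompt token budget
# across prompt parts. The "protect instruction/question, compress
# contexts" split mirrors the behavior described above; the exact
# rule here is hypothetical.

def allocate_budget(part_lengths: dict, target_total: int,
                    protect: tuple = ("instruction", "question")) -> dict:
    """Give protected parts their full length; split the leftover
    budget over the remaining parts proportionally to their size."""
    budget = {p: part_lengths[p] for p in protect if p in part_lengths}
    remaining = target_total - sum(budget.values())
    others = {p: n for p, n in part_lengths.items() if p not in budget}
    total_other = sum(others.values())
    for p, n in others.items():
        budget[p] = max(0, remaining * n // total_other)
    return budget

parts = {"instruction": 30, "contexts": 900, "question": 20}
print(allocate_budget(parts, target_total=200))
# contexts get the leftover 150 tokens; instruction and question stay whole
```

So the contexts absorb almost all of the compression, which is why where you place your content in the prompt matters.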

Hi @TechnotechGit, Thank you for your support of LLMLingua. As mentioned in issue #40, I don't believe there are any significant obstacles to supporting exl2. However, I currently do not...

Hi @TechnotechGit, Thank you for your effort. To my recollection, the attention mask indeed hasn't been utilized, and I think it could be implemented later on. Once again, I appreciate...

Hi @TechnotechGit, I'm deeply sorry for missing your message. Thank you very much for your assistance. The first issue arises because LLMLingua utilizes a KV Cache to avoid recalculating segments...
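The role of the KV cache here can be illustrated loosely as "don't redo work for segments you have already processed". The toy below uses simple memoization with a call counter as a stand-in; in the actual model the cached state is the attention keys and values of the already-processed prefix, not a dictionary keyed by segment text.

```python
# Toy illustration of skipping recomputation for already-processed
# segments. `expensive()` and the counter stand in for a model
# forward pass; the real mechanism is the attention KV cache of the
# prefix, which this memoization only loosely mimics.

calls = 0

def expensive(segment: str) -> str:
    """Pretend per-segment forward pass."""
    global calls
    calls += 1
    return segment.upper()

cache: dict = {}

def process(segments: list) -> list:
    out = []
    for seg in segments:
        if seg not in cache:          # hit the "model" only on cache misses
            cache[seg] = expensive(seg)
        out.append(cache[seg])
    return out

process(["a", "b", "a"])  # the repeated segment reuses the cached result
print(calls)  # 2 forward passes instead of 3
```

Reusing cached state this way is what keeps iterative compression affordable, but it also means the cached prefix must stay consistent with the inputs, which is where subtle bugs can arise.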

Hi @radcon00, thank you for your interest in LLMLingua. It seems there might be an issue with the transformers package. Could you please update the transformers package and try again?