Huiqiang Jiang
Hi @XiaoFengbing, I believe it can be approximately achieved, although there might be some differences due to the context-induced conditional distribution affecting the question. However, I think the impact will...
Hi @deltawi, if you use the GPTQ 7B model, you will need less than 8GB of GPU memory. Additionally, if you need to use multiple GPUs, you can use the...
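A minimal sketch of that setup, assuming the `TheBloke/Llama-2-7b-Chat-GPTQ` checkpoint shown in the README and an llmlingua version whose `PromptCompressor` accepts `model_config` and `device_map` (`device_map="auto"` also requires `accelerate`):

```python
from llmlingua import PromptCompressor

# GPTQ-quantized 7B model: keeps GPU memory usage under ~8GB.
llm_lingua = PromptCompressor(
    model_name="TheBloke/Llama-2-7b-Chat-GPTQ",
    model_config={"revision": "main"},
    device_map="auto",  # shards the model across all visible GPUs
)
```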
Hi @acnagle, thank you for your support of LLMLingua. This is a great question, and I believe other users may have similar queries. The actual compression ratio indeed has a...
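For anyone comparing target and achieved ratios, a sketch assuming an existing `PromptCompressor` instance `llm_lingua`, hypothetical `contexts`/`instruction`/`question` variables, and the `origin_tokens`/`compressed_tokens` fields returned by recent llmlingua releases (the target parameter is `ratio` in early versions, `rate` in later ones):

```python
result = llm_lingua.compress_prompt(
    context=contexts,        # hypothetical list of context strings
    instruction=instruction,
    question=question,
    ratio=0.5,               # request: keep ~50% of the tokens
)
# The achieved ratio often deviates from the request, since token
# selection is threshold-based per segment rather than exact.
actual = result["compressed_tokens"] / result["origin_tokens"]
print(f"target=0.5, actual={actual:.3f}")
```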
Hi @acnagle, Yes, the purpose of iterative compression is to minimize the approximation loss in eq. 5. This approximation can be improved in two ways: 1. By explicitly learning forward...
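To unpack that for other readers, here is a sketch of the approximation gap being minimized, in my own notation rather than the paper's Eq. 5: single-pass scoring conditions on the original prefix, while the target model consumes a compressed prefix; iterative compression re-scores segment by segment to shrink that mismatch.

```latex
% x: original prompt, \tilde{x}: compressed prompt,
% s_1,\dots,s_m: prompt segments (notation mine).
% Single-pass selection scores tokens with p(x_i \mid x_{<i}),
% but the target model actually sees p(\tilde{x}_i \mid \tilde{x}_{<i}).
% Iterative compression conditions each segment on the already
% compressed prefix:
p(\tilde{x}) \approx \prod_{j=1}^{m}
  p\bigl(\tilde{s}_j \mid \tilde{s}_1, \dots, \tilde{s}_{j-1}\bigr)
```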
Hi @choprahetarth, thank you for your interest in and support of LLMLingua. This is a known issue, as seen in #4. We'll address it soon, as detailed in #51.
Hi @cws322, in the HF Demo, LLMLingua assigns different compression ratios to different parts of the prompt, such as the instruction, contexts, and question. Therefore, please place the content you wish to...
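A sketch of that structured call, assuming a `PromptCompressor` instance `llm_lingua`; the `demonstrations` variable is a hypothetical list of context strings:

```python
# Contexts are compressed most aggressively; the instruction and the
# question are treated as high-value parts and preserved at a higher ratio.
result = llm_lingua.compress_prompt(
    context=demonstrations,
    instruction="Answer the question using the context above.",
    question="When was the company founded?",
    target_token=200,  # overall budget for the compressed prompt
)
print(result["compressed_prompt"])
```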
Hi @TechnotechGit, Thank you for your support of LLMLingua. As mentioned in issue #40, I don't believe there are any significant obstacles to supporting exl2. However, I currently do not...
Hi @TechnotechGit, Thank you for your effort. To my recollection, the attention mask indeed hasn't been utilized, and I think it could be implemented later on. Once again, I appreciate...
Hi @TechnotechGit, I'm deeply sorry for missing your message. Thank you very much for your assistance. The first issue arises because LLMLingua utilizes a KV Cache to avoid recalculating segments...
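For context, the mechanism referred to here is the standard transformers KV cache; a toy illustration with `gpt2` standing in for the actual compressor model:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Process the prompt segment by segment, reusing cached keys/values so
# earlier segments are never re-encoded.
past_key_values = None
for segment in ["First segment of the prompt.", " Second segment."]:
    ids = tokenizer(segment, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, past_key_values=past_key_values, use_cache=True)
    past_key_values = out.past_key_values  # cache grows instead of recomputing
```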
Hi @radcon00, thank you for your interest in LLMLingua. It seems there might be an issue with the transformers package. Could you please update it and try again?