Huiqiang Jiang
Hi @radcon00 and @moebius-ansa, this doesn't quite make sense. You can see the definition of the `llama` key-value mapping at https://github.com/huggingface/transformers/blob/main/src/transformers/models/auto/configuration_auto.py#L130. Could you check the transformers version in `/lib/python3.10/site-packages/transformers` or...
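For reference, a quick way to confirm which `transformers` version and installation path is actually being imported (a generic check, not specific to LLMLingua):

```python
# Print the version and install path of the transformers package in use;
# a stale copy under site-packages is a common cause of missing model mappings.
import transformers

print(transformers.__version__)
print(transformers.__file__)
```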
Hi @JiHa-Kim, thank you for your support. I suggest using [the code of the Hugging Face Space demo](https://huggingface.co/spaces/microsoft/LLMLingua/blob/main/app.py) as a reference. You can then build a self-hosted local server...
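As a rough sketch of such a server (illustrative only; the `/compress` route and request fields are placeholders, not taken from the Space demo):

```python
# A minimal self-hosted wrapper around PromptCompressor, sketched with FastAPI.
from fastapi import FastAPI
from pydantic import BaseModel
from llmlingua import PromptCompressor

app = FastAPI()
llm_lingua = PromptCompressor()  # load the default small model once at startup

class CompressRequest(BaseModel):
    prompt: str
    target_token: int = 200  # desired compressed length in tokens

@app.post("/compress")
def compress(req: CompressRequest):
    # compress_prompt returns a dict with the compressed prompt and statistics
    return llm_lingua.compress_prompt(req.prompt, target_token=req.target_token)
```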
Hi @JiHa-Kim, thank you for your help and efforts. I haven't tried using GGUF with LLMLingua yet, but I believe there shouldn't be any major blocking issues. Also, a special...
Hi @JiHa-Kim, currently, calling a llama.cpp model may not be supported, or it might require modifying the `__call__` parameter in `PromptCompressor`.
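For reference, an untested sketch of how per-token log probabilities (what the compressor needs from the small model) might be obtained from llama-cpp-python; the `logits_all` flag and the `echo`/`logprobs` combination are assumptions about that library, and the model path is a placeholder:

```python
# Untested sketch: score prompt tokens with a GGUF model via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", logits_all=True)  # keep logits for all tokens
out = llm.create_completion(
    "Example prompt to score",
    max_tokens=1,   # we only need the echoed prompt, not a real completion
    echo=True,      # return the prompt tokens themselves in the response
    logprobs=1,     # attach per-token log probabilities
)
token_logprobs = out["choices"][0]["logprobs"]["token_logprobs"]
```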
Hi @growmuye, thank you for your interest in LLMLingua. In the future, we plan to support a new feature that allows users to tag specific tokens that need to be...
Hi @manojsharmadcx, thank you for your support. The issue arises because `OpenAIGPTLMHeadModel` ([link to code](https://github.com/huggingface/transformers/blob/main/src/transformers/models/openai/modeling_openai.py#L533C7-L533C27)) does not support KV-cache inputs. You might consider using `gpt2`...
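For example, a minimal sketch (assuming the `model_name` argument shown in the LLMLingua README):

```python
from llmlingua import PromptCompressor

# gpt2's forward pass accepts past key-values (a KV cache),
# unlike OpenAIGPTLMHeadModel (openai-gpt)
llm_lingua = PromptCompressor(model_name="gpt2")
```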
Hi @manojsharmadcx, yes, currently a local deployment of the corresponding small model is required to use this method. If the API model supports obtaining the log probabilities of the prompt...
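For illustration, the legacy OpenAI Completions API could return log probabilities for the prompt itself; a sketch using the pre-1.0 `openai` SDK, with a placeholder model name:

```python
# Sketch: score a prompt's tokens via the legacy Completions API (openai<1.0).
import openai

resp = openai.Completion.create(
    model="davinci-002",
    prompt="Example prompt to score",
    max_tokens=0,  # generate nothing
    echo=True,     # echo the prompt back in the response
    logprobs=0,    # include per-token log probabilities for the echoed prompt
)
token_logprobs = resp["choices"][0]["logprobs"]["token_logprobs"]
```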
Hi @zhyunlong, thank you for your support of LLMLingua. We use the same script as *Lost in the Middle*; you can access it at [this link](https://github.com/nelson-liu/lost-in-the-middle/blob/main/scripts/get_qa_responses_from_longchat.py).
Hi @zba, thank you for your interest in and support of LLMLingua. I believe there are no blocking issues with using the exl2 format. You can try replacing the code at...
Hi @xxSpencer, by default, using LLMLingua requires NVIDIA CUDA to be enabled. You can switch to CPU mode with the following settings.

```python
from llmlingua import PromptCompressor

llm_lingua = ...
```
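A complete version of that snippet might look like this (assuming the `device_map` argument from the LLMLingua README):

```python
from llmlingua import PromptCompressor

# device_map="cpu" keeps the small model off CUDA; slower, but no GPU required
llm_lingua = PromptCompressor(device_map="cpu")
```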