Huiqiang Jiang
Hi @radcon00 and @moebius-ansa, this doesn't quite make sense. You can see the definition of the `llama` key-value mapping at https://github.com/huggingface/transformers/blob/main/src/transformers/models/auto/configuration_auto.py#L130. Could you check the transformers version in `/lib/python3.10/site-packages/transformers` or...
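For reference, a quick way to confirm which `transformers` version and installation path is actually being imported (a generic check, not specific to LLMLingua):

```python
# Print the version and install path of the transformers package in use;
# a stale copy under site-packages is a common cause of missing model mappings.
import transformers

print(transformers.__version__)
print(transformers.__file__)
```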
Hi @JiHa-Kim, thank you for your support. I suggest using [the code of the Hugging Face Space demo](https://huggingface.co/spaces/microsoft/LLMLingua/blob/main/app.py) as a reference. You can then build a self-hosted local server...
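As a rough sketch of such a server (illustrative only; the `/compress` route and request fields are placeholders, not taken from the Space demo):

```python
# A minimal self-hosted wrapper around PromptCompressor, sketched with FastAPI.
from fastapi import FastAPI
from pydantic import BaseModel
from llmlingua import PromptCompressor

app = FastAPI()
llm_lingua = PromptCompressor()  # load the default small model once at startup

class CompressRequest(BaseModel):
    prompt: str
    target_token: int = 200  # desired compressed length in tokens

@app.post("/compress")
def compress(req: CompressRequest):
    # compress_prompt returns a dict with the compressed prompt and statistics
    return llm_lingua.compress_prompt(req.prompt, target_token=req.target_token)
```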
Hi @JiHa-Kim, thank you for your help and efforts. I haven't tried using GGUF with LLMLingua yet, but I believe there shouldn't be any major blocking issues. Also, a special...
Hi @JiHa-Kim, currently, calling a llama.cpp model may not be supported, or it might require modifying the `__call__` parameter in `PromptCompressor`.
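For reference, an untested sketch of how per-token log probabilities (what the compressor needs from the small model) might be obtained from llama-cpp-python; the `logits_all` flag and the `echo`/`logprobs` combination are assumptions about that library, and the model path is a placeholder:

```python
# Untested sketch: score prompt tokens with a GGUF model via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="model.gguf", logits_all=True)  # keep logits for all tokens
out = llm.create_completion(
    "Example prompt to score",
    max_tokens=1,   # we only need the echoed prompt, not a real completion
    echo=True,      # return the prompt tokens themselves in the response
    logprobs=1,     # attach per-token log probabilities
)
token_logprobs = out["choices"][0]["logprobs"]["token_logprobs"]
```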
Hi @growmuye, thank you for your interest in LLMLingua. In the future, we plan to support a new feature that allows users to tag specific tokens that need to be...
Hi @manojsharmadcx, thank you for your support. The issue arises because `OpenAIGPTLMHeadModel` ([link to code](https://github.com/huggingface/transformers/blob/main/src/transformers/models/openai/modeling_openai.py#L533C7-L533C27)) does not support KV-cache inputs. You might consider using `gpt2`...
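For example, a minimal sketch (assuming the `model_name` argument shown in the LLMLingua README):

```python
from llmlingua import PromptCompressor

# gpt2's forward pass accepts past key-values (a KV cache),
# unlike OpenAIGPTLMHeadModel (openai-gpt)
llm_lingua = PromptCompressor(model_name="gpt2")
```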
Hi @manojsharmadcx, yes, currently a local deployment of the corresponding small model is required to use this method. If the API model supports obtaining the log probabilities of the prompt...
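For illustration, the legacy OpenAI Completions API could return log probabilities for the prompt itself; a sketch using the pre-1.0 `openai` SDK, with a placeholder model name:

```python
# Sketch: score a prompt's tokens via the legacy Completions API (openai<1.0).
import openai

resp = openai.Completion.create(
    model="davinci-002",
    prompt="Example prompt to score",
    max_tokens=0,  # generate nothing
    echo=True,     # echo the prompt back in the response
    logprobs=0,    # include per-token log probabilities for the echoed prompt
)
token_logprobs = resp["choices"][0]["logprobs"]["token_logprobs"]
```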
Hi @zhyunlong, thank you for your support of LLMLingua. We use the same script as *Lost in the Middle*; you can access it at [this link](https://github.com/nelson-liu/lost-in-the-middle/blob/main/scripts/get_qa_responses_from_longchat.py).
Hi @zba, thank you for your interest in and support of LLMLingua. I believe there are no blocking issues with using the exl2 format. You can try replacing the code at...
Hi @xxSpencer, by default, using LLMLingua requires NVIDIA CUDA to be enabled. You can switch to CPU mode with the following settings.

```python
from llmlingua import PromptCompressor

llm_lingua = ...
```
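A complete version of that snippet might look like this (assuming the `device_map` argument from the LLMLingua README):

```python
from llmlingua import PromptCompressor

# device_map="cpu" keeps the small model off CUDA; slower, but no GPU required
llm_lingua = PromptCompressor(device_map="cpu")
```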