LLMLingua

To speed up LLM inference and sharpen the model's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
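As a rough intuition for what such a compressor does, here is a toy sketch (my own illustration, not LLMLingua's implementation) of budget-based token pruning: score each token's informativeness and keep only the top fraction, in original order. LLMLingua derives such scores from a small language model's token perplexities; the hand-set scores below are a stand-in.

```python
import math

def compress_tokens(tokens, scores, ratio):
    """Keep the ceil(len(tokens) * ratio) highest-scoring tokens, in order."""
    keep = math.ceil(len(tokens) * ratio)
    # Rank indices by score (descending), take the top `keep`,
    # then restore the original token order.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    top = sorted(ranked[:keep])
    return [tokens[i] for i in top]

tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
scores = [0.1, 0.8, 0.7, 0.9, 0.6, 0.2, 0.1, 0.5, 0.9]  # stand-in informativeness
print(compress_tokens(tokens, scores, 0.5))
# -> ['quick', 'brown', 'fox', 'jumps', 'dog']
```

With `ratio=0.5`, five of the nine tokens survive; low-information words like "The" and "over" are the ones dropped.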

Results 117 LLMLingua issues

I use Qwen-7B, then get error:

Traceback (most recent call last):
  File "/qwen/test.py", line 22, in
    compressed_prompt = llm_lingua.compress_prompt(
  File "/usr/local/lib/python3.10/dist-packages/llmlingua/prompt_compressor.py", line 253, in compress_prompt
    context = self.iterative_compress_prompt(
  File "/usr/local/lib/python3.10/dist-packages/llmlingua/prompt_compressor.py",...

question

Hi, thanks for this amazing piece of work. I was trying to use this framework to compress a prompt, which has a dialogue between two people as context & I...

First of all, thank you for this fantastic project. I was wondering if there are any parameters that help with the speed of the compression; I'm currently using TheBloke/Llama-2-7b-Chat-GPTQ, but it seems...

question

Estimated Release Date: 2/5
Release Manager: @suiguoxin
Schedule:
- Design Review: 1/19
- Coding: 1/26
- Testing: 2/2
## Features
- [x] P0 Feature Planning @iofu728 @lunaqiu ETA: 1.16
-...

iteration plan

Thank you for the interesting work, and making the code easily accessible. I have some confusion on the relationship between the `ratio` and `iterative_size` parameters. In the case I am...

question
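The interplay this question asks about can be pictured with a toy sketch (my own illustration, not LLMLingua's actual algorithm): treat `iterative_size` as the window the compressor walks over and `ratio` as the fraction of tokens each window keeps, so the whole prompt ends up compressed by roughly the same overall ratio.

```python
def iterative_compress(tokens, ratio, iterative_size):
    """Toy model: compress the prompt window by window.

    Each window of `iterative_size` tokens keeps roughly a `ratio`
    fraction of its tokens (here just the leading ones; a real system
    would keep the highest-scoring ones).
    """
    kept = []
    for start in range(0, len(tokens), iterative_size):
        window = tokens[start:start + iterative_size]
        keep = max(1, round(len(window) * ratio))
        kept.extend(window[:keep])
    return kept

tokens = [f"t{i}" for i in range(10)]
print(iterative_compress(tokens, 0.5, 4))
# -> ['t0', 't1', 't4', 't5', 't8']
```

Under this reading, `ratio` fixes the overall token budget while `iterative_size` only controls how much context the compressor sees per step, not how much it removes.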

I conducted several prompt compression tests using the LLMLingua Web UI Demo (https://huggingface.co/spaces/microsoft/LLMLingua). However, I encountered a situation where the context could not be compressed, despite testing with various...

question

While the concept is promising, especially for High Token Languages like Japanese, I've encountered a significant encoding issue. Steps to Reproduce: Input a Japanese text prompt into LLMLingua for compression....

bug

I have 4 RTX A5000 GPUs with 24GB of memory each, but when I run the example code:

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor("TheBloke/Llama-2-7b-Chat-GPTQ", model_config={"revision": "main"})
```

I get...

question
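One direction worth checking for the multi-GPU setup above (an assumption about the library's plumbing, not a confirmed fix): if `model_config` is forwarded to Hugging Face `from_pretrained`, then a transformers-style `device_map` entry would let accelerate place the model across the available GPUs instead of defaulting to one device.

```python
# Hypothetical workaround: add a transformers `device_map` kwarg to the
# config dict that PromptCompressor (assumed) forwards to from_pretrained.
model_config = {
    "revision": "main",
    "device_map": "auto",  # let transformers/accelerate shard across GPUs
}
# llm_lingua = PromptCompressor("TheBloke/Llama-2-7b-Chat-GPTQ",
#                               model_config=model_config)
print(model_config["device_map"])
```

Whether this helps depends on how the installed llmlingua version constructs its model; it is a config sketch to try, not documented behavior.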

Hi team, due to the computing resources needed to run this, it would be nice if you could also add an option where the user can provide a `url_endpoint` and `api_key` for a...

feature request

Hi, this is an interesting project. I would like to use this with llama.cpp (llama-cpp-python more specifically), but when I had a look at the code I wasn't able to...

feature request