LLMLingua
To speed up LLM inference and enhance the model's perception of key information, compress the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
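For context, a minimal sketch of the basic usage this description refers to, following the quickstart pattern from the repository README; the prompt, instruction, and token budget below are placeholders:

```python
from llmlingua import PromptCompressor

# Instantiating downloads the default small LM on first use.
llm_lingua = PromptCompressor()

result = llm_lingua.compress_prompt(
    "The quick brown fox jumps over the lazy dog. " * 50,  # placeholder context
    instruction="Summarize the text.",
    question="What does the fox do?",
    target_token=100,  # rough budget for the compressed prompt
)
print(result["compressed_prompt"])
```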
### Describe the issue
Judging from this case, connection exceptions can occur during instantiation, and I have not found any relevant configuration options. Can you help me? from llmlingua...
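Instantiation fetches model weights from the Hugging Face Hub, which is the usual source of connection errors at this step. A hedged workaround sketch, assuming `model_name` accepts a local directory the way any `transformers` identifier does; the path is hypothetical:

```python
from llmlingua import PromptCompressor

# Point model_name at a locally cached copy so no network call is needed.
# Replace the path with wherever the weights were downloaded ahead of time.
llm_lingua = PromptCompressor(
    model_name="/models/llmlingua-base",  # hypothetical local path
    device_map="cpu",  # or "cuda" if a GPU is available
)
```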
### Describe the issue
In the paper, as we can see, LLMLingua-2 is designed for question-agnostic compression; it can also be integrated with LongLLMLingua to preserve more key information relevant...
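One plausible reading of that integration, sketched as an illustration rather than the authors' reference pipeline: run LongLLMLingua's question-aware context ranking as a coarse pass, then apply LLMLingua-2's question-agnostic token-level compression to what survives. The parameter names assume the public `compress_prompt` interface:

```python
from llmlingua import PromptCompressor

contexts = ["...document chunk 1...", "...document chunk 2..."]
question = "Which chunk answers the question?"

# Stage 1: question-aware coarse compression (LongLLMLingua-style ranking).
coarse = PromptCompressor()  # default causal-LM-based compressor
stage1 = coarse.compress_prompt(
    contexts,
    question=question,
    rank_method="longllmlingua",
    use_token_level_filter=False,  # keep only the context-level filter here
)

# Stage 2: question-agnostic token-level compression with LLMLingua-2.
fine = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)
stage2 = fine.compress_prompt(stage1["compressed_prompt"], rate=0.5)
print(stage2["compressed_prompt"])
```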
### Describe the issue
I want to compress a prompt in Markdown format that contains images and their links, or any other website links. **This is what I...
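For Markdown input, one relevant knob is `force_tokens`, which in the LLMLingua-2 compressor pins listed tokens so they survive compression. A sketch; the token list is an assumption about which characters matter for link and image syntax:

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)

markdown_prompt = "![logo](https://example.com/logo.png) See [docs](https://example.com/docs)."

result = llm_lingua.compress_prompt(
    markdown_prompt,
    rate=0.5,
    # Keep the characters that Markdown links and images are built from.
    force_tokens=["!", "[", "]", "(", ")", "\n"],
)
print(result["compressed_prompt"])
```

Note that URLs themselves may still be shortened token by token; forcing the bracket syntax only protects the surrounding structure.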
### Describe the bug
The original regular expression uses `([^)?)*?)`. Commit on my fork: 73baf3f

### Steps to reproduce
Confronting pattern matching:
```python
from llmlingua import PromptCompressor
llm_lingua...
```
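As quoted, the pattern is not even compilable in Python: the character class opened by `[` is never closed, so `re` rejects it before any matching happens (the excerpt may of course be truncating the real pattern). A quick check:

```python
import re

try:
    re.compile(r"([^)?)*?)")
except re.error as exc:
    # CPython reports "unterminated character set at position 1".
    print(f"re.error: {exc}")
```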
# What does this PR do?
This PR modifies the code to store retrieval results in `res_pt` instead of `kept`, addressing potential issues that could arise during data verification. This change...
### Describe the issue
@pzs19 I would like to reproduce and extend the end-to-end latency benchmark results of the LLMLingua-2 paper and was therefore wondering if you could provide more...
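In the absence of the authors' scripts, a minimal latency-measurement sketch one could start from, assuming the public `compress_prompt` API; the model name, prompt, and repeat count are placeholders:

```python
import time
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)
prompt = "Lorem ipsum dolor sit amet. " * 500  # placeholder long prompt

# Warm-up run so model loading and CUDA init don't pollute the numbers.
llm_lingua.compress_prompt(prompt, rate=0.33)

start = time.perf_counter()
for _ in range(10):
    llm_lingua.compress_prompt(prompt, rate=0.33)
elapsed = (time.perf_counter() - start) / 10
print(f"mean compression latency: {elapsed:.3f}s")
```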
### Describe the issue
Hello, author. When using LongLLMLingua to generate summaries, how should instruction, document, and question be set? And how are question-aware coarse-grained compression and question-aware fine-grained compression each carried out? Looking forward to your reply!
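A sketch of how the three fields are typically passed, modeled on the LongLLMLingua usage shown in the repository README; the exact values and some parameter names are illustrative. Question-aware coarse-grained compression ranks whole contexts against the question, and the fine-grained pass then drops low-information tokens within the survivors:

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor()
documents = ["context chunk 1 ...", "context chunk 2 ..."]  # document(s)

result = llm_lingua.compress_prompt(
    documents,
    instruction="Answer using the given contexts.",
    question="When was the bridge built?",
    rate=0.55,
    rank_method="longllmlingua",          # coarse-grained, question-aware ranking
    condition_in_question="after_condition",
    condition_compare=True,               # contrastive fine-grained conditioning
    reorder_context="sort",
    dynamic_context_compression_ratio=0.3,
    context_budget="+100",
)
print(result["compressed_prompt"])
```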
### Describe the issue
I noticed that in `PromptCompressor.get_compressed_input`, there is a variable called `need_idx`. Is that all I need?
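If `need_idx` is indeed a per-token keep mask (an assumption; the source code is the authority here), reconstructing the compressed text from it would look roughly like this hypothetical illustration:

```python
# Hypothetical illustration of a per-token keep mask like `need_idx`.
tokens = ["The", "quick", "brown", "fox", "jumps"]
need_idx = [1, 0, 0, 1, 1]  # 1 = keep the token, 0 = drop it

compressed = " ".join(tok for tok, keep in zip(tokens, need_idx) if keep)
print(compressed)  # "The fox jumps"
```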
### Describe the issue
I would like to know whether all the results in the LLMLingua-2 paper used force_tokens. I noticed that the repository has different force_tokens settings for different...
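For reference, this is the call the configurations differ in; the two token lists below mirror settings that appear in the repository's examples and are not a claim about what the paper actually used:

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,
)
prompt = "Q: What is 2 + 2?\nA: 4\nQ: What is 3 + 5?\nA: 8"

# Variant seen in some examples: preserve newlines and question marks.
with_force = llm_lingua.compress_prompt(prompt, rate=0.33, force_tokens=["\n", "?"])
# Variant with no forced tokens at all.
without_force = llm_lingua.compress_prompt(prompt, rate=0.33, force_tokens=[])
print(with_force["compressed_prompt"])
print(without_force["compressed_prompt"])
```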
### Describe the issue
Hi, while playing with prompt compression, I am trying to obtain the exact probability/importance score for each word. However, it seems that LLMLingua does not save...
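Since the library does not appear to expose these scores directly, one workaround is to recompute the underlying quantity outside LLMLingua. For the perplexity-based compressors that quantity is each token's log-probability under a causal LM (for LLMLingua-2 one would instead read the token classifier's probabilities); the model choice below is arbitrary:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "Prompt compression keeps informative tokens and drops the rest."
enc = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits  # shape: (1, seq_len, vocab)

# Log-probability of each token given its prefix (the first token has no prefix).
log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
token_lp = log_probs.gather(-1, enc.input_ids[:, 1:].unsqueeze(-1)).squeeze(-1)

for tok_id, lp in zip(enc.input_ids[0, 1:], token_lp[0]):
    print(f"{tokenizer.decode(tok_id)!r}: {lp.item():.3f}")
```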