# LLMLingua

To speed up LLM inference and enhance the model's perception of key information, LLMLingua compresses the prompt and KV-cache, achieving up to 20x compression with minimal performance loss.
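For reference, typical usage follows the documented `PromptCompressor` API; the sketch below uses the LLMLingua-2 entry point from the project README, with an illustrative prompt and compression rate.

```python
# Minimal sketch of LLMLingua-2 prompt compression (pip install llmlingua).
# The checkpoint name and parameters follow the project README; the prompt
# and rate here are illustrative.
from llmlingua import PromptCompressor

compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-xlm-roberta-large-meetingbank",
    use_llmlingua2=True,  # select the LLMLingua-2 token-classification compressor
)

long_prompt = "..."  # the long context to shrink
result = compressor.compress_prompt(
    long_prompt,
    rate=0.33,                 # keep roughly one third of the tokens
    force_tokens=["\n", "?"],  # tokens that must survive compression
)
print(result["compressed_prompt"])
```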

117 LLMLingua issues, sorted by most recently updated

Our model is deployed on Alibaba ModelScope. In our country we cannot connect to Hugging Face; is there any way to use another remote model?
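One possible workaround, sketched under the assumption that `PromptCompressor` accepts any local checkpoint directory via `model_name`: download the model through ModelScope and load it from disk. The ModelScope model ID below is hypothetical.

```python
# Hypothetical workaround: fetch the compressor checkpoint from ModelScope
# instead of Hugging Face, then point LLMLingua at the local directory.
# Assumes the `modelscope` package is installed and that a mirror of the
# checkpoint exists there; the model ID is hypothetical.
from modelscope import snapshot_download
from llmlingua import PromptCompressor

local_dir = snapshot_download("your-org/llmlingua-2-xlm-roberta-large")  # hypothetical ID
compressor = PromptCompressor(model_name=local_dir, use_llmlingua2=True)
```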

# What does this PR do?
This PR adds minimal support for Japanese prompt tokenization.
### What's included:
- A Japanese tokenizer utility (`tokenize_jp`) using [fugashi](https://github.com/polm/fugashi) + unidic-lite
- A...
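The snippet is truncated, but a `tokenize_jp` utility built on fugashi + unidic-lite would plausibly look like the sketch below; this is an illustration, not the PR's actual code.

```python
# Hypothetical sketch of a `tokenize_jp` utility (not the PR's code):
# word segmentation with fugashi, using the unidic-lite dictionary.
# Requires: pip install fugashi unidic-lite
import fugashi

_tagger = fugashi.Tagger()

def tokenize_jp(text: str) -> list[str]:
    """Split Japanese text into surface-form tokens."""
    return [word.surface for word in _tagger(text)]

print(tokenize_jp("日本語のプロンプトを圧縮する"))  # -> ['日本', '語', 'の', ...]
```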

### Describe the issue
(qwen25) root@n-217:/data/wen/test# python /data/wen/test/llmlingua-2/test.py

```
Traceback (most recent call last):
  File "/data/wen/test/llmlingua-2/test.py", line 11, in <module>
    results = compressor.compress_prompt_llmlingua2(
  File "/root/anaconda3/envs/qwen25/lib/python3.10/site-packages/llmlingua/prompt_compressor.py", line 926, in compress_prompt_llmlingua2
    compressed_context, word_list, word_label_list...
```

question

# What does this PR do?
This PR introduces **TACO-RL (Task-Aware Prompt Compression Optimization with Reinforcement Learning)**, a new submodule that extends LLMLingua with reinforcement learning capabilities for fine-tuning pre-trained...

### Describe the issue
The context length of the model used on the main page is not big enough for me to do compression. I need to know if it supports big...

question

Hi @iofu728 🤗 I'm Niels and I work as part of the open-source team at Hugging Face. I discovered your work on arXiv and was wondering whether you would like to...

feature