LLMLingua

To speed up LLM inference and sharpen the model's perception of key information, LLMLingua compresses the prompt and KV-Cache, achieving up to 20x compression with minimal performance loss.
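As a rough intuition for what such a compressor does, here is a toy sketch (my own illustration, not LLMLingua's implementation) of budget-based token pruning: score each token's informativeness and keep only the top fraction, in original order. LLMLingua derives such scores from a small language model's token perplexities; the hand-set scores below are a stand-in.

```python
import math

def compress_tokens(tokens, scores, ratio):
    """Keep the ceil(len(tokens) * ratio) highest-scoring tokens, in order."""
    keep = math.ceil(len(tokens) * ratio)
    # Rank indices by score (descending), take the top `keep`,
    # then restore the original token order.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    top = sorted(ranked[:keep])
    return [tokens[i] for i in top]

tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
scores = [0.1, 0.8, 0.7, 0.9, 0.6, 0.2, 0.1, 0.5, 0.9]  # stand-in informativeness
print(compress_tokens(tokens, scores, 0.5))
# -> ['quick', 'brown', 'fox', 'jumps', 'dog']
```

With `ratio=0.5`, five of the nine tokens survive; low-information words like "The" and "over" are the ones dropped.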

Results 117 LLMLingua issues

I use Qwen-7B, then get error:

Traceback (most recent call last):
  File "/qwen/test.py", line 22, in
    compressed_prompt = llm_lingua.compress_prompt(
  File "/usr/local/lib/python3.10/dist-packages/llmlingua/prompt_compressor.py", line 253, in compress_prompt
    context = self.iterative_compress_prompt(
  File "/usr/local/lib/python3.10/dist-packages/llmlingua/prompt_compressor.py",...

question

Hi, thanks for this amazing piece of work. I was trying to use this framework to compress a prompt, which has a dialogue between two people as context & I...

First of all, thank you for this fantastic project. I was wondering if there are any parameters that help with the speed of the compression; I'm currently using TheBloke/Llama-2-7b-Chat-GPTQ, but it seems...

question

Estimated Release Date: 2/5
Release Manager: @suiguoxin
Schedule:
- Design Review: 1/19
- Coding: 1/26
- Testing: 2/2
## Features
- [x] P0 Feature Planning @iofu728 @lunaqiu ETA: 1.16
-...

iteration plan

Thank you for the interesting work, and making the code easily accessible. I have some confusion on the relationship between the `ratio` and `iterative_size` parameters. In the case I am...

question
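The interplay this question asks about can be pictured with a toy sketch (my own illustration, not LLMLingua's actual algorithm): treat `iterative_size` as the window the compressor walks over and `ratio` as the fraction of tokens each window keeps, so the whole prompt ends up compressed by roughly the same overall ratio.

```python
def iterative_compress(tokens, ratio, iterative_size):
    """Toy model: compress the prompt window by window.

    Each window of `iterative_size` tokens keeps roughly a `ratio`
    fraction of its tokens (here just the leading ones; a real system
    would keep the highest-scoring ones).
    """
    kept = []
    for start in range(0, len(tokens), iterative_size):
        window = tokens[start:start + iterative_size]
        keep = max(1, round(len(window) * ratio))
        kept.extend(window[:keep])
    return kept

tokens = [f"t{i}" for i in range(10)]
print(iterative_compress(tokens, 0.5, 4))
# -> ['t0', 't1', 't4', 't5', 't8']
```

Under this reading, `ratio` fixes the overall token budget while `iterative_size` only controls how much context the compressor sees per step, not how much it removes.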

I conducted several prompt compression tests using the LLMLingua Web UI Demo (https://huggingface.co/spaces/microsoft/LLMLingua). However, I encountered a situation where the context could not be compressed, despite testing with various...

question

While the concept is promising, especially for High Token Languages like Japanese, I've encountered a significant encoding issue. Steps to Reproduce: Input a Japanese text prompt into LLMLingua for compression....

bug

I have 4 RTX A5000 GPUs with 24GB of memory each, but when I run the example code:

```python
from llmlingua import PromptCompressor

llm_lingua = PromptCompressor("TheBloke/Llama-2-7b-Chat-GPTQ", model_config={"revision": "main"})
```

I get...

question
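One direction worth checking for the multi-GPU setup above (an assumption about the library's plumbing, not a confirmed fix): if `model_config` is forwarded to Hugging Face `from_pretrained`, then a transformers-style `device_map` entry would let accelerate place the model across the available GPUs instead of defaulting to one device.

```python
# Hypothetical workaround: add a transformers `device_map` kwarg to the
# config dict that PromptCompressor (assumed) forwards to from_pretrained.
model_config = {
    "revision": "main",
    "device_map": "auto",  # let transformers/accelerate shard across GPUs
}
# llm_lingua = PromptCompressor("TheBloke/Llama-2-7b-Chat-GPTQ",
#                               model_config=model_config)
print(model_config["device_map"])
```

Whether this helps depends on how the installed llmlingua version constructs its model; it is a config sketch to try, not documented behavior.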

Hi team, due to the computing resources needed to run this, it would be nice if you could also add an option where the user can provide a `url_endpoint` and `api_key` for a...

feature request

Hi, this is an interesting project. I would like to use this with llama.cpp (llama-cpp-python more specifically), but when I had a look at the code I wasn't able to...

feature request