ShiningMaker comments

Results: 5 comments by ShiningMaker

Quantized QQQ models encountered configuration field exceptions and inference garbled text issues when deployed in vLLM 0.9.1.

> What about other versions of vLLM? I tested it on vLLM 0.9.1 and vLLM 0.9.2, and both versions encountered the same issue.

[BUG] [CPU Memory OOM] DeekSpeek R1 got os oom-kill when packing model.layers

> [@ShiningMaker](https://github.com/ShiningMaker) I noticed the OOM process had 2TB of memory, which includes mmap/disk memory. What is the max CPU memory in your VM or computer instance? 2TB should be...

[BUG] [CPU Memory OOM] DeekSpeek R1 got os oom-kill when packing model.layers

> > I restarted GPTQ to perform int8 quantization. While quantizing the 11/60 layers, I checked the memory information using free -h. My understanding is that 2TB memory should be...

[BUG] [CPU Memory OOM] DeekSpeek R1 got os oom-kill when packing model.layers

@Qubitium Hello, I used the dynamic method to retain the last 10 model.layers without quantization, and at this point, there are no more out-of-memory (OOM) issues occurring. Now, I want...
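
For readers unfamiliar with the "dynamic method" mentioned in the comment above, here is a minimal sketch of what such an override might look like. It assumes the convention described in the GPTQModel README, where a regex key prefixed with `-:` marks modules to skip during quantization. The layer count of 60 is an assumption taken from the "11/60 layers" progress mentioned earlier in the thread, and the helper names `skip_last_layers_dynamic` and `is_skipped` are hypothetical, written only for illustration; they are not part of GPTQModel.

```python
import re


def skip_last_layers_dynamic(num_layers: int, keep_last: int) -> dict:
    """Build a dynamic-style override dict that excludes the last
    `keep_last` decoder layers from quantization.

    The "-:" prefix (negative match, i.e. skip the module) follows the
    convention described in the GPTQModel README; treat it as an assumption
    and verify it against the version you have installed.
    """
    first_skipped = num_layers - keep_last
    indices = "|".join(str(i) for i in range(first_skipped, num_layers))
    pattern = rf"-:.*model\.layers\.({indices})\..*"
    return {pattern: {}}


def is_skipped(module_name: str, dynamic: dict) -> bool:
    """Illustrative check only (not the library's own matching code):
    does any negative rule match this module name?"""
    for key in dynamic:
        if key.startswith("-:") and re.match(key[2:], module_name):
            return True
    return False


# Assumed values: 60 layers (indices 0..59), keep the last 10 unquantized.
dynamic = skip_last_layers_dynamic(num_layers=60, keep_last=10)

for name in ("model.layers.49.mlp.down_proj", "model.layers.50.mlp.down_proj"):
    print(name, "skipped" if is_skipped(name, dynamic) else "quantized")
```

A dict like this would presumably be passed as the `dynamic` argument of GPTQModel's `QuantizeConfig`; the exact matching semantics may differ between releases, so check the README for the version in use.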

Quantized QQQ models encountered configuration field exceptions and inference garbled text issues when deployed in vLLM 0.9.1.

> GPTQModel I only tested with vLLM. When using GPTQModel for inference, I encountered the following issues: - After compiling from source, importing PreTrainedModel from transformers fails and causes an...