ShiningMaker

5 comments by ShiningMaker

> What about other versions of vLLM? I tested it on vLLM 0.9.1 and vLLM 0.9.2, and both versions encountered the same issue.

> [@ShiningMaker](https://github.com/ShiningMaker) I noticed the OOM process had 2TB of memory, which includes mmap/disk memory. What is the maximum CPU memory in your VM or compute instance? 2TB should be...

> > I restarted GPTQ to perform int8 quantization. While quantizing layer 11 of 60, I checked the memory information using `free -h`. My understanding is that 2TB memory should be...

@Qubitium Hello, I used the dynamic method to keep the last 10 `model.layers` unquantized, and with that change the out-of-memory (OOM) issues no longer occur. Now, I want...
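
For readers unfamiliar with the dynamic method mentioned above, here is a minimal sketch of how such a skip list might be built. It assumes the 60-layer model from the earlier comment, module names of the form `model.layers.<index>.<submodule>`, and the `-:` negative-match prefix convention described in the GPTQModel README. The helper names `skip_last_layers` and `is_skipped` are hypothetical and only illustrate the matching; the actual configuration used in the thread is not shown in the comment.

```python
import re


def skip_last_layers(num_layers: int, keep_last: int) -> dict:
    """Build a dynamic-style mapping that excludes the last `keep_last` layers.

    Each key is a regex prefixed with "-:" (assumed to mean "do not quantize").
    """
    rules = {}
    for idx in range(num_layers - keep_last, num_layers):
        # The trailing "\." keeps layer 5 from matching the rule for layer 50.
        rules[rf"-:.*\.layers\.{idx}\..*"] = {}
    return rules


def is_skipped(module_name: str, rules: dict) -> bool:
    """Illustrative check: does any negative rule match this module name?"""
    for key in rules:
        if key.startswith("-:") and re.match(key[2:], module_name):
            return True
    return False


# 60 layers total, last 10 (indices 50..59) left unquantized.
dynamic = skip_last_layers(num_layers=60, keep_last=10)
```

A mapping like this would presumably be passed as the `dynamic` argument of the quantization config; check the GPTQModel documentation for the exact parameter name and matching rules in your version.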

> GPTQModel I only tested with vLLM. When using GPTQModel for inference, I encountered the following issues: - After compiling from source, importing PreTrainedModel from transformers fails and causes an...