ShiningMaker
First, thank you for your excellent work on this quantization library! I'm encountering two critical issues when deploying a quantized Qwen3-8B model to vLLM 0.9.1: - The initial deployment failed...
**Describe the bug** My dmesg output shows that the GPTQ Python process (PID 1179327) was killed by the kernel's OOM killer because the system ran out of memory...
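To confirm an OOM kill like this, the kernel log line of the form `Out of memory: Killed process <pid> (<name>)` can be parsed directly. Below is a minimal, hedged sketch (not part of the library) that scans dmesg-style text for such records; the sample log line is illustrative, not the actual output from this report.

```python
import re

def find_oom_kills(dmesg_text: str) -> list[tuple[int, str]]:
    """Return (pid, process_name) pairs for kernel OOM-killer kill records."""
    pattern = re.compile(r"Out of memory: Killed process (\d+) \(([^)]+)\)")
    return [(int(pid), name) for pid, name in pattern.findall(dmesg_text)]

# Illustrative dmesg excerpt (hypothetical formatting, real kernel message shape)
sample = "[12345.678901] Out of memory: Killed process 1179327 (python) total-vm:..."
print(find_oom_kills(sample))  # [(1179327, 'python')]
```

In practice one would feed this the output of `dmesg` (or `journalctl -k`) to verify which process the kernel terminated.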