Juntongkuki comments

Results: 2 comments of Juntongkuki

Qwen 2.5 Quantization is slower than fp16 with vLLM

> update: removing `quantization="AWQ"` (per [this link](https://github.com/vllm-project/vllm/issues/6985)) seems to speed it up, but still slower than FP16.

I have the same problem.

[BUG] ValueError: Quantization: Failed due to NaN loss

> [@it-dainb](https://github.com/it-dainb) Did you resolve this error? I also have the same problem and the loss is too big..... INFO ------------------------------------------------------------------------------------------------------------------------------------------------------ INFO | process | layer | module | loss...