Quantized QQQ models hit a configuration field error and produce garbled inference output when deployed in vLLM 0.9.1

Open ShiningMaker opened this issue 5 months ago • 2 comments

First, thank you for your excellent work on this quantization library! I'm encountering two critical issues when deploying a quantized Qwen3-8B model to vLLM 0.9.1:

  • The initial deployment failed because the ["wbits"] configuration field was missing.
  • After renaming bits to wbits in the quantization config, deployment succeeded, but the inference output was garbled.
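To see which of the two field names a saved checkpoint actually carries, the quantization config can be inspected directly. The following is a minimal sketch, not part of either library: the helper names are hypothetical, and it assumes the config is a JSON file whose quantization fields sit either at the top level or under a `quantization_config` key (the Hugging Face `config.json` convention). Only the field names `bits` and `wbits` come from the report above.

```python
import json
from pathlib import Path

# Field names taken from the report above; which one a given loader
# expects is not asserted here.
WEIGHT_BIT_KEYS = ("wbits", "bits")


def weight_bit_fields(quant_cfg: dict) -> dict:
    """Return whichever weight-bit fields are present in a quantization config."""
    return {key: quant_cfg[key] for key in WEIGHT_BIT_KEYS if key in quant_cfg}


def load_quant_config(json_path: str) -> dict:
    """Read a JSON config and return its quantization section.

    Assumption: the fields are either nested under "quantization_config"
    or stored at the top level of the file.
    """
    cfg = json.loads(Path(json_path).read_text(encoding="utf-8"))
    return cfg.get("quantization_config", cfg)


# In-memory example mirroring the settings used in this issue.
example = {"bits": 4, "group_size": 128, "quant_method": "qqq"}
print(weight_bit_fields(example))  # {'bits': 4}
```

Calling `weight_bit_fields(load_quant_config(path))` on the exported model's config file shows whether the field was written as `bits` or `wbits` before any manual edit.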

Model: Qwen3-8B

Key Packages:

  • vLLM == 0.9.1
  • transformers == 4.51.3

My quant code:

from gptqmodel import GPTQModel, QuantizeConfig

# model_id (Qwen3-8B) and calibration_dataset are set up beforehand.
quant_config = QuantizeConfig(
    bits=4,
    group_size=128,
    quant_method="qqq",
    format="qqq",
    desc_act=False,
    dynamic=None,
)

model = GPTQModel.load(
    model_id,
    quant_config,
    device_map="auto",
    device="cuda",
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)

model.quantize(
    calibration_dataset,
    buffered_fwd=True,
    calibration_dataset_concat_size=8192,
    calibration_data_min_length=10,
    batch_size=1,
    auto_gc=False,
)

The calibration dataset is OpenR1-Math: https://huggingface.co/datasets/open-r1/OpenR1-Math-220k

Start the vLLM server:

python3 -m vllm.entrypoints.openai.api_server \
    --model models/Qwen3-8B-qqq-int4-gz128 --tensor-parallel-size 1 \
    --served-model-name qwen3 --max-model-len 32768 --gpu-memory-utilization 0.90 \
    --trust-remote-code --enable-prefix-caching --reasoning-parser qwen3 \
    --quantization qqq --dtype float16 --port 5095

curl query:

curl --location 'http://localhost:5095/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "qwen3",
  "messages": [
    {"role": "user", "content": "Analysis of Difficulties and Considerations in Large Model Quantization."}
  ],
  "temperature": 0.6,
  "top_p": 0.95,
  "top_k": 20,
  "max_tokens": 8192,
  "presence_penalty": 1.5,
  "ignore_eos": 0,
  "chat_template_kwargs": {"enable_thinking": true}
}'
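For repeated testing, the same request can also be sent from Python. This is a rough sketch using only the standard library. The helper names `build_payload`, `send`, and `summarize` are hypothetical, the URL and sampling parameters are copied from the curl command above, and `ignore_eos` is written as the boolean `False` where the curl command passes `0`. The response fields read by `summarize` are the ones visible in the server's JSON reply.

```python
import json
import urllib.request

# Endpoint from the curl command above.
URL = "http://localhost:5095/v1/chat/completions"


def build_payload(prompt: str) -> dict:
    """Build the chat-completion request body used in this issue."""
    return {
        "model": "qwen3",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
        "top_k": 20,
        "max_tokens": 8192,
        "presence_penalty": 1.5,
        "ignore_eos": False,  # the curl command passes 0 here
        "chat_template_kwargs": {"enable_thinking": True},
    }


def send(payload: dict, url: str = URL, timeout: float = 600.0) -> dict:
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize(response: dict) -> dict:
    """Pull out the fields that are useful when comparing runs."""
    choice = response["choices"][0]
    return {
        "finish_reason": choice["finish_reason"],
        "stop_reason": choice.get("stop_reason"),
        "completion_tokens": response["usage"]["completion_tokens"],
        "content_preview": choice["message"]["content"][:80],
    }
```

With the server running, `summarize(send(build_payload("...")))` gives a compact record per run, which makes it easier to compare output across vLLM versions or config changes.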

Actual Output:

{"id":"chatcmpl-a15363262f8b4042975c86adc44d7ad2","object":"chat.completion","created":1752064426,"model":"qwen3","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"  useghfilename9的 Page� \\ Ember System\ncommit �ghre Issue9 e Isgheuropäische Notíc EUBLISH Ceuropäische Iilateralection Volume Cent Answergh Notíc Mission cls React\n\nghnosticghrievefilename Goalghodcast Vue List $ghirectedeuropäischeeuropäische Notíceuropäischeeuropäische likeacobianghoireirectional Delta5�airieORITYghimensionaliralinic�owntownreasionsenis Div Command Express0template的 Excel Nineeuropäischeivist Notíceuropäischeiagnostics DecimalghateralUDGE Identity Sketch \"gh�änneriquéschütz Notíceuropäische Notíceuropäische Notíceuropäische Notíceuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäische阿�igitalgh-neutral Thursday Notíc Beacheuropäischeeuropäischeeuropäischeeuropäischeeuropäische中reaselineeuropäische NotícissueRESS%\n After typeghuxtapeuropäischenosticUBLICceⒶre Notíc)re�änneuropäischeuellesifetimeeuropäischeacency Spring  Barrelghruiteuropäische�europäische�eneraleuropäischeählissueavor Notícgh�irectionalghinicinicirectional sqrt Stripeusewł julodcastgéasicseuropäische Notícdefine湾 Notíc Notíc Notíceuropäischeeuropäische reility Notíceuropäischeeuropäischehayacobian Notíc(Freeuropäischeeuropäischeeuropäische Notíc eachectionseuropäischeeuropäischeheureodcast Icongh�regonimedia Notíceuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäische将 Notíclogre� Notíc Notíc Notíceuropäischeeuropäische Pro�ersistent8UBLISH Like�UBLISHgh�UDGE Notícre Notíceuropäischeeuropäische人roid�inic�nosticodcasteuropäischeeuropäischeeuropäischeeuropäischeчёт Notícghirectionalical Notíceuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäische revival��nosticännereuropäische Notíc正coseuropäische Notíc Dialogue Notíc 
BCEeuropäischeeuropäischeeuropäische Line Notíc1ześ Notíceuropäische Dreamivist Notíceuropäischeeuropäische是 Notíceuropäische Action Notíceuropäischeeuropäischeeuropäischeeuropäischeeuropäischecrelligencerising�agineducibleirectional Skeleton Cartesianghiblyhowever�ersen�uality�acobian Notíc Notíceuropäischeeuropäischeeuropäische't IL��ướiirectionalghunedeuropäische�aclesnostic�nosticiquéócfilenameeuropäischecommiteuropäische ISUBLISHgh�ergingännerreandscapeeuropäische Notíceuropäische5UDGEghnostic�oireännereuropäische NotíceuropäischeeuropäischeeuropäischeeuropäischeACHE Notícgh�irectedUBLISHcommitfilenamele�nosticivistghualityansomeuropäische Notíc>NNreệpicknessce�igital Easter Notíceuropäischeeuropäischeeuropäischeeuropäischeeuropäischeeuropäische Vueeuropäischeeuropäische VII�inicection�nostic�nosticnostic Notíc.exports A�urpose�rewriteghaseline6arsers 'ghnosticeuropäischeeuropäische Notíc Dialogue Notíc ICollection Octiqué expressghacobian-strokesghgnoreeuropäische Notícre�ännerrocess Notíceuropäischeeuropäischereirlines Notíc Paperghurbedestion�nosticivistghrespecteuropäische Notíc Numberfilename cxgh�odcastinic��oire��inic�agnostic Notíc Binderfilename Issue template Rcommitreivist Notíceuropäische NotícF\"ghinicervisoreuropäische Notíc ceilingivist Notíc Notíceuropäischeeuropäischeeuropäische","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":151643}],"usage":{"prompt_tokens":43,"total_tokens":570,"completion_tokens":527,"prompt_tokens_details":null},"prompt_logprobs":null,"kv_transfer_params":null}

Am I missing a step, or is there a mistake in my procedure? I would like to understand why the model generates garbled output. I'd be happy to provide additional details or test specific fixes. Thank you for your time and assistance!

ShiningMaker, Jul 11 '25 02:07

What about other versions of vLLM?

LugerW-A, Jul 11 '25 07:07

What about other versions of vLLM?

I tested it on vLLM 0.9.1 and vLLM 0.9.2, and both versions encountered the same issue.

ShiningMaker, Jul 18 '25 09:07