GPTQModel [BUG] ValueError: Quantization: Failed due to `NaN` loss for `self_attn.q_proj`

Jun 24 '25 12:06 Juntongkuki

@Juntongkuki A NaN loss can be caused by any of the following, from most to least likely:

1. An insufficient sampling (calibration) dataset.
2. A model that was not correctly (evenly) trained, causing some layers/modules to exhibit dominant behavior.
3. A bug in the GPTQModel code (least likely).
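To illustrate the first cause, here is a simplified NumPy sketch of the per-layer proxy Hessian from the GPTQ algorithm, H = (2/n)·XᵀX over n calibration tokens. It is not GPTQModel's actual code. With fewer calibration tokens than input features, H is rank-deficient (singular), which is where inverse/Cholesky steps can go numerically wrong. The sketch also shows the diagonal damping (a fraction of mean(diag(H)) added to the diagonal) that GPTQ-style quantizers use to keep H positive definite. The helper names layer_hessian and dampen, the 1% damping fraction, and the sizes (64 input features, 16 vs 1024 tokens) are hypothetical choices for illustration.

```python
import numpy as np


def layer_hessian(x: np.ndarray) -> np.ndarray:
    """GPTQ-style proxy Hessian H = (2/n) * X^T X for one linear layer.

    x has shape (n_tokens, in_features): calibration activations feeding the layer.
    """
    n = x.shape[0]
    return 2.0 / n * x.T @ x


def dampen(h: np.ndarray, damp_fraction: float = 0.01) -> np.ndarray:
    """Add damp_fraction * mean(diag(H)) to the diagonal (hypothetical helper)."""
    damp = damp_fraction * np.mean(np.diag(h))
    return h + damp * np.eye(h.shape[0])


rng = np.random.default_rng(0)
in_features = 64

# Too few calibration tokens: fewer rows than input features, so H is singular.
few = rng.standard_normal((16, in_features))
h_few = layer_hessian(few)

# Enough calibration tokens: H is (almost surely) full rank.
many = rng.standard_normal((1024, in_features))
h_many = layer_hessian(many)

print("rank with 16 tokens:", np.linalg.matrix_rank(h_few), "of", in_features)
print("rank with 1024 tokens:", np.linalg.matrix_rank(h_many), "of", in_features)

# Damping lifts the zero eigenvalues, so a Cholesky factorization succeeds.
chol = np.linalg.cholesky(dampen(h_few))
print("min eigenvalue after damping:", np.linalg.eigvalsh(dampen(h_few)).min())
```

Damping only guarantees that H is positive definite; it does not make 16 tokens a representative sample of the layer's inputs, so a larger calibration set remains the primary fix for the first cause.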

Jun 30 '25 08:06 Qubitium

Which GPU are you using, @Juntongkuki? I get this error on an A100 but not on an H200.

Jul 23 '25 20:07 toncao

[BUG]ValueError: Quantization: Failed due to `NaN` loss for `self_attn.q_proj`