[BUG] ValueError: Quantization: Failed due to `NaN` loss for `self_attn.q_proj`

Open Juntongkuki opened this issue 6 months ago • 2 comments

Describe the bug

A clear and concise description of what the bug is.

GPU Info

Show output of:

nvidia-smi

Software Info

Operating System/Version + Python Version

Show output of:

pip show gptqmodel torch transformers accelerate triton

If you are reporting an inference bug of a post-quantized model, please post the content of config.json and quantize_config.json.

To Reproduce

How to reproduce this bug if possible.

Expected behavior

A clear and concise description of what you expected to happen.

Model/Datasets

Make sure your model/dataset is downloadable (on HF for example) so we can reproduce your issue.

Screenshots

If applicable, add screenshots to help explain your problem.

Additional context

Add any other context about the problem here.

Juntongkuki, Jun 24 '25 12:06

@Juntongkuki A `NaN` loss can be caused by the following, in order of likelihood:

  1. Insufficient sampling (calibration) dataset
  2. Model is not correctly (evenly) trained, causing some layers/modules to exhibit dominant behavior
  3. GPTQModel code error (least likely)
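
Since an undersized calibration set is the most likely cause, a quick sanity check on the samples before quantizing may help rule it out. The sketch below is not part of GPTQModel: `summarize_calibration` is a hypothetical helper, the `min_samples` and `min_avg_words` thresholds are illustrative assumptions rather than library defaults, and whitespace-separated words are only a rough stand-in for tokenizer tokens.

```python
from statistics import mean


def summarize_calibration(samples, min_samples=256, min_avg_words=128):
    """Summarize a list of calibration texts and flag likely problems.

    Hypothetical helper; the thresholds are assumptions, not GPTQModel defaults.
    """
    # Drop non-string, empty, or whitespace-only entries.
    texts = [s for s in samples if isinstance(s, str) and s.strip()]
    # Whitespace-separated words as a rough proxy for token counts.
    lengths = [len(t.split()) for t in texts]
    avg_words = mean(lengths) if lengths else 0.0

    warnings = []
    if len(texts) < min_samples:
        warnings.append(f"only {len(texts)} usable samples (< {min_samples})")
    if avg_words < min_avg_words:
        warnings.append(f"average length {avg_words:.1f} words (< {min_avg_words})")
    duplicates = len(texts) - len(set(texts))
    if duplicates:
        warnings.append(f"{duplicates} duplicate samples")

    return {"usable": len(texts), "avg_words": avg_words, "warnings": warnings}


if __name__ == "__main__":
    # Tiny, deliberately poor calibration set to show the warnings.
    report = summarize_calibration(["hello world"] * 4 + [""])
    for line in report["warnings"]:
        print("warning:", line)
```

If the report shows few, short, or heavily duplicated samples, enlarging or diversifying the calibration set is worth trying before suspecting the model or the library.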

Qubitium, Jun 30 '25 08:06

Which GPU are you using, @Juntongkuki? I get this error on an A100 but not on an H200.

toncao, Jul 23 '25 20:07