[BUG]ValueError: Quantization: Failed due to `NaN` loss for `self_attn.q_proj`
Describe the bug
A clear and concise description of what the bug is.
GPU Info
Show output of:
nvidia-smi
Software Info
Operation System/Version + Python Version
Show output of:
pip show gptqmodel torch transformers accelerate triton
If you are reporting an inference bug of a post-quantized model, please post the content of config.json and quantize_config.json.
To Reproduce
How to reproduce this bug if possible.
Expected behavior
A clear and concise description of what you expected to happen.
Model/Datasets
Make sure your model/dataset is downloadable (on HF for example) so we can reproduce your issue.
Screenshots
If applicable, add screenshots to help explain your problem.
Additional context
Add any other context about the problem here.
@Juntongkuki NaN issue can be result by the following in order of possibility:
- Insufficient sampling (calibration) dataset
- Model is not correctly (evenly) trained causing some layers/modules to exhibit dominate behavior
- GPTQModel code error (least likely)
which gpu are you using @Juntongkuki. I get this error on A100 but not H200.