Sidestep NaN caused by div by zero where calculated scale == 0

Open ywlq opened this issue 9 months ago • 12 comments

I'm trying to quantize the Qwen2-57B-A14B model using the following configuration:

pretrained_model_id = "/mnt/82_store/xj/modelzoo/qwen/Qwen/Qwen2-57B-A14B"
quantized_model_id = "/mnt/82_store/xj/GPTQModel/quantmodel/Qwen2-57B-A14B-2bit-128g-c4

 calibration_dataset = load_dataset(
     "allenai/c4",
     data_files="en/c4-train.00001-of-01024.json.gz",
     split="train"
   ).select(range(1024))["text"]

quantize_config = QuantizeConfig(
    bits=2,
    group_size=128,
    dynamic={
         r".*mlp\.shared_expert.*": { "bits": 8, "group_size": 128 },
         r".*mlp\.experts.*down_proj.*": { "bits": 8, "group_size": 128 },
     }
)

model = GPTQModel.load(pretrained_model_id, quantize_config)
model.quantize(calibration_dataset, auto_gc=False, batch_size=4)
model.save(quantized_model_id)

It doesn't work :(

However, the quantization fails with NaN errors. After tracing the issue, I found that it originates from this function during weight quantization:

def quantize(x, scale, zero, maxq, requires_groupwise_processing: bool):
    if maxq < 0:
        return (x > scale / 2).float() * scale + (x < zero / 2).float() * zero
    if requires_groupwise_processing:
        q = torch.clamp(torch.round(x / scale), -maxq, maxq)
        return scale * q
    else:
        q = torch.clamp(torch.round(x / scale) + zero, 0, maxq)
        return scale * (q - zero)

The problem occurs when scale == 0, which leads to division by zero and causes NaN. I temporarily fixed it with this patch:

def quantize(x, scale, zero, maxq, requires_groupwise_processing: bool):
    if maxq < 0:
        return (x > scale / 2).float() * scale + (x < zero / 2).float() * zero
    if requires_groupwise_processing:
        scale = torch.where(scale == 0, torch.tensor(1e-8, device=scale.device), scale)  # prevent division by zero
        q = torch.clamp(torch.round(x / scale), -maxq, maxq)
        return scale * q
    else:
        scale = torch.where(scale == 0, torch.tensor(1e-8, device=scale.device), scale)  # prevent division by zero
        q = torch.clamp(torch.round(x / scale) + zero, 0, maxq)
        return scale * (q - zero)

With this change, the quantization works well : )

Is this a valid fix, or is there a better or more correct way to handle NaN loss during quantization?

ywlq avatar Apr 10 '25 02:04 ywlq

@ywlq Interesting. I see that instead of 0, you just give a very, very small, close-to-0 value! Nice workaround. Submit a PR and we can do more tests on this. I find the NaN is mostly caused by unstable models where some layers have abnormal max/min outliers. This is MoE so not surprising. Some of the individual MoE layers may not be fully/properly trained.

Qubitium avatar Apr 10 '25 03:04 Qubitium

@Qubitium By the way, does GPTQModel now support using EORA on DeepSeek-v2-lite and Qwen2MOE? I’ve been trying to quantize these models to 2 bits, but the performance hasn’t been satisfactory. I was wondering if EORA could potentially improve the performance.

ywlq avatar Apr 10 '25 09:04 ywlq

Do use 3 bits. 2 bits under GPTQ is currently unusable. EoRA with 3 bits can bring it to 4-bit quality, depending on the model, the calibration data, and a bit of luck. And yes, EoRA will work with ALL models that GPTQModel supports.
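
For reference, a 3-bit variant of the config posted above might look like the sketch below (only bits changes; the dynamic overrides are kept from the original example, and EoRA settings are not shown):

# Sketch: same config as above, with the base precision raised from 2 to 3 bits.
quantize_config = QuantizeConfig(
    bits=3,          # 3-bit base precision instead of 2-bit
    group_size=128,
    dynamic={
        r".*mlp\.shared_expert.*": {"bits": 8, "group_size": 128},
        r".*mlp\.experts.*down_proj.*": {"bits": 8, "group_size": 128},
    },
)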

Can you update the PR with changes so I can merge? Thanks.

Qubitium avatar Apr 10 '25 09:04 Qubitium

@Qubitium

Merging is blocked
1 review requesting changes by reviewers with write access.
You're not authorized to push to this branch. Visit https://docs.github.com/repositories/configuring-branches-and-merges-in-your-repository/managing-protected-branches/about-protected-branches for more information.

I am working on this but encountered some problems. It is my first PR, so I don't know how to solve it :(

Should I close the pull request directly?

ywlq avatar Apr 10 '25 11:04 ywlq

@ywlq This is your forked branch, so you should have full access. Did you switch envs, or were your GitHub login credentials not saved?

Qubitium avatar Apr 10 '25 11:04 Qubitium

@ywlq Did you check out the wrong branch? You need to push to the same branch on your fork. There is no way I can restrict this. Ask an AI for help with git branches and pushes.

Qubitium avatar Apr 10 '25 11:04 Qubitium

@Qubitium I had deleted the branch, so sorry to bother you.

ywlq avatar Apr 10 '25 12:04 ywlq

@ywlq Can you create a new branch and PR? Or did you find a bug in the code?

Qubitium avatar Apr 10 '25 12:04 Qubitium

@ywlq You do not want to create a new PR? I think it's a valid fix to avoid catastrophic quantization errors on some models that are not stable and contain layers with extreme outliers.

Qubitium avatar Apr 11 '25 13:04 Qubitium

@Qubitium You were not quite sure whether it was correct, so I did not create a new PR. I hope you can verify whether it is useful.

ywlq avatar Apr 13 '25 03:04 ywlq

I think my PR review got lost in translation, or your English-to-Chinese translation is wrong. =)

What I said is: please change your code so it is better. Look at the PR notes and read them carefully. I think

  • it is good and useful,
  • but the zero check needs to go outside the if/else, before the scales are modified, and it should log to the user that the module's scale was modified post-quantization due to module instability (see the sketch below).
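
A minimal sketch of that restructuring, keeping the 1e-8 epsilon from the original patch (the logger name and warning wording here are only illustrative, not the final code):

import logging

import torch

log = logging.getLogger(__name__)

def quantize(x, scale, zero, maxq, requires_groupwise_processing: bool):
    if maxq < 0:
        return (x > scale / 2).float() * scale + (x < zero / 2).float() * zero

    # Zero check hoisted outside the if/else so both branches are protected.
    if torch.any(scale == 0):
        log.warning("scale == 0 detected; scale modified post-quantization due to module instability")
        scale = torch.where(scale == 0, torch.full_like(scale, 1e-8), scale)

    if requires_groupwise_processing:
        q = torch.clamp(torch.round(x / scale), -maxq, maxq)
        return scale * q
    else:
        q = torch.clamp(torch.round(x / scale) + zero, 0, maxq)
        return scale * (q - zero)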

Qubitium avatar Apr 13 '25 04:04 Qubitium

@ywlq

Do not resolve this comment. This is very important, and I want you to make this change so I can merge the PR. If this is your first PR: I am not rejecting your PR, I am rejecting this version of the PR. Note the difference. Make the changes, push, and I will review/approve.

https://github.com/ModelCloud/GPTQModel/pull/1531#discussion_r2036476276

Qubitium avatar Apr 13 '25 04:04 Qubitium