mlx-swift-examples Qwen3 with heterogenous quant doesn't work

Open awni opened this issue 7 months ago • 2 comments

The model mlx-community/Qwen3-1.7B-4bit-AWQ doesn't run in the mlx-swift-examples repo. It fails with a mismatched shape error in the scales. I suspect it's due to the heterogenous quant not being parsed properly. See e.g. https://huggingface.co/mlx-community/Qwen3-1.7B-4bit-AWQ/blob/main/config.json#L20

Apr 29 '25 16:04 awni

Ah yes, the quant code has no idea what to do with that (yet) -- I haven't seen this format before.

Apr 29 '25 16:04 davidkoski

Yea it's relatively new. We use a custom class predicate in mlx-lm which holds the config and reads it to figure out what parameters to use for a given layer.

And in mlx, nn.quantize takes a class predicate which can return either True/False or the quantization parameters.

See e.g.

https://github.com/ml-explore/mlx-lm/blob/main/mlx_lm/utils.py#L201-L208
https://github.com/ml-explore/mlx/blob/main/python/mlx/nn/layers/quantized.py#L29-L34
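
For readers who don't want to follow the links, the predicate idea can be sketched roughly as below. This is a simplified, dependency-free illustration and not the actual mlx-lm code: `make_class_predicate`, `FakeLinear`, `FakeNorm`, the layer paths, and the config values are all made up for the example. It assumes the config's `quantization` entry holds global defaults plus optional per-layer overrides keyed by layer path.

```python
from typing import Any, Callable, Dict, Set, Union

# A predicate result: False/True, or a dict of per-layer quantization parameters.
QuantDecision = Union[bool, Dict[str, int]]


def make_class_predicate(
    config: Dict[str, Any], weight_names: Set[str]
) -> Callable[[str, Any], QuantDecision]:
    """Build a predicate deciding, per layer, whether and how to quantize."""
    quant = config.get("quantization", {})

    def class_predicate(path: str, module: Any) -> QuantDecision:
        # 1. A per-layer entry in the config wins: either False (leave the
        #    layer unquantized) or a dict of parameters for that layer.
        if path in quant:
            return quant[path]
        # 2. Modules that cannot be quantized are skipped.
        if not hasattr(module, "to_quantized"):
            return False
        # 3. Otherwise use the global defaults, but only if the checkpoint
        #    actually contains a ".scales" tensor for this layer.
        return f"{path}.scales" in weight_names

    return class_predicate


# Hypothetical stand-ins for real layers, only for demonstration.
class FakeLinear:
    def to_quantized(self, group_size: int, bits: int) -> "FakeLinear":
        return self


class FakeNorm:
    pass


# Made-up config in the heterogeneous style: global defaults plus overrides.
config = {
    "quantization": {
        "group_size": 64,
        "bits": 4,
        "model.layers.0.mlp.down_proj": {"group_size": 64, "bits": 8},
        "lm_head": False,
    }
}
weight_names = {"model.layers.0.mlp.up_proj.scales"}
predicate = make_class_predicate(config, weight_names)
```

A predicate of this shape is what gets passed as `class_predicate` to `nn.quantize`: a returned dict overrides the global `group_size` and `bits` for that one layer, `True` means "use the defaults", and `False` means "leave the layer alone". A Swift port would need an equivalent per-layer lookup before building each quantized layer.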

This is going to be especially useful here because more heterogenous + AWQ quants make a much bigger difference for smaller models.

Apr 29 '25 16:04 awni
