mlx
[BUG] ValueError: [quantize] The last dimension of the matrix needs to be divisible by the quantization group size 64.
Describe the bug When I try to quantize a VLM model that uses SigLIP, it throws a ValueError because the model has an intermediate size of 4304, which is not divisible by 64 or 128.
To Reproduce
Include code snippet
pip install -U mlx-vlm
python -m mlx_vlm.convert \
--hf-path qnguyen3/nanoLLaVA \
-q
Expected behavior Successfully quantize the model.
Desktop (please complete the following information):
- OS Version: macOS 14.4.1
- Version: 0.11.1
Traceback
Traceback (most recent call last):
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/prince_canuma/Documents/Projects/LLMs/mlx-vlm/mlx_vlm/convert.py", line 62, in <module>
main()
File "/Users/prince_canuma/Documents/Projects/LLMs/mlx-vlm/mlx_vlm/convert.py", line 58, in main
convert(**vars(args))
File "/Users/prince_canuma/Documents/Projects/LLMs/mlx-vlm/mlx_vlm/utils.py", line 540, in convert
weights, config = quantize_model(model, config, q_group_size, q_bits)
File "/Users/prince_canuma/Documents/Projects/LLMs/mlx-vlm/mlx_vlm/utils.py", line 452, in quantize_model
nn.quantize(model, q_group_size, q_bits, class_predicate=class_predicate)
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/nn/layers/quantized.py", line 51, in quantize
leaves = tree_map_with_path(_maybe_quantize, leaves, is_leaf=Module.is_module)
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
return {
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
k: tree_map_with_path(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
return {
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
k: tree_map_with_path(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
return {
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
k: tree_map_with_path(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
return {
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
k: tree_map_with_path(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
return {
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
k: tree_map_with_path(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 87, in tree_map_with_path
return TreeType(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 88, in <genexpr>
tree_map_with_path(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
return {
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
k: tree_map_with_path(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
return {
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
k: tree_map_with_path(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 83, in tree_map_with_path
return fn(path, tree, *rest)
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/nn/layers/quantized.py", line 42, in _maybe_quantize
return QuantizedLinear.from_linear(m, group_size, bits)
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/nn/layers/quantized.py", line 226, in from_linear
ql = cls(input_dims, output_dims, False, group_size, bits)
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/nn/layers/quantized.py", line 185, in __init__
self.weight, self.scales, self.biases = mx.quantize(weight, group_size, bits)
ValueError: [quantize] The last dimension of the matrix needs to be divisible by the quantization group size 64. However the provided matrix has shape (1152,4304)
It's not a bug. At the risk of being redundant: the last dimension of the matrix has to be divisible by the quantization group size, and for the size 4304 there is no supported group size that divides it (none of 32, 64, or 128 do).
It's not on our roadmap to support irregular sizes... but we can leave this issue open to help prioritize if it's something we should consider in the future.
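To see concretely why 4304 fails, here is a quick pure-Python check (the candidate group sizes 32, 64, and 128 are the supported ones mentioned above):

```python
# Check which candidate quantization group sizes divide a given dimension.
def dividing_group_sizes(dim, candidates=(32, 64, 128)):
    return [g for g in candidates if dim % g == 0]

print(dividing_group_sizes(4304))                    # [] -- no supported size divides 4304
print(dividing_group_sizes(4304, (16, 32, 64, 128))) # [16] -- only 16 would work
```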
It can be divided by 16; would it be complicated to implement support for that group size?
Yes, it's not a bug; it's more of a feature request / clarification. Because of this, no SigLIP-based VLM is quantizable, which includes Idefics 2, nanoLLaVA, and DeepSeek-VL.
Is there a way in MLX to skip a particular target layer or block in the model, rather than all layers of the same type as class_predicate does?
You can use class_predicate for that. Just put the condition you want in the predicate. For example, if you are trying to skip weights of a certain shape:
class_predicate = lambda p, m: isinstance(m, nn.Linear) and m.weight.shape != (x, y)
Thank you very much, I will give it a try ASAP!
It works wonders! 💯
Also found a better way, skipping the entire block:
class_predicate = lambda p, m: isinstance(m, nn.Linear) and p.split('.')[0] != "vision_tower"
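For anyone reading along, a stand-in sketch of how such a path-based predicate filters modules (the `Linear` class and the paths below are made-up examples, not real mlx modules; in mlx the predicate receives each flattened module path and module, as in the traceback above):

```python
# Stand-in for nn.Linear, just to demonstrate the predicate logic.
class Linear:
    def __init__(self, shape):
        self.shape = shape

# Hypothetical flattened (path, module) pairs like nn.quantize would visit.
modules = {
    "vision_tower.encoder.fc1": Linear((1152, 4304)),
    "language_model.layers.0.mlp": Linear((2048, 2048)),
}

# Skip everything under the vision tower; quantize the rest.
class_predicate = lambda p, m: isinstance(m, Linear) and p.split(".")[0] != "vision_tower"

quantized = [p for p, m in modules.items() if class_predicate(p, m)]
print(quantized)  # ['language_model.layers.0.mlp']
```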
I am going to close this. If people are interested in supporting irregular sizes we can open a new issue (e.g. with padding and slicing behind the scenes).
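For reference, the padding-and-slicing idea could look something like this pure-Python sketch (hypothetical, not mlx API; it just illustrates padding the last dimension to the next multiple of the group size and slicing it back off afterwards):

```python
def pad_to_multiple(row, group_size):
    # Zero-pad the last dimension up to the next multiple of group_size,
    # returning the padded row and the pad amount to slice off later.
    pad = (-len(row)) % group_size
    return row + [0.0] * pad, pad

row = [0.5] * 4304             # SigLIP intermediate size from the traceback
padded, pad = pad_to_multiple(row, 64)
print(len(padded), pad)        # 4352 48 -- now divisible by 64
# After dequantization, the padding would be sliced back off:
restored = padded[:len(padded) - pad]
print(len(restored))           # 4304
```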