mlx
[BUG] ValueError: [quantize] The last dimension of the matrix needs to be divisible by the quantization group size 64.
Describe the bug When I try to quantize a VLM model that uses SigLIP, it throws a ValueError because the model has an intermediate size of 4304, which is not divisible by 64 or 128.
To Reproduce
Include code snippet
pip install -U mlx-vlm
python -m mlx_vlm.convert \
--hf-path qnguyen3/nanoLLaVA \
-q
Expected behavior Successfully quantize the model.
Desktop (please complete the following information):
- OS Version: macOS 14.4.1
- Version: 0.11.1
Traceback
Traceback (most recent call last):
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/prince_canuma/Documents/Projects/LLMs/mlx-vlm/mlx_vlm/convert.py", line 62, in <module>
main()
File "/Users/prince_canuma/Documents/Projects/LLMs/mlx-vlm/mlx_vlm/convert.py", line 58, in main
convert(**vars(args))
File "/Users/prince_canuma/Documents/Projects/LLMs/mlx-vlm/mlx_vlm/utils.py", line 540, in convert
weights, config = quantize_model(model, config, q_group_size, q_bits)
File "/Users/prince_canuma/Documents/Projects/LLMs/mlx-vlm/mlx_vlm/utils.py", line 452, in quantize_model
nn.quantize(model, q_group_size, q_bits, class_predicate=class_predicate)
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/nn/layers/quantized.py", line 51, in quantize
leaves = tree_map_with_path(_maybe_quantize, leaves, is_leaf=Module.is_module)
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
return {
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
k: tree_map_with_path(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
return {
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
k: tree_map_with_path(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
return {
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
k: tree_map_with_path(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
return {
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
k: tree_map_with_path(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
return {
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
k: tree_map_with_path(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 87, in tree_map_with_path
return TreeType(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 88, in <genexpr>
tree_map_with_path(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
return {
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
k: tree_map_with_path(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 95, in tree_map_with_path
return {
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 96, in <dictcomp>
k: tree_map_with_path(
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/utils.py", line 83, in tree_map_with_path
return fn(path, tree, *rest)
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/nn/layers/quantized.py", line 42, in _maybe_quantize
return QuantizedLinear.from_linear(m, group_size, bits)
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/nn/layers/quantized.py", line 226, in from_linear
ql = cls(input_dims, output_dims, False, group_size, bits)
File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx_code/lib/python3.10/site-packages/mlx/nn/layers/quantized.py", line 185, in __init__
self.weight, self.scales, self.biases = mx.quantize(weight, group_size, bits)
ValueError: [quantize] The last dimension of the matrix needs to be divisible by the quantization group size 64. However the provided matrix has shape (1152,4304)
It's not a bug. At the risk of being redundant: the last dimension of the matrix has to be divisible by the quantization group size, and for the size 4304 there is no supported group size that divides it (none of 32, 64, or 128 do).
It's not on our roadmap to support irregular sizes... but we can leave this issue open to help prioritize if it's something we should consider in the future.
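To see concretely why 4304 fails, here is a quick pure-Python check (the candidate group sizes 32, 64, and 128 are the supported ones mentioned above):

```python
# Check which candidate quantization group sizes divide a given dimension.
def dividing_group_sizes(dim, candidates=(32, 64, 128)):
    return [g for g in candidates if dim % g == 0]

print(dividing_group_sizes(4304))                    # [] -- no supported size divides 4304
print(dividing_group_sizes(4304, (16, 32, 64, 128))) # [16] -- only 16 would work
```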
It can be divided by 16; would it be complicated to implement support for that group size?
Yes, it's not a bug; it's more of a feature request / clarification. Because of this, no SigLIP-based VLM is quantizable, which includes Idefics 2, nanoLLaVA, and DeepSeek-VL.
Is there a way in MLX to skip a particular target layer or block in the model, rather than all layers of the same type as class_predicate does?
You can use class_predicate for that. Just put the condition you want in the predicate. For example, if you are trying to skip weights of a certain shape:
class_predicate = lambda p, m: isinstance(m, nn.Linear) and m.weight.shape != (x, y)
Thank you very much, I will give it a try ASAP!
It works wonders! 💯
Also found a better way, skipping the entire block:
class_predicate = lambda p, m: isinstance(m, nn.Linear) and p.split('.')[0] != "vision_tower"
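For anyone reading along, a stand-in sketch of how such a path-based predicate filters modules (the `Linear` class and the paths below are made-up examples, not real mlx modules; in mlx the predicate receives each flattened module path and module, as in the traceback above):

```python
# Stand-in for nn.Linear, just to demonstrate the predicate logic.
class Linear:
    def __init__(self, shape):
        self.shape = shape

# Hypothetical flattened (path, module) pairs like nn.quantize would visit.
modules = {
    "vision_tower.encoder.fc1": Linear((1152, 4304)),
    "language_model.layers.0.mlp": Linear((2048, 2048)),
}

# Skip everything under the vision tower; quantize the rest.
class_predicate = lambda p, m: isinstance(m, Linear) and p.split(".")[0] != "vision_tower"

quantized = [p for p, m in modules.items() if class_predicate(p, m)]
print(quantized)  # ['language_model.layers.0.mlp']
```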
I am going to close this. If people are interested in supporting irregular sizes we can open a new issue (e.g. with padding and slicing behind the scenes).
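For reference, the padding-and-slicing idea could look something like this pure-Python sketch (hypothetical, not mlx API; it just illustrates padding the last dimension to the next multiple of the group size and slicing it back off afterwards):

```python
def pad_to_multiple(row, group_size):
    # Zero-pad the last dimension up to the next multiple of group_size,
    # returning the padded row and the pad amount to slice off later.
    pad = (-len(row)) % group_size
    return row + [0.0] * pad, pad

row = [0.5] * 4304             # SigLIP intermediate size from the traceback
padded, pad = pad_to_multiple(row, 64)
print(len(padded), pad)        # 4352 48 -- now divisible by 64
# After dequantization, the padding would be sliced back off:
restored = padded[:len(padded) - pad]
print(len(restored))           # 4304
```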