Support quantization for non-multiples of 32.
The yayi2 30b k/v layers have [input_dims=7168, out_dims=112], so quantization fails with the error "all dimensions should be divisible by 32 for now". FYI, here is the implementation of the yayi2 30b model's k,v layer -> https://huggingface.co/wenge-research/yayi2-30b/blob/main/modeling_yayi.py#L180
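For reference, here is a minimal sketch of where it fails (shapes taken from the yayi2 config; the group_size/bits values are just illustrative defaults):

```python
import mlx.core as mx
import mlx.nn as nn

# yayi2-30b k/v projection: input_dims=7168, output_dims=112
kv_proj = nn.Linear(7168, 112, bias=False)

# Raises "all dimensions should be divisible by 32 for now"
# because the output dimension (112) is not a multiple of 32.
q_proj = nn.QuantizedLinear.from_linear(kv_proj, group_size=64, bits=4)
```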
Yes, that's a known limitation of our quantization at the moment. Will mark this as an enhancement.
In the meantime, one thing you could do as a workaround so you aren't blocked by this is to concatenate the K and V projections and do them as a single matmul (112 * 2 / 32 = 7).
Hi Awni, thanks for the reply. Would you be able to give a bit more detail about the workaround? By the way, happy new year! :)
Yes so there are two projections, one for the keys and one for the values.
k = x @ Wk.T
v = x @ Wv.T
Instead you could do something like:
k, v = mx.split(x @ mx.concatenate([Wk, Wv], axis=0).T, 2, axis=-1)
Now you can precompute the concatenated matrix and quantize it, since its output dimension (112 * 2 = 224) is divisible by 32.
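Putting it together, a rough sketch of the fused projection (not the exact yayi2 code; `Wk`/`Wv` stand in for the existing key/value weights and the quantization parameters are just the defaults):

```python
import mlx.core as mx

# The two original projection weights, each [out_dims=112, input_dims=7168]
Wk = mx.random.normal((112, 7168))
Wv = mx.random.normal((112, 7168))

# Fuse them once ahead of time: [224, 7168], and 224 is divisible by 32
Wkv = mx.concatenate([Wk, Wv], axis=0)
Wq, scales, biases = mx.quantize(Wkv, group_size=64, bits=4)

def kv_proj(x):
    # One quantized matmul for both projections, then split the last axis
    out = mx.quantized_matmul(
        x, Wq, scales, biases, transpose=True, group_size=64, bits=4
    )
    return mx.split(out, 2, axis=-1)

k, v = kv_proj(mx.random.normal((1, 16, 7168)))
```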
Thanks @awni, this workaround works. I have added it to the yayi2 example. :)
Nice.
I think (at least for the non-group axis) fixing this is mostly a matter of bounds checking in the matmul, but @angeloskath can say more about if / when we could support more flexible quantization.
Also https://github.com/ml-explore/mlx-examples/issues/279
@mzbac not sure you saw, in 0.0.10 this should be fixed. We can add yayi2 to mlx-lm now 😃
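A quick sanity check of the new behavior (same shape that failed before; quantization parameters are illustrative):

```python
import mlx.core as mx

# The previously failing shape: out_dims=112 is not a multiple of 32
w = mx.random.normal((112, 7168))

# With mlx >= 0.0.10 this should go through without the divisible-by-32 error
w_q, scales, biases = mx.quantize(w, group_size=64, bits=4)
print(w_q.shape, scales.shape, biases.shape)
```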
Nice, I will take a look. Hopefully, it just needs to be mapped to the llama arch and it will work.
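If the architecture really does line up, one low-effort way to try it (just a sketch; the checkpoint path is hypothetical) is to point the converted config's model_type at the existing llama implementation:

```python
import json
from pathlib import Path

# Hypothetical path to a locally converted yayi2 checkpoint
config_path = Path("yayi2-30b-mlx/config.json")

config = json.loads(config_path.read_text())
config["model_type"] = "llama"  # reuse the llama module in mlx-lm
config_path.write_text(json.dumps(config, indent=2))
```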
Oh that would be amazing.
This can be closed now: MLX supports quantization for dims that aren't multiples of 32, and I have tested it on the yayi2 model; it works as expected.