Awni Hannun
Oof sorry that’s the compile, you can just remove it for now as in #608
> A potential solution would be to always initialize the lm_head

A challenge there is that we load the weights in strict mode, so if the `lm_head.weight` is missing then...
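The strict-mode constraint mentioned above can be sketched as follows. This is a minimal, generic illustration of strict weight loading (the function name and dict-based parameters are hypothetical, not the actual mlx-lm loader): in strict mode, a checkpoint missing `lm_head.weight` fails immediately instead of silently leaving the head uninitialized.

```python
def load_weights(model_params, checkpoint, strict=True):
    """Copy checkpoint tensors into a dict of model parameters.

    In strict mode, any key present in the model but absent from the
    checkpoint (e.g. a missing lm_head.weight) raises, as does any
    unexpected key in the checkpoint.
    """
    missing = set(model_params) - set(checkpoint)
    unexpected = set(checkpoint) - set(model_params)
    if strict and (missing or unexpected):
        raise ValueError(
            f"missing keys: {sorted(missing)}, unexpected keys: {sorted(unexpected)}"
        )
    # Copy only the keys both sides agree on.
    for name in checkpoint.keys() & model_params.keys():
        model_params[name] = checkpoint[name]
    return model_params
```

With `strict=False` the same checkpoint loads and the missing head keeps its initial value, which is the trade-off being discussed.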
I'm okay with that...seems like a small delta, and I don't think we lose much from moving it to be an instance method.
> When using the quantized model for inference, it is found that it can no longer perform inference work properly

Was this working for you before? Just wondering so we...
Interestingly, converting the 7B works fine, so it seems like an issue with the smaller model. Small models tend to be more difficult to quantize in a stable way. I'm...
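To illustrate why quantization is inherently lossy, here is a rough round-trip through a generic 4-bit affine scheme (a simplified sketch for intuition only, not MLX's actual group-quantization format): every value gets snapped to one of 16 levels, and the reconstruction error is bounded by half the quantization step.

```python
def quantize_4bit(values):
    """Affine-quantize floats to 4-bit integers in [0, 15]."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 or 1.0  # avoid divide-by-zero for constant input
    q = [round((v - lo) / scale) for v in values]
    return q, scale, lo

def dequantize(q, scale, lo):
    """Map 4-bit codes back to approximate float values."""
    return [lo + qi * scale for qi in q]

vals = [0.03, -0.12, 0.5, -0.49, 0.07]
q, scale, lo = quantize_4bit(vals)
recon = dequantize(q, scale, lo)
max_err = max(abs(a - b) for a, b in zip(vals, recon))
```

The per-value error is at most `scale / 2`; smaller models have less redundancy to absorb that error, which is one intuition for why they quantize less stably.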
Actually what model is this: `qwen:1.8b-chat-v1.5-q4_0` ? Is it GGUF?
Yes I found the model. I don't yet know why it's not working, but there are certainly some differences between the GGUF quantized model and generation code. Some examples: -...
> mlx-community/stablelm-2-zephyr-1_6b

That looks like a different model, not the qwen model?
Looks like there may be a bug here. We are investigating and will report back.
There was an OOB read in our QMV kernel for the shapes in that QWEN model. This is fixed on main in MLX and I confirmed generating text with quantized...
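For context on how a bug like that arises: a matrix-vector kernel typically processes rows in fixed-size blocks, and if the row length is not a multiple of the block size, a kernel that always reads full blocks indexes past the end of the row. The sketch below is a schematic Python analogue of that failure mode and its fix (not the actual Metal QMV kernel); the clamp on the block bound is what prevents the out-of-bounds read.

```python
def blocked_row_dot(row, vec, block=4):
    """Dot product of one matrix row with a vector, processed block by block.

    A kernel that loops `for i in range(start, start + block)` without
    clamping would read past the end of `row` whenever
    len(row) % block != 0; clamping `end` avoids the over-read.
    """
    n = len(row)
    acc = 0.0
    for start in range(0, n, block):
        end = min(start + block, n)  # clamp the final partial block
        for i in range(start, end):
            acc += row[i] * vec[i]
    return acc
```

Here a length-6 row with a block size of 4 leaves a partial block of 2, which is exactly the shape-dependent case that only triggers for some models.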