Prince Canuma
Hey @cemde, I've discussed it with the team and decided to send your issue to our product team as a feature request. They will take it from here and explore...
This model will be supported on MLX VLM soon :) https://github.com/Blaizzy/mlx-vlm/issues/39
The PR will be merged into mlx-vlm early tomorrow :) https://github.com/Blaizzy/mlx-vlm/pull/43
Not yet, I'll ping you when ready. I'm still working on it :)
@awni could you give it a try? I would like to see the results. At the moment, I can only run the 2-bit version due to space constraints.
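For reference, here is a rough sketch of how a low-bit quantized copy can be produced with mlx-lm's `convert` helper; the Hugging Face repo id below is a placeholder, not the model from this thread, and the argument values are only illustrative.

```python
# Hedged sketch: produce a smaller quantized copy of a Hugging Face model
# with mlx-lm. The repo id is a placeholder; 2-bit is only what fits in
# limited disk/RAM here, even though quality usually suffers at that width.
from mlx_lm import convert

convert(
    hf_path="<hf-repo-of-the-model>",  # placeholder, not the actual repo
    mlx_path="mlx_model_2bit",         # output directory for the converted weights
    quantize=True,
    q_bits=2,                          # 4-bit is the usual compromise; 2-bit only to fit
    q_group_size=64,                   # default group size for MLX quantization
)
```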
> Looking forward to testing the long sequence length performance!

Me too :)
> 2-bit almost never works well, I don't recommend even trying it..

Now I know 😅 The issue is that I currently have a base M1 with 16GB of RAM....
> Yes, it's downloading. I will let you know how it goes.

Thanks, can't wait!
> A 4-bit version produces the following. It looks OK, not obviously wrong but not that good either.
>
> `python -m mlx_lm.generate --model mlx_model --prompt "Hello, how are you?"`
>
> ...
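The same generation can also be run from Python; this is a minimal sketch assuming mlx-lm's `load`/`generate` API and the local `mlx_model` directory used in the command above.

```python
# Minimal sketch of the same generation through the mlx-lm Python API,
# assuming the converted weights sit in ./mlx_model as in the CLI call above.
from mlx_lm import load, generate

model, tokenizer = load("mlx_model")
text = generate(model, tokenizer, prompt="Hello, how are you?", max_tokens=100)
print(text)
```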
> * Fixed rope to traditional
> * Fixed an issue with layer norm upcasting to fp32
> * Rebased on main + ran formatting

Thank you very much @awni...
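For context, a hedged sketch of what those first two fixes typically look like in MLX; the dimensions, epsilon, and class name here are illustrative and not the values or code from the PR.

```python
# Illustrative only: "traditional" RoPE and an fp32-upcasting layer norm in MLX.
import mlx.core as mx
import mlx.nn as nn

# Traditional RoPE rotates consecutive dimension pairs instead of split halves;
# the flag has to match how the original checkpoint was trained.
rope = nn.RoPE(dims=128, traditional=True)


class LayerNormFP32(nn.Module):
    """Layer norm that computes its statistics in float32, then casts back."""

    def __init__(self, dims: int, eps: float = 1e-5):
        super().__init__()
        self.weight = mx.ones((dims,))
        self.bias = mx.zeros((dims,))
        self.eps = eps

    def __call__(self, x):
        orig_dtype = x.dtype
        x32 = x.astype(mx.float32)           # upcast so the reduction is done in fp32
        mean = mx.mean(x32, axis=-1, keepdims=True)
        var = mx.var(x32, axis=-1, keepdims=True)
        out = (x32 - mean) * mx.rsqrt(var + self.eps)
        return (self.weight * out + self.bias).astype(orig_dtype)
```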