Awni Hannun
@Blucknote There are pre-converted quantized models in the MLX Hugging Face community: https://huggingface.co/mlx-community. Also, all of the conversion scripts in the [LLM examples](https://github.com/ml-explore/mlx-examples/tree/main/llms) can produce quantized models.
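For background, MLX's quantization is group-wise and affine: each small group of weights is mapped to low-bit integers with a per-group scale and offset. Here is a rough sketch of that idea in plain Python (illustrative only, not the actual MLX kernels; the real scripts expose group size and bit width as parameters):

```python
def quantize(group, bits=4):
    # Affine quantization: map floats in [lo, hi] to integers in [0, 2**bits - 1],
    # remembering the per-group scale and offset so we can reconstruct later.
    lo, hi = min(group), max(group)
    scale = (hi - lo) / (2 ** bits - 1) or 1.0  # avoid divide-by-zero for constant groups
    q = [round((x - lo) / scale) for x in group]
    return q, scale, lo

def dequantize(q, scale, lo):
    # Reconstruct approximate floats from the stored integers.
    return [v * scale + lo for v in q]

w = [0.1, -0.4, 0.25, 0.9]
q, scale, lo = quantize(w)
w_hat = dequantize(q, scale, lo)
# Each reconstructed weight is within one quantization step of the original.
assert all(abs(a - b) <= scale for a, b in zip(w, w_hat))
```

The real conversion scripts do this per group of weights (e.g. 64 at a time) so each group gets its own scale and offset, which keeps the reconstruction error small even for weights with very different magnitudes.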
That is not expected; it sounds like a bug. Thanks for reporting, I will take a look.
Yes, we are very much aware of this issue. Working with @angeloskath on a fix.
For now I recommend avoiding the running stats until we fix it.
This should fix it https://github.com/ml-explore/mlx/pull/385
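For anyone hitting this: the "running stats" are BatchNorm's exponential moving averages of the per-batch mean and variance, maintained so inference can normalize without a batch. An illustrative sketch of the bookkeeping (not the MLX source; the momentum value here is just the common default):

```python
def update_running_stats(running_mean, running_var, batch, momentum=0.1):
    # Standard BatchNorm bookkeeping: blend the current batch statistics
    # into exponential moving averages used later at inference time.
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    new_mean = (1 - momentum) * running_mean + momentum * mean
    new_var = (1 - momentum) * running_var + momentum * var
    return new_mean, new_var
```

The bug was in how these averages were tracked during training, which is why avoiding them was the safe workaround until the linked PR landed.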
Thanks @singhaki for the input. Regarding the rope_traditional flag, I wouldn't remove it, since the model won't work as well. We can add the config param to our model so...
This will be updated in #252
Sorry about that, the fix is in #227
It’s probably using too much memory. Read the [section in the readme](https://github.com/ml-explore/mlx-examples/tree/main/lora#memory-issues) on how to reduce memory use. If it’s still super slow, your machine may not have enough memory...
Wow, that's odd. Did you happen to do anything on your computer around the time it slowed down? Could the GPU have been used by something else? Also you...