Awni Hannun
Thanks! Also you should rebase on my updates in #219. I will merge them today!
@ProjectProgramAMark thanks for the contribution and leading the charge on packaging up Lora. Since we decided to merge Lora and MLX-lm I think a good goal for this lora example...
@jbochi I pushed a substantial change here. I moved the example to be almost the same as `hf_llm` for the sake of consistency and keeping the option open for future...
@jbochi I don't think we need to wait until https://github.com/ml-explore/mlx/pull/426. I will double check this and we can merge it today!
The relevant PR is #222 from @jbochi. So far I've tested it with a TinyLlama and a Mistral model from TheBloke and it worked, but indeed I do not...
@jbochi this is working now for Mistral and TinyLlama with native quantization. Let's merge it after we merge https://github.com/ml-explore/mlx/pull/426
Just looking at raw RAM used is not a great indicator as our allocator hogs memory in a cache even if it's not actively needed (yes this can be an...
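To make the distinction concrete, here is a minimal sketch of separating "actively used" memory from the allocator's cache. It assumes an MLX version that exposes `mx.metal.get_active_memory()`, `mx.metal.get_cache_memory()`, and `mx.metal.clear_cache()`; the shapes are arbitrary placeholders.

```python
import mlx.core as mx

# Allocate and evaluate something so buffers actually get created.
a = mx.random.uniform(shape=(4096, 4096))
b = a @ a
mx.eval(b)

active = mx.metal.get_active_memory()  # bytes backing live arrays
cached = mx.metal.get_cache_memory()   # bytes the allocator keeps cached for reuse
print(f"active: {active / 1e6:.1f} MB, cached: {cached / 1e6:.1f} MB")

del a, b
# Freed buffers return to the cache rather than to the OS, so process RSS
# stays high even though active memory drops. The cache can be released
# explicitly if a measurement without it is needed.
mx.metal.clear_cache()
```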
Thank YOU for making it happen!
I recommend using the `hf_llm` example; it uses `AutoTokenizer` and should manage tokenization more cleanly in general. We are moving other examples towards using `AutoTokenizer` as well.
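For reference, this is roughly the `AutoTokenizer` flow involved; a minimal sketch, with the model name as an illustrative placeholder rather than anything prescribed by the example.

```python
from transformers import AutoTokenizer

# Load the tokenizer that ships with the Hugging Face model repo.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Encode a prompt to token ids and decode ids back to text.
prompt_ids = tokenizer.encode("Write a haiku about the ocean.")
text = tokenizer.decode(prompt_ids)
print(prompt_ids[:8], text)
```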
I'm closing this in favor of the issue I just opened in mlx core: https://github.com/ml-explore/mlx/issues/404