Awni Hannun
Thanks! Also you should rebase on my updates in #219. I will merge them today!
@ProjectProgramAMark thanks for the contribution and leading the charge on packaging up Lora. Since we decided to merge Lora and MLX-lm I think a good goal for this lora example...
@jbochi I pushed a substantial change here. I moved the example to be almost the same as `hf_llm` for the sake of consistency and keeping the option open for future...
@jbochi I don't think we need to wait until https://github.com/ml-explore/mlx/pull/426. I will double check this and we can merge it today!
The relevant PR is #222 from @jbochi. So far I've tested it with a TinyLlama and a Mistral model from TheBloke and it worked, but indeed I do not...
@jbochi this is working now for Mistral and TinyLlama with native quantization. Let's merge it after we merge https://github.com/ml-explore/mlx/pull/426
Just looking at raw RAM used is not a great indicator as our allocator hogs memory in a cache even if it's not actively needed (yes this can be an...
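To make the distinction concrete, here is a minimal sketch of separating "actively used" memory from the allocator's cache. It assumes an MLX version that exposes `mx.metal.get_active_memory()`, `mx.metal.get_cache_memory()`, and `mx.metal.clear_cache()`; the shapes are arbitrary placeholders.

```python
import mlx.core as mx

# Allocate and evaluate something so buffers actually get created.
a = mx.random.uniform(shape=(4096, 4096))
b = a @ a
mx.eval(b)

active = mx.metal.get_active_memory()  # bytes backing live arrays
cached = mx.metal.get_cache_memory()   # bytes the allocator keeps cached for reuse
print(f"active: {active / 1e6:.1f} MB, cached: {cached / 1e6:.1f} MB")

del a, b
# Freed buffers return to the cache rather than to the OS, so process RSS
# stays high even though active memory drops. The cache can be released
# explicitly if a measurement without it is needed.
mx.metal.clear_cache()
```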
Thank YOU for making it happen!
I recommend using the `hf_llm` example; it uses `AutoTokenizer` and should manage tokenization more cleanly in general. We are moving other examples towards using `AutoTokenizer` as well.
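For reference, this is roughly the `AutoTokenizer` flow involved; a minimal sketch, with the model name as an illustrative placeholder rather than anything prescribed by the example.

```python
from transformers import AutoTokenizer

# Load the tokenizer that ships with the Hugging Face model repo.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Encode a prompt to token ids and decode ids back to text.
prompt_ids = tokenizer.encode("Write a haiku about the ocean.")
text = tokenizer.decode(prompt_ids)
print(prompt_ids[:8], text)
```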
I'm closing this in favor of the issue I just opened in mlx core: https://github.com/ml-explore/mlx/issues/404