Awni Hannun

1014 comments by Awni Hannun

Hi @sugatoray, sorry I just saw your question: > Are you at all in favor of breaking the function into two? The update config seems less useful to me since...

@sugatoray I'm still waiting for you to let me know when to check this again, right?

We have a more fully featured version of LoRA in mlx-lm: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/LORA.md. Assuming it doesn’t add much code complexity, I think it would be cool to update it to support alternative...

I will wait for this to land and then adopt it here https://github.com/ml-explore/mlx/pull/788

@madroidmaq I think that is accounted for by a race condition that we recently fixed. It should already be fixed on main in MLX.

It should stop at the end-of-sentence (EOS) token ID: https://github.com/ml-explore/mlx-swift-examples/blob/main/Libraries/LLM/Evaluate.swift#L199. The fact that it's not stopping likely means it doesn't have the right EOS token ID set. Which model did...

Looks like this is the EOS token for that model: https://huggingface.co/mlx-community/Phi-3-mini-4k-instruct-4bit-no-q-embed/blob/main/tokenizer_config.json#L340. We'll need to check that the IDs match and that the tokenizer is reading it correctly.
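
As a quick way to verify, one can load the tokenizer with Hugging Face transformers and print the EOS token it reports (a minimal sketch; the model id is just the one linked above):

```python
# Minimal sketch: inspect the EOS token this tokenizer reports, using
# Hugging Face transformers (assumed installed). If generation never
# stops, compare these against the ID the generation loop checks for.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained(
    "mlx-community/Phi-3-mini-4k-instruct-4bit-no-q-embed"
)
print("eos_token:   ", tok.eos_token)
print("eos_token_id:", tok.eos_token_id)
```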

Right, we changed quantization in MLX core, so the embedding layer is now quantized. We'll need to update the Swift side to do the same.

Those are the commits. Sorry that broke more stuff than I was expecting. Basically, the embeddings are quantized by default now, so when we quantize for MLX in Python the model...
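
To make the Python side of that concrete, here is a minimal sketch (toy model, recent MLX assumed) of what "embeddings are quantized by default" looks like when quantizing with mlx.nn:

```python
import mlx.nn as nn

# Toy model, just for illustration.
class ToyModel(nn.Module):
    def __init__(self, vocab_size: int = 100, dims: int = 64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, dims)
        self.proj = nn.Linear(dims, vocab_size)

model = ToyModel()

# nn.quantize swaps quantizable layers in place; with recent MLX this
# includes nn.Embedding (replaced by nn.QuantizedEmbedding), which is
# the behavior the Swift side needs to mirror.
nn.quantize(model, group_size=64, bits=4)

print(type(model.embedding))  # expected: QuantizedEmbedding
print(type(model.proj))       # expected: QuantizedLinear
```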

Right, so this is what solves that problem in MLX: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/utils.py#L336-L346. It's actually really useful because it handles heterogeneously quantized models very cleanly, which is a problem we've had in...
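
The pattern at those lines is, roughly, to pass a class_predicate to nn.quantize so that only the submodules whose quantized weights actually appear in the checkpoint get swapped out. A hedged sketch of that idea (the names `weights` and `quantization` here are illustrative, not the exact mlx-lm variables):

```python
import mlx.nn as nn

def quantize_from_checkpoint(model, weights, quantization):
    """Illustrative helper: quantize only the layers that were quantized
    at save time, so heterogeneously quantized checkpoints load cleanly.

    `weights` is the flat dict of saved arrays and `quantization` holds
    e.g. {"group_size": 64, "bits": 4}.
    """

    def class_predicate(path, module):
        # A layer was quantized when the model was saved iff its scales
        # are present in the checkpoint.
        return f"{path}.scales" in weights

    nn.quantize(model, **quantization, class_predicate=class_predicate)
    return model
```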