Awni Hannun
> we have just been using the load safetensors function and the update parameters method. But how do you know if it's a quantized model or not? Presumably there are...
This is what I'm referring to: https://github.com/ml-explore/mlx-swift-examples/blob/main/Libraries/LLM/Load.swift#L58-L60 MLX LM has always had something like that. It builds the quantized model based on the config. The premise didn't change much. Only...
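The gist of that config-driven approach can be sketched as follows. This is a hypothetical, simplified illustration (the function and field names here are assumptions, not MLX LM's actual API): the loader inspects `config.json`, and the presence of a `quantization` block is what tells it to build the quantized variant of the model.

```python
import json

def quantization_settings(config_json: str):
    """Return the quantization settings if the config describes a
    quantized model, else None.

    Hypothetical sketch: quantized exports carry a "quantization"
    block, e.g. {"group_size": 64, "bits": 4}, and its presence is
    the signal to build the quantized model.
    """
    config = json.loads(config_json)
    return config.get("quantization")

# A 4-bit quantized config vs. a full-precision one:
quantized = '{"model_type": "llama", "quantization": {"group_size": 64, "bits": 4}}'
full_precision = '{"model_type": "llama"}'
```

In the real loaders, a non-`None` result would then be passed on to the quantization step before the weights are updated into the model.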
It looks like you added some edge case handling already in there (e.g. https://github.com/ml-explore/mlx-swift-examples/blob/main/Libraries/LLM/Load.swift#L97-L108). The update to MLX LM simplified that kind of stuff a bit.
> The fluctuation in this part will reduce the accuracy of the final model by about 10%.

How did you measure that? Indeed I added the sorting before batching because...
Yikes, that’s pretty bad. So a couple of ideas:
1. We can have a no-sort flag.
2. Could try to make batch content more random.
3. Could revert the sorting for...
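Option 2 could look something like the sketch below. This is just an illustration under assumed names (`noisy_length_batches` is not part of MLX LM): keep the rough length ordering so padding stays cheap, but add random jitter to the sort key so batch membership varies, and shuffle the resulting batch order.

```python
import random

def noisy_length_batches(samples, batch_size, jitter=4, seed=0):
    """Hypothetical sketch: sort by length plus random jitter so
    batches are roughly length-homogeneous (cheap padding) but not
    deterministic, then shuffle the batch order as well."""
    rng = random.Random(seed)
    keyed = sorted(samples, key=lambda s: len(s) + rng.uniform(-jitter, jitter))
    batches = [keyed[i:i + batch_size] for i in range(0, len(keyed), batch_size)]
    rng.shuffle(batches)
    return batches
```

The `jitter` parameter trades off padding cost against randomness: larger values mix lengths more, which should help accuracy at some cost in speed.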
Thanks!! I will take a look at #548 shortly, sorry for the delay!
@madroidmaq did you have any time to investigate this? I am hoping to come back to it and figure out a better batching strategy.
Thanks for the update. It might be best to simply disable sorting and compile in LoRA for now :(. It is a modest but nice speed improvement, so it's a...
That doesn't look so good. What version of MLX are you using and what commit for MLX LM?
@madroidmaq make sure you are using the latest MLX (0.8 or building from source). That's pretty important, otherwise you will go through a bad path for RMS Norm (it won't...