Awni Hannun
> we have just been using the load safetensors function and the update parameters method. But how do you know if it's a quantized model or not? Presumably there are...
This is what I'm referring to: https://github.com/ml-explore/mlx-swift-examples/blob/main/Libraries/LLM/Load.swift#L58-L60 MLX LM has always had something like that. It builds the quantized model based on the config. The premise didn't change much. Only...
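The gist of that config-driven approach can be sketched as follows. This is a hypothetical, simplified illustration (the function and field names here are assumptions, not MLX LM's actual API): the loader inspects `config.json`, and the presence of a `quantization` block is what tells it to build the quantized variant of the model.

```python
import json

def quantization_settings(config_json: str):
    """Return the quantization settings if the config describes a
    quantized model, else None.

    Hypothetical sketch: quantized exports carry a "quantization"
    block, e.g. {"group_size": 64, "bits": 4}, and its presence is
    the signal to build the quantized model.
    """
    config = json.loads(config_json)
    return config.get("quantization")

# A 4-bit quantized config vs. a full-precision one:
quantized = '{"model_type": "llama", "quantization": {"group_size": 64, "bits": 4}}'
full_precision = '{"model_type": "llama"}'
```

In the real loaders, a non-`None` result would then be passed on to the quantization step before the weights are updated into the model.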
It looks like you added some edge case handling already in there (e.g. https://github.com/ml-explore/mlx-swift-examples/blob/main/Libraries/LLM/Load.swift#L97-L108). The update to MLX LM simplified that kind of stuff a bit.
> The fluctuation in this part will reduce the accuracy of the final model by about 10%.

How did you measure that? Indeed I added the sorting before batching because...
Yikes, that’s pretty bad. So a couple of ideas:
1. We can have a no-sort flag.
2. Could try to make batch content more random.
3. Could revert the sorting for...
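Option 2 could look something like the sketch below. This is just an illustration under assumed names (`noisy_length_batches` is not part of MLX LM): keep the rough length ordering so padding stays cheap, but add random jitter to the sort key so batch membership varies, and shuffle the resulting batch order.

```python
import random

def noisy_length_batches(samples, batch_size, jitter=4, seed=0):
    """Hypothetical sketch: sort by length plus random jitter so
    batches are roughly length-homogeneous (cheap padding) but not
    deterministic, then shuffle the batch order as well."""
    rng = random.Random(seed)
    keyed = sorted(samples, key=lambda s: len(s) + rng.uniform(-jitter, jitter))
    batches = [keyed[i:i + batch_size] for i in range(0, len(keyed), batch_size)]
    rng.shuffle(batches)
    return batches
```

The `jitter` parameter trades off padding cost against randomness: larger values mix lengths more, which should help accuracy at some cost in speed.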
Thanks!! I will take a look at #548 shortly, sorry for the delay!
@madroidmaq did you have any time to investigate this? I am hoping to come back to it and figure out a better batching strategy.
Thanks for the update. It might be best to simply disable sorting and compile in LoRA for now :(. It is a modest but nice speed improvement, so it's a...
That doesn't look so good. What version of MLX are you using and what commit for MLX LM?
@madroidmaq make sure you are using the latest MLX (0.8 or building from source). That's pretty important, otherwise you will go through a bad path for RMS Norm (it won't...