Awni Hannun
There is a maximum allowed buffer size for a given machine and you can't have arrays which require more than that. You can check the `max_buffer_length` in `mx.metal.device_info()`. Presumably you...
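For reference, checking that limit looks roughly like this (a minimal sketch; the exact set of keys returned can vary by MLX version):

```
import mlx.core as mx

# Query the Metal device properties; `max_buffer_length` is the largest
# single buffer (in bytes) that one array can occupy on this machine.
info = mx.metal.device_info()
print(info["max_buffer_length"])
```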
A recent update to MLX LM can handle much longer prompts now. You can also play with the `--max-kv-size` parameter, which trades off memory use / speed against accuracy. I would...
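As a rough sketch of how that flag is passed (assuming it is exposed on the `mlx_lm.generate` CLI; the model path and cache size are placeholders):

```
mlx_lm.generate --model <model-path> --prompt "..." --max-kv-size 4096
```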
Closing for now, let us know if you still have issues with long prompts.
I'm not certain this is the problem, so it would be good to validate it. But fusing can cause precision issues. In low precision, `c = a + b` can...
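As a toy illustration of the kind of rounding involved in low precision (not the original fused kernel):

```
import mlx.core as mx

# In float16 the spacing between representable values near 2048 is 2,
# so the small addend is silently lost in the low-precision sum.
a = mx.array(2048.0, dtype=mx.float16)
b = mx.array(1.0, dtype=mx.float16)
print(a + b)                                        # 2048 in float16
print(a.astype(mx.float32) + b.astype(mx.float32))  # 2049 in float32
```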
It looks fine to me.

> but just adding `--use_dora` outputs rubbish

What kind of rubbish? Does the loss go down or not really?
> I just wanted to know if there is something to be considered while using dora or just --use-dora does the work

It should work to do `--use-dora`, as in...
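For example, something along these lines (model and data paths are placeholders; assumes an mlx-lm version that exposes `--use-dora`):

```
mlx_lm.lora --model <model-path> --data <data-path> --train --use-dora
```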
> Do you recommend any tool to look at the weights and adapters as the files are binaries?

You can load them using `mx.load`. That will give you a dictionary...
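For instance (the adapter path is just an example):

```
import mlx.core as mx

# Loading a safetensors (or .npz) file returns a dict mapping parameter
# names to arrays, which you can then inspect by shape / dtype / values.
weights = mx.load("adapters/adapters.safetensors")
for name, w in weights.items():
    print(name, w.shape, w.dtype)
```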
Sorry for the delay here. I'm not able to reproduce the warnings you saw. For example, the following runs without warnings:

```
mlx_lm.lora --model HuggingFaceTB/SmolLM-135M --data ../lora/data --iters 100 --train...
```
> Is that possible to do distributed inference as well?

Possible yes, but getting a nice speedup is more challenging. That's something we're looking at, but don't have an ETA...
I think it's a good idea, especially if we are going to use type annotations (which we do).