Awni Hannun

1,014 comments by Awni Hannun

Oh interesting. So it only hangs if you run over the network? It might be good to put an eval after you reduce the grads so we can see if...
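For context, a minimal sketch of what "put an eval after you reduce the grads" could look like, assuming the gradients are averaged with `mx.distributed.all_sum`; the `grads` dictionary and its shapes are hypothetical stand-ins for whatever your training step produces:

```python
import mlx.core as mx
from mlx.utils import tree_map

world = mx.distributed.init()

# Stand-in for the gradients returned by the training step (hypothetical shapes).
grads = {"w": mx.ones((8, 8)), "b": mx.ones((8,))}

# Average the gradients across ranks.
grads = tree_map(lambda g: mx.distributed.all_sum(g) / world.size(), grads)

# Force the reduction to run right here; if the script hangs on this line,
# the problem is in the distributed all_sum rather than some later op.
mx.eval(grads)
print(world.rank(), grads["b"])
```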

Have you tried something really simple just to debug the connection? Something like the following:

```python
import mlx.core as mx

world = mx.distributed.init()
x = mx.distributed.all_sum(mx.ones(10))
print(world.rank(), x)
```
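As a usage note (assuming the MPI backend; the hostnames and script name are placeholders), a script like that can be launched on two machines with something like `mpirun -np 2 --host host1,host2 python test_dist.py`. With two ranks, every entry of `x` should come back as `2`.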

So it works fine for me even over a network. It is just quite slow. Could you try the same script but with smaller sizes (like decrease the input and...

Thanks!! Will review shortly!

You can't fine-tune the quantized layers. You can use an fp16, bf16, or fp32 model for full fine-tuning. The half-precision types need care to avoid numerical issues, so ymmv...
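For reference, a minimal sketch of full fine-tuning in bf16; the tiny model, shapes, and loss here are hypothetical stand-ins for a real LLM, and computing the loss in float32 is just one way to soften the numerical issues mentioned above:

```python
import mlx.core as mx
import mlx.nn as nn
from mlx.utils import tree_map

# Hypothetical tiny model standing in for a full model.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))

# Cast all parameters to bfloat16 for full (not quantized) fine-tuning.
model.update(tree_map(lambda p: p.astype(mx.bfloat16), model.parameters()))

def loss_fn(model, x, y):
    # Accumulate the loss in float32 to reduce numerical issues.
    return nn.losses.mse_loss(model(x).astype(mx.float32), y)

x = mx.random.normal((4, 16)).astype(mx.bfloat16)
y = mx.random.normal((4, 16))

loss_and_grad = nn.value_and_grad(model, loss_fn)
loss, grads = loss_and_grad(model, x, y)
mx.eval(loss, grads)
print(loss.item())
```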

@Jonathan-Dobson here is a fix for that: https://github.com/ml-explore/mlx-examples/pull/932. Will put it in a new PyPI release once it lands.

> Hey @awni, I want to ask if I need to do or change something for it to be merged?

Apologies for the delay. Let me take a look this...

Back to draft for a few. The rotating buffer doesn't play well with the step prefill for long prompts, so that needs some work.

Ok so I think this can be reviewed and merged. A little note on the "infinite KV cache": For simplicity it separates the cache growth into two stages: prefill (i.e....
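To make the rotating-buffer idea concrete, here is a conceptual sketch of a fixed-size KV cache that keeps a few initial tokens and drops the oldest entries once it is over budget. This is an illustrative simplification (it trims with `concatenate` rather than rotating in place) and not the actual mlx_lm implementation:

```python
import mlx.core as mx

class RotatingKVCacheSketch:
    """Conceptual fixed-size KV cache: grow normally until `max_size`,
    then discard the oldest tokens while keeping a small initial prefix."""

    def __init__(self, max_size: int, keep: int = 4):
        self.max_size = max_size
        self.keep = keep  # number of initial "sink" tokens to always keep
        self.keys = None
        self.values = None

    def update(self, keys: mx.array, values: mx.array):
        # keys/values: (batch, n_heads, n_new_tokens, head_dim)
        if self.keys is None:
            self.keys, self.values = keys, values
        else:
            self.keys = mx.concatenate([self.keys, keys], axis=2)
            self.values = mx.concatenate([self.values, values], axis=2)

        # Once over budget, drop the oldest tokens after the kept prefix.
        n = self.keys.shape[2]
        if n > self.max_size:
            extra = n - self.max_size
            self.keys = mx.concatenate(
                [self.keys[:, :, : self.keep], self.keys[:, :, self.keep + extra :]],
                axis=2,
            )
            self.values = mx.concatenate(
                [self.values[:, :, : self.keep], self.values[:, :, self.keep + extra :]],
                axis=2,
            )
        return self.keys, self.values
```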