Awni Hannun

1,014 comments by Awni Hannun

Your understanding is exactly right. The logical order doesn't matter in this case; the output is the same, since self-attention is invariant to permutations of its input. (Note the RoPE...
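The permutation property mentioned above can be sketched in plain NumPy. This is a minimal single-head attention with no causal mask and no positional encoding (the setting where the claim holds); permuting the input rows simply permutes the output rows the same way. All names here are illustrative, not mlx-lm API:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    # Single-head self-attention: no causal mask, no positional encoding.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Softmax over the key dimension.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(5, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
perm = rng.permutation(5)

out = self_attention(x, wq, wk, wv)
out_perm = self_attention(x[perm], wq, wk, wv)
# Permuting the input rows permutes the output rows identically.
assert np.allclose(out[perm], out_perm)
```

With RoPE (or a causal mask) this no longer holds, since positions are then baked into the scores.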

Awesome! Does it work yet?

Which OS are you on? A couple of things that might help: 1. Restart the machine(s) 2. Upgrade to macOS Sequoia (15.0) 3. Set some sysctls: ``` sudo sysctl iogpu.wired_limit_mb=200000 sudo...

> one M2 Ultra 192GB with another M2 Ultra 128GB, splitting the weights to 160GB and 67GB Maybe putting more on the 128GB machine will help also. Like 140 and...

Nice!! Did you keep the sharding you had, or rebalance it? I wonder if we could make it faster with a more even balance 🤔. But 3.4 t/s is...
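A split proportional to each machine's memory, as suggested above, can be sketched like this. `balance_layers` is a hypothetical helper for illustration, not part of mlx-lm:

```python
def balance_layers(num_layers, memory_gb):
    # Assign transformer layers to machines proportional to available memory.
    total = sum(memory_gb)
    raw = [num_layers * m / total for m in memory_gb]
    counts = [int(r) for r in raw]
    # Hand leftover layers to the machines with the largest remainders.
    leftovers = num_layers - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftovers]:
        counts[i] += 1
    return counts

# e.g. an 80-layer model split across 192GB and 128GB machines
print(balance_layers(80, [192, 128]))  # -> [48, 32]
```

In practice the optimal split also depends on the per-machine compute and KV-cache footprint, so a purely memory-proportional split is only a starting point.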

Closing as this branch is available in mlx-lm: https://github.com/ml-explore/mlx-lm/tree/distributed-layers

I think we can close this! 🚀 [Documentation on usage](https://github.com/ml-explore/mlx-examples/tree/main/llms#long-prompts-and-generations).

Sorry for the delay! Will plan to review this and share feedback shortly.

This is a nice addition! Apologies for the delayed review. Could you rebase and address the comments and then we can merge it? Thank you!

I don't think it's working yet; we need to debug that. We can leave this issue open for now if someone has time to pick it up. Relevant branch: https://github.com/ml-explore/mlx-examples/tree/openlm