Awni Hannun

1,014 comments by Awni Hannun

Your understanding is exactly right. The logical order doesn't matter in this case; the output is the same, since self-attention is invariant to permutations of its input. (Note the RoPE...
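The permutation property mentioned above can be sketched in plain NumPy. This is a minimal single-head attention with no causal mask and no positional encoding (the setting where the claim holds); permuting the input rows simply permutes the output rows the same way. All names here are illustrative, not mlx-lm API:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    # Single-head self-attention: no causal mask, no positional encoding.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Softmax over the key dimension.
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=(5, d))
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
perm = rng.permutation(5)

out = self_attention(x, wq, wk, wv)
out_perm = self_attention(x[perm], wq, wk, wv)
# Permuting the input rows permutes the output rows identically.
assert np.allclose(out[perm], out_perm)
```

With RoPE (or a causal mask) this no longer holds, since positions are then baked into the scores.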

Awesome! Does it work yet?

Which OS are you on? A couple of things that might help: 1. Restart the machine(s) 2. Upgrade to macOS Sequoia (15.0) 3. Set some sysctls: ``` sudo sysctl iogpu.wired_limit_mb=200000 sudo...

> one M2 Ultra 192GB with another M2 Ultra 128GB, splitting the weights to 160GB and 67GB Maybe putting more on the 128GB machine will help also. Like 140 and...

Nice!! Did you keep the sharding you had, or rebalance it? I wonder if we could make it faster with a more even balance 🤔. But 3.4 t/s is...
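A split proportional to each machine's memory, as suggested above, can be sketched like this. `balance_layers` is a hypothetical helper for illustration, not part of mlx-lm:

```python
def balance_layers(num_layers, memory_gb):
    # Assign transformer layers to machines proportional to available memory.
    total = sum(memory_gb)
    raw = [num_layers * m / total for m in memory_gb]
    counts = [int(r) for r in raw]
    # Hand leftover layers to the machines with the largest remainders.
    leftovers = num_layers - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftovers]:
        counts[i] += 1
    return counts

# e.g. an 80-layer model split across 192GB and 128GB machines
print(balance_layers(80, [192, 128]))  # -> [48, 32]
```

In practice the optimal split also depends on the per-machine compute and KV-cache footprint, so a purely memory-proportional split is only a starting point.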

Closing as this branch is available in mlx-lm: https://github.com/ml-explore/mlx-lm/tree/distributed-layers

I think we can close this! 🚀 [Documentation on usage](https://github.com/ml-explore/mlx-examples/tree/main/llms#long-prompts-and-generations).

Sorry for the delay! Will plan to review this and share feedback shortly.

This is a nice addition! Apologies for the delayed review. Could you rebase and address the comments and then we can merge it? Thank you!

I don't think it's working yet; we need to debug that. We can leave this issue open for now if someone has time to pick it up. Relevant branch: https://github.com/ml-explore/mlx-examples/tree/openlm