
[WIP] Comms

Open angeloskath opened this issue 1 year ago • 3 comments

Beginning of a communication namespace (perhaps it should be named comms instead of dist). This is mostly to get feedback while I implement the rest of the primitives and figure out how to package this in the distribution.

Interesting bits:

  • mlx::core::dist defines a set of functions that are optionally implemented by a communication backend. Currently the only backend is MPI.
  • This defines a Stream communication_stream(), and all communication operations go in that CPU stream.
  • Primitives have transformations defined as expected, which means we can write model-parallel code with minimal fuss. Whenever something needs to be communicated, just communicate it and gradients will flow accordingly. (I still have to fix the gradient for all-reduce sum, but when everything is done it should be easy to use.)
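
To make the model-parallel idea above concrete, here is a rough sketch of what an all-reduce sum does in a sharded matrix-vector product. It is a pure-Python simulation on one machine: no MLX or MPI is involved, and every name in it (`all_sum`, `matvec`, `sharded_matvec`, `world_size`) is hypothetical and chosen for illustration only, not taken from this PR. Each simulated rank holds a slice of the weight columns and the matching slice of the input, computes a partial output, and the all-reduce sum gives every rank the full result.

```python
# Pure-Python simulation of all-reduce sum in a model-parallel matvec.
# Assumption: all names here are hypothetical; this is not the MLX API.


def all_sum(per_rank):
    """Simulate all-reduce sum: every rank receives the elementwise sum."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]


def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sharded_matvec(w, x, world_size):
    """Split the input dimension across ranks, then all-reduce the partials."""
    n = len(x)
    assert n % world_size == 0, "input size must divide evenly across ranks"
    step = n // world_size
    partials = []
    for rank in range(world_size):
        lo, hi = rank * step, (rank + 1) * step
        w_shard = [row[lo:hi] for row in w]  # this rank's columns of w
        x_shard = x[lo:hi]                   # matching slice of the input
        partials.append(matvec(w_shard, x_shard))
    # The only communication step: sum the partial outputs across ranks.
    return all_sum(partials)


w = [[1, 2, 3, 4], [5, 6, 7, 8]]
x = [1, 0, 2, 1]

full = matvec(w, x)                             # single-device reference
per_rank = sharded_matvec(w, x, world_size=2)   # one result per simulated rank
print(full, per_rank)
```

With a real backend, the loop over ranks would be replaced by separate processes and `all_sum` by the backend's collective; the sketch only shows why a single sum across ranks is enough to recover the unsharded output.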

angeloskath avatar May 09 '24 21:05 angeloskath

I might prefer the name comm or comms over dist.

I also think distributed is fine, and perhaps better, since it is what every other package uses. One can always make a short name.

awni avatar May 10 '24 13:05 awni

This is very nice and simple! Looks great!

awni avatar May 10 '24 14:05 awni

Is there an example? How can it be used for training, fine-tuning, or inference? Thanks!

fishelegs avatar May 14 '24 12:05 fishelegs

Maan that was a very nice suggestion. It feels so much better now with no_cpu.cpp removed and the copy moved to the distributed implementation.

angeloskath avatar May 23 '24 19:05 angeloskath

This is huge. I wish someone would write a tutorial on how to connect 2 Macs using MLX.

lin72h avatar May 25 '24 04:05 lin72h

Usage docs coming soon!

awni avatar May 25 '24 04:05 awni

I can't wait to try this out!!

sck-at-ucy avatar May 25 '24 05:05 sck-at-ucy


Awesome work, so excited for this! Any idea how much throughput will be necessary for various use cases? Also, can MPI aggregate Thunderbolt links?

altaic avatar May 25 '24 05:05 altaic


Any updates on docs for this? Can't wait to give it a go!

jami3f avatar Nov 11 '24 10:11 jami3f

awni avatar Nov 11 '24 14:11 awni