[WIP] Comms
Beginning of the communication namespace (perhaps it should be named comms instead of dist). This is mostly to get feedback while I implement the rest of the primitives and figure out how to package this in the distribution.
Interesting bits:
- mlx::core::dist defines a bunch of functions that are optionally implemented by a communication backend. Currently mpi.
- This defines a Stream communication_stream() and all communication operations go in that CPU stream.
- Primitives have transformations defined as expected, which means we can write model parallel code with minimal fuss. Whenever something needs to be communicated, just communicate and gradients will flow accordingly. (I have to fix the gradient for all reduce sum, but when everything is done it should be easy to use.)
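To make the "gradients will flow accordingly" point concrete, here is a minimal pure-Python simulation of an all reduce sum and its backward pass. It does not use MLX or MPI, and it is not the implementation in this PR: the function names, the three "ranks", and the per-rank loss weights are all hypothetical. It only illustrates the standard identity that, when every rank receives the sum of all inputs, the gradient with respect to each input is the sum of the cotangents from every rank, i.e. another all reduce sum.

```python
# Pure-Python simulation of all reduce sum across "ranks" (no MLX, no MPI).
# All names here are hypothetical and only illustrate the math.

def all_reduce_sum(per_rank):
    """Every rank receives the elementwise sum of all ranks' vectors."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]

def all_reduce_sum_vjp(cotangents):
    """Backward pass: each input feeds every rank's output, so the
    per-rank cotangents are themselves all reduce summed."""
    return all_reduce_sum(cotangents)

# Three simulated ranks, each holding a local 2-element vector.
inputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
outputs = all_reduce_sum(inputs)

# Assume rank r's local loss is weights[r] * sum(outputs[r]), so the
# cotangent arriving at rank r's output is weights[r] in every element.
weights = [1.0, 10.0, 100.0]
cotangents = [[w, w] for w in weights]
grads = all_reduce_sum_vjp(cotangents)

print(outputs)  # each rank holds [9.0, 12.0]
print(grads)    # each rank's input gradient is [111.0, 111.0]
```

The gradient of 111.0 is the sum of the three loss weights: every input element contributes once to each rank's output, so it picks up one cotangent per rank.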
I might prefer the name comm or comms over dist.
I also think distributed is fine, and perhaps better since it is what every other package uses. One can always make a short name.
This is very nice and simple! Looks great!
Is there an example? How can it be used for training/fine-tuning or inference? Thanks!
Maan that was a very nice suggestion. It feels so much better now with no_cpu.cpp removed and the copy moved to the distributed implementation.
This is huge. I wish someone would write a tutorial on how to connect 2 Macs using MLX.
Usage docs coming soon!
I can't wait to try this out!!
Awesome work, so excited for this! Any idea how much throughput will be necessary for various use cases? Also, can MPI aggregate Thunderbolt links?
Any updates on docs for this? Can't wait to give it a go!