mlx-examples
Distributed Processing in any way?
Hello, as you might know, I admire your work (all of you, all the contributors) and love our community.
With that said, here is my simple question:
Is there any plan to make MLX distributed, or is there an existing library/framework I can use for this purpose?
Thank you, good luck!
Could you say more about what you are looking for? Distributed is a pretty generic term.
What exactly would you like to distribute? Training / inference? At what granularity? Any specific example?
Let's assume I have a couple of high-end Apple devices at home. I want to use them together to get more processing power, both for training and inference.
We could be referring to solutions like Spark or Dask (with local, Kubernetes, MPI, and other backends) for distributed data processing, which could eventually cover ML specifics, as Dask-ML does, or to something designed for ML from the start, like the distributed training features of PyTorch and TensorFlow.
Implementation examples
Dask
Set up
The setup depends on the backend. In this example, we define a local cluster.
Start manager (scheduler):
dask scheduler
Start workers:
dask worker <dask-scheduler-address>
The cluster can also be created and managed using Python:
from dask.distributed import LocalCluster
# The cluster object allows scaling the number of workers remotely
cluster = LocalCluster(n_workers=1)
cluster.scale(2)
The workers need to discover the scheduler on the network and share access to resources such as files and data sources.
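As a minimal end-to-end check, the cluster and client can be wired up entirely in Python; `processes=False` keeps the workers in the current process, so no network setup is needed (the task and values below are just for illustration):

```python
from dask.distributed import Client, LocalCluster

# Start an in-process cluster with two threaded workers
# (processes=False avoids spawning separate worker processes).
cluster = LocalCluster(n_workers=2, processes=False)
client = Client(cluster)

# Submit a task to the cluster and fetch its result.
future = client.submit(lambda x: x + 1, 41)
print(future.result())  # 42

client.close()
cluster.close()
```

For real multi-machine use, the workers would instead be started on each machine and pointed at the scheduler address, as in the CLI commands above.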
Usage
Standalone:
import dask.dataframe as dd
df = dd.read_csv(...)
df.x.sum().compute()
Local cluster:
from dask.distributed import Client
import dask.dataframe as dd
client = Client('<dask-scheduler-address>')
# Use the default client object from runtime
df = dd.read_csv(...)
df.x.sum().compute()
ML usage
Local cluster:
from dask.distributed import Client
import dask.dataframe as dd
from dask_ml.cluster import KMeans
client = Client('<dask-scheduler-address>')
# Use the default client object from runtime
df = dd.read_csv(...)
kmeans = KMeans(n_clusters=3, init_max_iter=2, oversampling_factor=10)
kmeans.fit(df.to_dask_array(lengths=True))
kmeans.predict(df.head()).compute()
We could implement an MLX collection backend (mlx-dask) for Dask:
https://docs.dask.org/en/latest/how-to/selecting-the-collection-backend.html#defining-a-new-collection-backend
Man, you really light the path. Yes, we can. Give me a couple of days to read the documentation and implementation details.
One question in advance: how deep do we have to dive to implement such a backend? Are there best practices, or can we borrow ideas from other backends (I assume other backends exist)?
Thank you, truly. Sincerely.
@LeaveNhA, I encountered challenges while working on a prototype with the Dask Backend Entrypoint API:
- The MLX data type is not an alias of np.dtype, as Dask expects.
- There could be additional compatibility issues with the MLX Random module.
- It's important to note that the Dask Backend Entrypoint API is still in an experimental phase.
We could explore alternative methods such as Dask Custom Collection, Dask Delayed and Dask Futures to implement distributed computations.
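Of those alternatives, Dask Delayed is the lightest-weight to prototype with. A toy sketch (the functions are made up for illustration):

```python
from dask import delayed


@delayed
def square(x):
    return x * x


@delayed
def total(parts):
    return sum(parts)


# Build a lazy task graph, then execute it (locally here, or on a
# cluster when a distributed Client is active in the session).
result = total([square(i) for i in range(4)]).compute()
print(result)  # 0 + 1 + 4 + 9 = 14
```

The same graph runs unchanged on a local scheduler or a distributed cluster, which makes Delayed a good way to validate the computation before worrying about backend details.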
Ray is also an interesting option to explore.
I think this problem got solved while we were busy, right?
It seems the MLX team added MPI distributed training support!
We did indeed: https://ml-explore.github.io/mlx/build/html/usage/distributed.html
I think we can close this and open more targeted issues related to distributed models as they come up.