
Distributed MXNet with ClusterManager

Open vchuravy opened this issue 10 years ago • 3 comments

Since we can now run MXNet distributed, it would be nice to integrate ClusterManagers.jl and simplify the process on the Julia side.

vchuravy avatar Nov 11 '15 02:11 vchuravy

Could be possible, as what MXNet needs is quite minimal. Currently, it relies on https://github.com/dmlc/dmlc-core/tree/master/tracker to start a tracker (the master process) and then start the slave processes, setting environment variables that describe the master (including IP, ID, etc.).
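To make the launch pattern concrete, here is a minimal sketch of the "start workers with the master's details in their environment" step. It is not the dmlc-core tracker itself: the helper names `build_worker_env` and `launch_workers` are hypothetical, the environment variable names are assumptions modelled on the tracker's conventions, and it only starts local processes rather than processes on remote machines.

```python
import os
import subprocess
import sys


def build_worker_env(tracker_host, tracker_port, num_workers, task_id):
    """Return a copy of the current environment plus the tracker's address."""
    env = dict(os.environ)
    env.update({
        # Assumed variable names, modelled on the dmlc-core tracker.
        "DMLC_TRACKER_URI": tracker_host,
        "DMLC_TRACKER_PORT": str(tracker_port),
        "DMLC_NUM_WORKER": str(num_workers),
        "DMLC_ROLE": "worker",
        "DMLC_TASK_ID": str(task_id),
    })
    return env


def launch_workers(command, tracker_host, tracker_port, num_workers):
    """Start num_workers local copies of command and return their exit codes."""
    procs = [
        subprocess.Popen(
            command,
            env=build_worker_env(tracker_host, tracker_port, num_workers, i),
        )
        for i in range(num_workers)
    ]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Hypothetical worker: it only prints what it was told about the master.
    worker = [
        sys.executable, "-c",
        "import os; print(os.environ['DMLC_TASK_ID'], os.environ['DMLC_TRACKER_URI'])",
    ]
    print(launch_workers(worker, "127.0.0.1", 9091, 2))
```

A cluster manager on the Julia side would presumably take over the role of `launch_workers`, starting the processes on remote hosts while passing along the same kind of environment.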

tqchen avatar Nov 11 '15 02:11 tqchen

Exactly. I will take a look at this later this week, if nobody beats me to it.

vchuravy avatar Nov 11 '15 02:11 vchuravy

Is there any progress on this, or an example of how to run a distributed Julia MXNet job? The examples from dmlc seem to require a recompile and use of the Python interface...

mbrookhart avatar Nov 10 '16 19:11 mbrookhart