Distributed MXNet with ClusterManager
I was thinking that, since we now have the ability to run MXNet distributed, it would be nice to integrate ClusterManager.jl to simplify the process on the Julia side.
That should be possible, since what MXNet needs is quite minimal. Currently, it relies on https://github.com/dmlc/dmlc-core/tree/master/tracker to start a tracker (the master process) and then start the slave processes, setting environment variables that describe the master (including its IP, ID, etc.).
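To make the environment-variable handoff concrete, here is a rough Python sketch of what a launcher has to do for each slave process. This is not the tracker's actual code. The `DMLC_*` variable names are the ones I recall ps-lite reading and should be checked against the tracker source, and `worker_env` and `run_with_env` are hypothetical helper names made up for this example.

```python
import os
import subprocess
import sys


def worker_env(root_uri, root_port, num_workers, num_servers, role="worker"):
    """Build the environment for one process in the job.

    The DMLC_* names are assumed from ps-lite; verify them against
    the tracker before relying on this.
    """
    env = dict(os.environ)  # inherit the launcher's environment
    env.update({
        "DMLC_ROLE": role,                    # "worker", "server" or "scheduler"
        "DMLC_PS_ROOT_URI": root_uri,         # IP of the master/tracker
        "DMLC_PS_ROOT_PORT": str(root_port),  # port of the master/tracker
        "DMLC_NUM_WORKER": str(num_workers),
        "DMLC_NUM_SERVER": str(num_servers),
    })
    return env


def run_with_env(command, env):
    """Run `command` with the given environment and return its stdout."""
    result = subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # Placeholder address and counts, for illustration only.
    env = worker_env("127.0.0.1", 9091, num_workers=2, num_servers=1)
    # Stand-in for the real training command: echo what the child sees.
    child = [sys.executable, "-c",
             "import os; print(os.environ['DMLC_PS_ROOT_URI'])"]
    print(run_with_env(child, env))
```

A cluster manager on the Julia side would presumably need to do the equivalent: set these variables on each worker process before MXNet is loaded there.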
Exactly. I will take a look at this later this week, if nobody beats me to it.
Is there any progress on this, or an example of how to run a distributed Julia MXNet job? The examples from dmlc seem to require a recompile and use of the Python interface...