cxxnet
error when running multi-machine example
I ran the multi-machine example and got the error below. I don't know how to deal with it. Can anyone help me?
- ../../dmlc-core/tracker/dmlc_mpi.py -H hosts -n 1 -s 1 ../../bin/cxxnet.ps bowl.conf update_on_server=1 param_server=dist
2015-07-27 11:04:10,909 INFO start listen on 127.0.1.1:9091
mpirun: Error: unknown option "-env"
Type 'mpirun --help' for usage.
Exception in thread Thread-3:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "../../dmlc-core/tracker/dmlc_mpi.py", line 63, in run
subprocess.check_call(cmd, shell = True, env = env)
File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
raise CalledProcessError(retcode, cmd)
CalledProcessError: Command 'mpirun -n 2 --hostfile hosts -env DMLC_NUM_SERVER 1 -env DMLC_NUM_WORKER 1 -env DMLC_PS_ROOT_PORT 9092 -env DMLC_PS_ROOT_URI 127.0.1.1 -env DMLC_TRACKER_URI 127.0.1.1 -env DMLC_TRACKER_PORT 9091 ../../bin/cxxnet.ps bowl.conf update_on_server=1 param_server=dist ' returned non-zero exit status 213
F0727 11:04:21.168474 12551 manager.cc:55] Timeout (10 sec) to wait all other nodes initialized. See commmets for more information
*** Check failure stack trace: ***
@ 0x64343a google::LogMessage::Fail()
@ 0x64523c google::LogMessage::SendToLog()
@ 0x643078 google::LogMessage::Flush()
@ 0x645b6e google::LogMessageFatal::~LogMessageFatal()
@ 0x5ab37e ps::Manager::Run()
@ 0x5b1d6e ps::Postoffice::Run()
@ 0x40f475 main
@ 0x7f8327388ec5 (unknown)
@ 0x4115bf (unknown)
Aborted (core dumped)
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "/home/meitu/weihao/cxxnet/dmlc-core/tracker/tracker.py", line 345, in
me too!!!
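This happens because you are using OpenMPI, whose mpirun does not accept the -env option (hence the 'unknown option "-env"' error). Edit the file ../../dmlc-core/tracker/dmlc_mpi.py and change the line that appends the environment variables to the command.
For MPICH2 (the current code):
cmd += ' -env %s %s' % (k, v)
For OpenMPI, use this instead:
cmd += ' -x %s' % k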
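If you need the same script to work with both MPI implementations, the two variants above can be wrapped in one helper. This is only a minimal sketch: the function name `mpi_env_flags` and the `flavor` argument are made up for illustration and are not part of dmlc_mpi.py. It also assumes, as the tracker already does, that the variables are passed to `subprocess.check_call` through `env`, because OpenMPI's `-x NAME` forwards a variable that is already set in the launching environment.

```python
def mpi_env_flags(env_vars, flavor):
    """Build the mpirun flags that forward env_vars to the launched processes.

    Hypothetical helper, not part of dmlc_mpi.py.
    flavor is "openmpi" or "mpich" (illustrative labels chosen for this sketch).
    """
    keys = sorted(env_vars)  # sorted only to make the output deterministic
    if flavor == "openmpi":
        # OpenMPI: -x NAME exports NAME from the environment mpirun runs in,
        # so the value itself must be supplied via subprocess's env argument.
        return "".join(" -x %s" % k for k in keys)
    if flavor == "mpich":
        # MPICH2: -env NAME VALUE sets the variable explicitly.
        return "".join(" -env %s %s" % (k, env_vars[k]) for k in keys)
    raise ValueError("unknown MPI flavor: %r" % flavor)
```

For example, `cmd += mpi_env_flags({"DMLC_NUM_SERVER": 1}, "openmpi")` appends ` -x DMLC_NUM_SERVER` to the command string.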