
error when running multi-machine example

Open weihaoxie opened this issue 9 years ago • 2 comments

I ran the multi-machine example and got the error below. I don't know how to deal with it. Can anyone help me?

  • ../../dmlc-core/tracker/dmlc_mpi.py -H hosts -n 1 -s 1 ../../bin/cxxnet.ps bowl.conf update_on_server=1 param_server=dist

    2015-07-27 11:04:10,909 INFO start listen on 127.0.1.1:9091
    mpirun: Error: unknown option "-env"
    Type 'mpirun --help' for usage.
    Exception in thread Thread-3:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
        self.run()
      File "/usr/lib/python2.7/threading.py", line 763, in run
        self.__target(*self.__args, **self.__kwargs)
      File "../../dmlc-core/tracker/dmlc_mpi.py", line 63, in run
        subprocess.check_call(cmd, shell = True, env = env)
      File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
        raise CalledProcessError(retcode, cmd)
    CalledProcessError: Command 'mpirun -n 2 --hostfile hosts -env DMLC_NUM_SERVER 1 -env DMLC_NUM_WORKER 1 -env DMLC_PS_ROOT_PORT 9092 -env DMLC_PS_ROOT_URI 127.0.1.1 -env DMLC_TRACKER_URI 127.0.1.1 -env DMLC_TRACKER_PORT 9091 ../../bin/cxxnet.ps bowl.conf update_on_server=1 param_server=dist ' returned non-zero exit status 213

    F0727 11:04:21.168474 12551 manager.cc:55] Timeout (10 sec) to wait all other nodes initialized. See commmets for more information
    *** Check failure stack trace: ***
        @ 0x64343a google::LogMessage::Fail()
        @ 0x64523c google::LogMessage::SendToLog()
        @ 0x643078 google::LogMessage::Flush()
        @ 0x645b6e google::LogMessageFatal::~LogMessageFatal()
        @ 0x5ab37e ps::Manager::Run()
        @ 0x5b1d6e ps::Postoffice::Run()
        @ 0x40f475 main
        @ 0x7f8327388ec5 (unknown)
        @ 0x4115bf (unknown)
    Aborted (core dumped)
    Exception in thread Thread-1:
    Traceback (most recent call last):
      File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
        self.run()
      File "/usr/lib/python2.7/threading.py", line 763, in run
        self.__target(*self.__args, **self.__kwargs)
      File "/home/meitu/weihao/cxxnet/dmlc-core/tracker/tracker.py", line 345, in <lambda>
        self.thread = Thread(target = (lambda : subprocess.check_call(self.cmd, env=env, shell=True)), args = ())
      File "/usr/lib/python2.7/subprocess.py", line 540, in check_call
        raise CalledProcessError(retcode, cmd)
    CalledProcessError: Command '../../bin/cxxnet.ps bowl.conf update_on_server=1 param_server=dist ' returned non-zero exit status 134

weihaoxie avatar Jul 27 '15 04:07 weihaoxie

me too!!!

TangXing avatar Aug 13 '15 06:08 TangXing

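This happens because you are using Open MPI, whose mpirun does not accept the -env option (hence the "unknown option" error). Edit the file ../../dmlc-core/tracker/dmlc_mpi.py and change the line that appends each environment variable to the mpirun command.

For MPICH2 (what the script currently does):

cmd += ' -env %s %s' % (k, v)

For Open MPI, replace it with:

cmd += ' -x %s' % k

Open MPI's -x exports a variable by name from the environment mpirun is started in. That works here because the script already passes env to subprocess.check_call.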
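If you need the same script to work with both MPI flavors, the two variants can be folded into one helper. This is only a minimal sketch, not the actual code in dmlc_mpi.py: the function names `env_flags` and `build_cmd` and the `flavor` argument are made up for illustration, and you would have to decide yourself how to tell which MPI is installed.

```python
def env_flags(env, flavor):
    """Return the mpirun arguments that forward `env` to the launched processes.

    `flavor` is a hypothetical switch: "mpich" or "openmpi".
    """
    if flavor == "mpich":
        # MPICH-style launcher: pass each variable as -env NAME VALUE
        return "".join(" -env %s %s" % (k, v) for k, v in sorted(env.items()))
    if flavor == "openmpi":
        # Open MPI: -x NAME exports the variable from mpirun's own environment,
        # so `env` must also be passed to subprocess.check_call(..., env=env)
        return "".join(" -x %s" % k for k in sorted(env))
    raise ValueError("unknown MPI flavor: %r" % flavor)


def build_cmd(env, nproc, hostfile, program, flavor):
    """Assemble the full mpirun command line as a single shell string."""
    return "mpirun -n %d --hostfile %s%s %s" % (
        nproc, hostfile, env_flags(env, flavor), program)


if __name__ == "__main__":
    env = {"DMLC_NUM_SERVER": 1, "DMLC_NUM_WORKER": 1}
    print(build_cmd(env, 2, "hosts", "../../bin/cxxnet.ps bowl.conf", "openmpi"))
```

The variables are sorted only so the output is stable; the order of the flags should not matter to mpirun.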

zyzhong avatar Oct 21 '17 09:10 zyzhong