Multi-GPU training
Hi Seanlinx,
Thank you for your wonderful work. I got an error while trying to enable GPU training:

//====
Traceback (most recent call last):
  File "gen_hard_example.py", line 229, in <module>
    args.slide_window, args.shuffle, args.vis)
  File "gen_hard_example.py", line 167, in test_net
    detections = mtcnn_detector.detect_face(imdb, test_data, vis=vis)
  File "/home/dang/test/mtcnn-train/core/MtcnnDetector.py", line 456, in detect_face
    boxes, boxes_c = self.detect_pnet(im)
  File "/home/dang/test/mtcnn-train/core/MtcnnDetector.py", line 278, in detect_pnet
    cls_map, reg = self.pnet_detector.predict(im_resized)
  File "/home/dang/test/mtcnn-train/core/fcn_detector.py", line 28, in predict
    grad_req='null', aux_states=self.aux_params)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.2-py2.7-linux-x86_64.egg/mxnet/symbol.py", line 926, in bind
    ctypes.byref(handle)))
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.9.2-py2.7-linux-x86_64.egg/mxnet/base.py", line 75, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [10:57:05] src/executor/graph_executor.cc:240: Check failed: x.ctx() == default_ctx Input array is in cpu(0) while binding with ctx=gpu(0). All arguments must be in global context (gpu(0)) unless group2ctx is specified for cross-device graph.
//====
I guess the issue comes from this line in fcn_detector.py:

self.executor = self.symbol.bind(self.ctx, self.arg_params, args_grad=None, grad_req='null', aux_states=self.aux_params)

and the arg_params are all on cpu(0):

//====
{'conv4_1_bias': <NDArray 2 @cpu(0)>,
 'conv4_2_bias': <NDArray 4 @cpu(0)>,
 'prelu1_gamma': <NDArray 10 @cpu(0)>,
 'conv1_bias': <NDArray 10 @cpu(0)>,
 'conv3_weight': <NDArray 32x16x3x3 @cpu(0)>,
 'conv2_bias': <NDArray 16 @cpu(0)>,
 'conv2_weight': <NDArray 16x10x3x3 @cpu(0)>,
 'conv1_weight': <NDArray 10x3x3x3 @cpu(0)>,
 'conv4_2_weight': <NDArray 4x32x1x1 @cpu(0)>,
 'conv4_1_weight': <NDArray 2x32x1x1 @cpu(0)>,
 'data': <NDArray 1x3x692x512 @gpu(0)>,
 'conv3_bias': <NDArray 32 @cpu(0)>,
 'prelu2_gamma': <NDArray 16 @cpu(0)>,
 'prelu3_gamma': <NDArray 32 @cpu(0)>}
//====
However, I checked that self.ctx is always gpu(0) throughout the code. Do you have any idea how to move these parameter arrays to the GPU context instead of the CPU? Thanks.
Modify line 131 in prepare_data/gen_hard_example.py:
args, auxs = load_param(prefix[0], epoch[0], convert=True, ctx=ctx)
There might be some other problems, since I have only tested the code on MXNet 0.7.0.