maskrcnn.mxnet
Multi-GPU training
I tried to train with --gpus=0,1,2,3, but got this error:
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator rois: Shape inconsistent, Provided=(1,3), inferred shape=(4,3)
How should I change the parameters to fix this?
In maskrcnn_train_end2end.py, change the max-shape code to:
max_data_shape = [('data', (1, 3, max([v[0] for v in config.SCALES]), max([v[1] for v in config.SCALES])))]
max_data_shape, max_label_shape = train_data.infer_shape(max_data_shape)
max_data_shape.append(('gt_boxes', (1, 100, 5)))
max_data_shape.append(('gt_masks', (1, 100, max([v[0] for v in config.SCALES]), max(v[1] for v in config.SCALES))))
max_data_shape.append(('im_info', (1,train_data.provide_data_single[1][1][1])))
logger.info('providing maximum shape %s %s' % (max_data_shape, max_label_shape))
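The two shapes in the error differ by a factor of 4, the number of GPUs, which suggests one side used the whole batch while the other used a single device's slice. The patch above gives every max shape a leading dimension of 1, consistent with max shapes being described per device. The sketch below only illustrates that slicing arithmetic; split_batch_shapes is a hypothetical helper, not part of maskrcnn.mxnet or MXNet, and it assumes the batch divides evenly across devices.

```python
def split_batch_shapes(shapes, num_devices):
    """Hypothetical helper (not part of maskrcnn.mxnet): divide the leading
    batch axis of each (name, shape) pair evenly across devices, the way a
    data-parallel batch is sliced."""
    per_device = []
    for name, shape in shapes:
        batch = shape[0]
        if batch % num_devices != 0:
            raise ValueError("%s: batch %d is not divisible by %d devices"
                             % (name, batch, num_devices))
        # Keep every non-batch dimension unchanged.
        per_device.append((name, (batch // num_devices,) + tuple(shape[1:])))
    return per_device


# Global im_info for 4 images on 4 GPUs -> one (1, 3) slice per GPU.
print(split_batch_shapes([('im_info', (4, 3))], 4))  # [('im_info', (1, 3))]
```

With one image per GPU, any shape that describes a single device should therefore start with 1, not with the total batch size.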
@solin319 Now I have a new problem:
File "maskrcnn_train_end2end.py", line 203, in <module>
main()
File "maskrcnn_train_end2end.py", line 200, in main
lr=args.lr, lr_step=args.lr_step)
File "maskrcnn_train_end2end.py", line 162, in train_net
arg_params=arg_params, aux_params=aux_params, begin_epoch=begin_epoch, num_epoch=end_epoch)
File "/home/pp/code/maskrcnn.mxnet/rcnn/core/module.py", line 955, in fit
self.update()
File "/home/pp/code/maskrcnn.mxnet/rcnn/core/module.py", line 1037, in update
self._curr_module.update()
File "/home/pp/code/maskrcnn.mxnet/rcnn/core/module.py", line 573, in update
self._kvstore)
TypeError: _update_params_on_kvstore() takes exactly 4 arguments (3 given)
The signature of _update_params_on_kvstore changed in MXNet 0.11: it now takes a fourth argument. Pass self._exec_group.param_names when calling it in the update method of rcnn/core/module.py:
def update(self):
    """Updates parameters according to the installed optimizer and the gradients computed
    in the previous forward-backward batch.

    See Also
    ----------
    :meth:`BaseModule.update`.
    """
    assert self.binded and self.params_initialized and self.optimizer_initialized
    self._params_dirty = True
    if self._update_on_kvstore:
        # MXNet >= 0.11 requires param_names as the fourth argument
        _update_params_on_kvstore(self._exec_group.param_arrays,
                                  self._exec_group.grad_arrays,
                                  self._kvstore, self._exec_group.param_names)
    else:
        _update_params(self._exec_group.param_arrays,
                       self._exec_group.grad_arrays,
                       updater=self._updater,
                       num_device=len(self._context),
                       kvstore=self._kvstore,
                       param_names=self._exec_group.param_names)
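If the same code has to run on MXNet versions both before and after 0.11, one option is to pick the call form from the function's signature. This is a hypothetical shim, not something the repository or MXNet provides; it assumes the old function takes exactly three positional parameters and the new one four, as the traceback above implies.

```python
import inspect


def update_on_kvstore_compat(update_fn, param_arrays, grad_arrays, kvstore, param_names):
    """Hypothetical shim: pass param_names only when update_fn accepts it.

    Assumes the pre-0.11 form takes 3 parameters and the 0.11+ form takes 4.
    """
    try:
        num_args = len(inspect.signature(update_fn).parameters)
    except (TypeError, ValueError):
        # Signature not introspectable: assume the newer 4-argument form.
        num_args = 4
    if num_args >= 4:
        return update_fn(param_arrays, grad_arrays, kvstore, param_names)
    return update_fn(param_arrays, grad_arrays, kvstore)


# Stand-ins for the two versions of _update_params_on_kvstore.
def old_api(param_arrays, grad_arrays, kvstore):
    return "old"


def new_api(param_arrays, grad_arrays, kvstore, param_names):
    return "new"


print(update_on_kvstore_compat(old_api, [], [], None, []))  # old
print(update_on_kvstore_compat(new_api, [], [], None, []))  # new
```

Counting parameters would misjudge a function declared with *args; an explicit check on mx.__version__ is a simpler alternative if only known versions need to be supported.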
@solin319 Thanks!
I got another error:
File "/home/pp/training_scripts/maskrcnn.mxnet/rcnn/io/rpn.py", line 149, in assign_anchor
gt_argmax_overlaps = overlaps.argmax(axis=0)
ValueError: attempt to get argmax of an empty sequence
It seems that, for some unusually sized images, the anchors all lie outside the image.
I failed to remove those unusual images because of missing information in instances_train2014.json.
Do you have any ideas about this? Can we modify the training code to avoid the problem?
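The failure described above can be reproduced in isolation. Assuming overlaps is the IoU matrix with one row per anchor kept inside the image and one column per ground-truth box, an image with no inside anchors gives a matrix with zero rows, and NumPy's argmax over an empty axis raises the ValueError from the traceback. gt_argmax_or_none is a hypothetical guard that only avoids the crash; how the caller should handle None is left open.

```python
import numpy as np

# Zero anchors survived the inside-image filter, three ground-truth boxes.
overlaps = np.zeros((0, 3))
try:
    overlaps.argmax(axis=0)
except ValueError as err:
    print("argmax failed:", err)


def gt_argmax_or_none(overlaps):
    """Hypothetical guard: return None instead of raising when no anchors are inside."""
    if overlaps.shape[0] == 0:
        return None
    # For each ground-truth box (column), the index of its best-matching anchor.
    return overlaps.argmax(axis=0)


print(gt_argmax_or_none(overlaps))                               # None
print(gt_argmax_or_none(np.array([[0.1, 0.9], [0.7, 0.2]])))     # [1 0]
```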
I ran into the same problem and have no solution yet.
@solin319 In rcnn/io/rpn.py, the anchors are filtered like this:
# only keep anchors inside the image
inds_inside = np.where((all_anchors[:, 0] >= -allowed_border) &
                       (all_anchors[:, 1] >= -allowed_border) &
                       (all_anchors[:, 2] < im_info[1] + allowed_border) &
                       (all_anchors[:, 3] < im_info[0] + allowed_border))[0]
The default value of allowed_border is 0, so only anchors that lie entirely inside the image are kept.
I passed an extra allowed_border argument to the AnchorLoader call in maskrcnn_train_end2end.py:
# allowed_border=50 also keeps anchors that extend up to 50 pixels past each image edge
train_data = AnchorLoader(feat_sym, sdsdb, batch_size=input_batch_size, shuffle=not args.no_shuffle,
                          ctx=ctx, work_load_list=args.work_load_list,
                          feat_stride=config.RPN_FEAT_STRIDE, anchor_scales=config.ANCHOR_SCALES,
                          anchor_ratios=config.ANCHOR_RATIOS,
                          aspect_grouping=config.TRAIN.ASPECT_GROUPING, allowed_border=50)
It works well so far.
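The effect of allowed_border can be checked on a toy example. inside_anchor_indices below is a hypothetical wrapper around the same filter from rcnn/io/rpn.py; it assumes anchors are (x1, y1, x2, y2) rows and that im_info[0] is the image height and im_info[1] the width, as the comparisons in that filter imply. The image size and anchors are made up for illustration.

```python
import numpy as np


def inside_anchor_indices(all_anchors, im_info, allowed_border=0):
    """Hypothetical wrapper around the filter in rcnn/io/rpn.py.

    all_anchors: (N, 4) array of (x1, y1, x2, y2); im_info[0] is the image
    height and im_info[1] the width.
    """
    return np.where((all_anchors[:, 0] >= -allowed_border) &
                    (all_anchors[:, 1] >= -allowed_border) &
                    (all_anchors[:, 2] < im_info[1] + allowed_border) &
                    (all_anchors[:, 3] < im_info[0] + allowed_border))[0]


im_info = np.array([60, 100, 1.0])            # toy image: height 60, width 100
all_anchors = np.array([[-20, -10, 120, 70],  # oversized anchor, crosses every edge
                        [10, 10, 50, 40]])    # small anchor, fully inside

print(inside_anchor_indices(all_anchors, im_info))                     # [1]
print(inside_anchor_indices(all_anchors, im_info, allowed_border=50))  # [0 1]
```

If every anchor of an image behaves like the first row, the default filter returns an empty index array; a larger allowed_border keeps such anchors, at the cost of also training on boxes that extend past the image.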