FastMaskRCNN

How to load resnet-101? (poor result based on resnet-50)

Open onlytailei opened this issue 7 years ago • 2 comments

I trained the model with ResNet-50 for 200k iterations, but the results are very poor. Should we use ResNet-101, as in the original Mask R-CNN paper?
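For context on the question: ResNet-50 and ResNet-101 share the same stem and bottleneck block design and differ only in how many bottleneck blocks each stage stacks. All 17 extra blocks go into the conv4 stage. The block counts below are taken from the original ResNet paper; the dictionary and helper names are illustrative only and not part of FastMaskRCNN. Switching the repo to ResNet-101 would presumably also need a matching pretrained ResNet-101 checkpoint, which this sketch does not cover.

```python
# Bottleneck blocks per stage (conv2_x, conv3_x, conv4_x, conv5_x), per the ResNet paper.
# Hypothetical names: this is not FastMaskRCNN's backbone configuration.
RESNET_BLOCKS = {
    50: (3, 4, 6, 3),
    101: (3, 4, 23, 3),
}


def weighted_layer_count(blocks):
    # Each bottleneck block has 3 conv layers; add 1 for the stem conv1 and 1 for the final fc.
    return 3 * sum(blocks) + 2


for depth, blocks in RESNET_BLOCKS.items():
    print(depth, blocks, weighted_layer_count(blocks))
# 50 (3, 4, 6, 3) 50
# 101 (3, 4, 23, 3) 101
```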

onlytailei • Oct 12 '17 13:10

I think 200k iterations are too few to see good results. I assume you trained with batch size 1. The Mask R-CNN paper trains for 160k iterations with an effective batch size of 16, so with batch size 1 we should train for at least 16 × 160k = 2,560k iterations to process the same number of images. Could you share how the loss decreased, or how the accuracy increased, over the 200k iterations?
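The estimate above keeps the total number of images processed constant (iterations × images per iteration). A minimal sketch of that arithmetic follows; `equivalent_iterations` is a hypothetical helper, and the rule is only a rough heuristic that ignores any learning-rate or schedule changes a smaller batch may need.

```python
def equivalent_iterations(ref_iters: int, ref_batch: int, batch: int) -> int:
    """Iterations needed at `batch` to process as many images as
    `ref_iters` iterations at `ref_batch` (equal-images heuristic)."""
    total_images = ref_iters * ref_batch
    # Ceiling division: never schedule fewer images than the reference run.
    return -(-total_images // batch)


paper_schedule = equivalent_iterations(160_000, 16, 1)
print(paper_schedule)            # 2560000
print(200_000 / paper_schedule)  # 0.078125
```

By this measure, 200k iterations at batch size 1 cover about 7.8% of the image budget of the paper's schedule.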

insikk • Oct 13 '17 05:10

Hi, it looks like you were able to complete training successfully. I am trying to start training, but I am getting this error:

    Caused by op u'pyramid_1/AssignGTBoxes/Where_3', defined at:
      File "train/train.py", line 339, in <module>
        train()
      File "train/train.py", line 193, in train
        loss_weights=[0.2, 0.2, 1.0, 0.2, 1.0])
      File "train/../libs/nets/pyramid_network.py", line 580, in build
        is_training=is_training, gt_boxes=gt_boxes)
      File "train/../libs/nets/pyramid_network.py", line 263, in build_heads
        assign_boxes(rois, [rois, batch_inds], [2, 3, 4, 5])
      File "train/../libs/layers/wrapper.py", line 172, in assign_boxes
        inds = tf.where(tf.equal(assigned_layers, l))
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/array_ops.py", line 2365, in where
        return gen_array_ops.where(input=condition, name=name)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 4053, in where
        result = _op_def_lib.apply_op("Where", input=input, name=name)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
        op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
        original_op=self._default_original_op, op_def=op_def)
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
        self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

    InternalError (see above for traceback): WhereOp: Could not launch cub::DeviceReduce::Sum to count number of true indices. temp_storage_bytes: 1, status: invalid device function
         [[Node: pyramid_1/AssignGTBoxes/Where_3 = Where[_device="/job:localhost/replica:0/task:0/gpu:0"]]]
         [[Node: pyramid_1/fully_connected_3/BiasAdd/_2753 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_27365_pyramid_1/fully_connected_3/BiasAdd", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]]]
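The op that fails in this trace is the `tf.where(tf.equal(assigned_layers, l))` call inside `assign_boxes`, which, for each pyramid level in `[2, 3, 4, 5]`, collects the indices of the RoIs assigned to that level. The NumPy sketch below only illustrates what that op computes; it is not a fix, since the error message points at the GPU launch of the count (`cub::DeviceReduce::Sum`). The level-assignment rule is the heuristic from the FPN paper and is an assumption here, because the trace does not show how FastMaskRCNN computes `assigned_layers`; `fpn_levels` and `split_by_level` are hypothetical names.

```python
import numpy as np


def fpn_levels(rois, k0=4, canonical=224.0, k_min=2, k_max=5):
    """Map each RoI (x1, y1, x2, y2) to a pyramid level with the FPN-paper
    heuristic k = floor(k0 + log2(sqrt(w * h) / canonical)), clipped to [k_min, k_max]."""
    w = rois[:, 2] - rois[:, 0]
    h = rois[:, 3] - rois[:, 1]
    # Guard against zero-area boxes before taking the log.
    scale = np.sqrt(np.maximum(w * h, 1e-6))
    k = np.floor(k0 + np.log2(scale / canonical))
    return np.clip(k, k_min, k_max).astype(np.int32)


def split_by_level(assigned_layers, levels=(2, 3, 4, 5)):
    """NumPy analogue of `tf.where(tf.equal(assigned_layers, l))` per level:
    for each level l, an (N, 1) array of indices of the RoIs assigned to l."""
    return {l: np.argwhere(assigned_layers == l) for l in levels}


rois = np.array([[0, 0, 32, 32], [0, 0, 112, 112], [0, 0, 224, 224],
                 [0, 0, 448, 448], [0, 0, 1000, 1000]], dtype=np.float64)
levels = fpn_levels(rois)
print(levels)                             # [2 3 4 5 5]
print(split_by_level(levels)[5].ravel())  # [3 4]
```

Like `tf.where` with a single argument, `np.argwhere` returns one row of coordinates per true element, so a 1-D mask yields an (N, 1) index array.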

satya2550 • Feb 08 '18 07:02