FastMaskRCNN
Mask looks very wrong
I trained for more than 100k iterations; the loss looks fine, and there are too many bounding boxes, but the masks look very wrong. Is it a visualization bug, or do we need more training?
Masks have to be in the form of 0s and 1s; something is problematic with the mask code. I obtained very similar results.
Can you share the evaluation code? How do you visualize the masks and bounding boxes?
It seems like pred_mask has some random values.
@zhuwenzhen Do you still get weird masks after more training iterations?
Did anyone (@zhuwenzhen @souryuu @realwecan) manage to correct the predicted masks above? Thanks.
I did not use TensorBoard to visualize the predicted masks. However, it seems that you forgot to pass the mask values through the final sigmoid function before visualizing them.
I also see this. Looking at the code, I see:
From https://github.com/CharlesShang/FastMaskRCNN/blob/a70dcdbb16b75f145cba4a5eda92400935ab863b/libs/nets/pyramid_network.py#L344
```python
## mask head
m = cropped_rois
for _ in range(4):
    m = slim.conv2d(m, 256, [3, 3], stride=1, padding='SAME', activation_fn=tf.nn.relu)
# to 28 x 28
m = slim.conv2d_transpose(m, 256, 2, stride=2, padding='VALID', activation_fn=tf.nn.relu)
tf.add_to_collection('__TRANSPOSED__', m)
m = slim.conv2d(m, num_classes, [1, 1], stride=1, padding='VALID', activation_fn=None)
# add a mask, given the predicted boxes and classes
outputs['mask'] = {'mask': m, 'cls': classes, 'score': scores}
```
From https://github.com/CharlesShang/FastMaskRCNN/blob/a70dcdbb16b75f145cba4a5eda92400935ab863b/libs/nets/pyramid_network.py#L606
```python
# final network visualization
first_mask = outputs['mask']['mask'][:1]
first_mask = tf.transpose(first_mask, [3, 1, 2, 0])
visualize_final_predictions(outputs['final_boxes']["box"], end_points["input"], first_mask)
```
From https://github.com/CharlesShang/FastMaskRCNN/blob/a70dcdbb16b75f145cba4a5eda92400935ab863b/libs/visualization/summary_utils.py#L37
```python
def visualize_final_predictions(boxes, image, masks):
    visualize_masks(masks, "pred_mask")
    visualize_bb(image, boxes, "final_bb_pred")
```
Seems like there isn't a sigmoid. I'm not exactly sure how to add it though.
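For reference, a minimal sketch of one place a sigmoid could go, based on the visualization call site quoted above (untested; the last 1x1 conv has activation_fn=None, so outputs['mask']['mask'] holds raw logits):

```python
# Squash the raw mask logits into [0, 1] before visualizing.
first_mask = tf.sigmoid(outputs['mask']['mask'][:1])
first_mask = tf.transpose(first_mask, [3, 1, 2, 0])
visualize_final_predictions(outputs['final_boxes']["box"], end_points["input"], first_mask)
```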
Did anyone find a solution to fix the mask accuracy issue above? Thanks, Tets.
I thought tf.sigmoid(outputs['mask']['mask']) would do the job. However, it did not work when I tried to visualize the masks on TensorBoard, so I wrote another visualization function using PIL to create .png files. It works fine for drawing the predicted masks on images during training. If you are interested, you can look at my visualization fork. However, I changed a lot of things in that fork and the code is a total mess.
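Roughly, a PIL-based overlay can look like the following (a hypothetical sketch for illustration, not the actual code from the fork; save_mask_overlay is a made-up helper, and image/mask are assumed to be numpy arrays):

```python
import numpy as np
from PIL import Image

def save_mask_overlay(image, mask, path, threshold=0.5):
    # image: HxWx3 uint8 array; mask: HxW array of sigmoid probabilities.
    overlay = image.copy()
    binary = mask > threshold  # binarize the probabilities
    # Blend the masked pixels with red so the mask is visible on the image.
    overlay[binary] = (0.5 * overlay[binary] + 0.5 * np.array([255, 0, 0])).astype(np.uint8)
    Image.fromarray(overlay).save(path)
```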
Thanks @souryuu, I tried your code and it works fine. BTW, you really changed a lot :-)
@lengly Just wondering if you were able to reproduce any of the experimental evaluation results from the paper (or obtain close performance)? Thanks!
@realwecan You can try souryuu's code; it will save some images which make sense. I only trained for 60k iterations, so the results aren't as good as the paper's.
@souryuu Your fork is truly beneficial, thank you very much for your efforts. However, I don't understand why test.py and train.py differ that much just from looking at the code; could you please explain? Images created by test.py predict only the person label, even when the objects are not humans. In contrast, images created by train.py involve various types of labels, but with far more bounding boxes than test.py.
@Cangrc Hi, does your test.py work without any modification? I ran train.py for 50k iterations, and when I run test.py there is no mask or bbox. Also, please see here: souryuu said that during the first 50k iterations the results are biased toward the person class, but after 600k iterations it works fine.
Yes, it works without any modification. I think I am somewhere near 200k iterations.
If you use the code in my fork without any modification, your batch normalization will not have been properly trained, because update_bn in config_v1.py was set to False. Change that to True and train the network again. During testing, set all is_training flags in test.py to False. You should see a significant improvement.
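For context, this is the standard TF-slim batch-norm pattern that these flags control (a generic sketch, not the fork's exact wiring; conv_bn, images, and the stand-in loss are made up for illustration):

```python
import tensorflow as tf
slim = tf.contrib.slim

def conv_bn(x, num_outputs, is_training):
    x = slim.conv2d(x, num_outputs, [3, 3], activation_fn=None)
    # is_training=True normalizes with batch statistics and registers
    # update ops for the moving mean/variance; is_training=False uses
    # the stored moving averages (what you want at test time).
    x = slim.batch_norm(x, is_training=is_training)
    return tf.nn.relu(x)

images = tf.placeholder(tf.float32, [None, 224, 224, 3])
net = conv_bn(images, 256, is_training=True)

# The moving averages only get updated if the UPDATE_OPS actually run,
# which is why training with update_bn=False leaves BN untrained:
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
loss = tf.reduce_mean(net)  # stand-in for the real losses
with tf.control_dependencies(update_ops):
    train_op = tf.train.MomentumOptimizer(0.001, 0.9).minimize(loss)
```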
@souryuu Thanks heaps! Just wanted to clarify: in your latest code you have added the IS_TRAINING flag, and I can see that when IS_TRAINING==True, update_bn will be set to True (and False otherwise). Am I correct to assume that I can use this version of the code as-is, as long as I set IS_TRAINING=True when training and IS_TRAINING=False during testing?
@realwecan Yes, it should be something like that. However, in the latest version on the fix_testing branch, all is_training flags in train.py and test.py are already set to True and False respectively, so BN should work fine without any extra setting.
@souryuu Thanks for your prompt reply! Just wondering if you have ever encountered a problem where the regular_loss rises constantly. Could this be caused by improper settings of the learning rate and the loss weights? I have been experiencing this in a few versions of the code.
@realwecan Yes, I encountered that in the early stage of training; afterward, it gradually decreased. Setting the learning rate and loss weights too high can temporarily increase the regular loss. However, I am not sure it matters. From what I understand, it is just the loss from a regularization term that keeps you from overfitting to the training data.
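For reference, in TF-slim the regularization loss is typically just the sum of the weight-decay terms that the conv layers register (a generic sketch; the loss names in the comment are illustrative, not the repo's exact variables):

```python
import tensorflow as tf

# Weight-decay terms registered by the layers accumulate in this
# collection; the regular loss is their sum, added on top of the task
# losses, e.g. total_loss = rpn_loss + rcnn_loss + mask_loss + regular_loss.
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
regular_loss = tf.add_n(reg_losses) if reg_losses else tf.constant(0.0)
```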
@souryuu When testing with the latest code in the fix_testing branch I met the following error:
```
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/script_ops.py", line 82, in __call__
    ret = func(*args)
  File "/home/twang/work/segmentation/baselines/FastMaskRCNN/train/../libs/layers/sample.py", line 127, in sample_rpn_outputs_wrt_gt_boxes
    gt_argmax_overlaps = overlaps.argmax(axis=0)  # G
ValueError: attempt to get argmax of an empty sequence
```
This error does not seem to exist in the code in your master branch. Do you have any idea where this might be coming from? Thanks!
@souryuu I tried your latest code in the master branch. I set is_training=True and trained for 60k iterations. When I set is_training=False and run test.py, there are still no bboxes or masks. Why? Is it because my iteration count is not enough?
@realwecan In pyramid_network.py line 272, setting the only_positive flag in sample_rpn_outputs to False should solve your issue. It is the flag that filters out RoIs with RPN confidence < 0.5 to speed up training. Since the RPN cannot provide reliable confidence values at the beginning of training, this should be turned off.
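To illustrate what that flag does, here is a hypothetical sketch of the filtering step (the function and variable names are assumptions based on this discussion, not the repo's exact code). Note that if every RoI gets dropped early in training, a later overlaps.argmax(axis=0) sees an empty array, which matches the ValueError above:

```python
import numpy as np

def filter_rois(boxes, scores, only_positive):
    # Optionally drop RoIs whose RPN confidence is below 0.5 to speed
    # up training. Early on, the RPN scores are unreliable, so this can
    # leave zero RoIs and make downstream argmax calls fail.
    if only_positive:
        keep = np.where(scores > 0.5)[0]
        boxes, scores = boxes[keep], scores[keep]
    return boxes, scores
```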
@lengly Can you see masks and bounding boxes in the images created during training?
@souryuu Yes, when training it works fine; the boxes and masks in the training images look just like Cangrc's results above.
@lengly Check if update_bn in config_v1.py is set to True during training. If not, set it to True and train the network again.
@souryuu Yeah, update_bn is True during training. I use your latest code in the master branch, and the only change is per_process_gpu_memory_fraction=0.95, but I think that should not affect the result. Does the latest master branch test.py work fine on your side?
It works fine for me. During training, did "target" and "predicted" match? If not, I think your RCNN did not learn well; you can keep the RPN and begin training the RCNN and mask network by commenting out line 76 and uncommenting lines 80-125 in train.py. If they do match, I recommend you use the code in the fix_testing branch instead. However, you will need to retrain the network, since the anchors in the two branches do not match.
@souryuu Yeah, they nearly match. I'll try the fix_testing branch. I really appreciate your help!
@Cangrc @realwecan @lengly It would be greatly appreciated if some of you could share the trained weights. It would take a long time for my computer to reach 600k iterations. Thanks!
Can you share the trained weights? Thanks very much! @souryuu @gengyixuan @lengly @realwecan
@souryuu I set is_training=True and trained for 300k iterations. When I set is_training=False and run test.py, the results look bad; target and predicted do not match. What should I do? Comment out line 76 and uncomment lines 80-125 in train.py, and then train again?
I trained until 400k iterations but the predictions look very bad. Has anyone succeeded? BTW, it seems there is a new extended version of Mask RCNN on Caffe, AffordanceNet. The results look interesting to me.
@souryuu Thanks for your code! I tried your code following the README.md, but an error occurred: "IOError: [Errno 2] No such file or directory: './output/mask_rcnn/est_imgs/train_est_0.jpg'". From train.py I could not see any code that creates the est_imgs directory, so can you tell me if I missed a step? I would appreciate it.
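A likely workaround is to create the missing directory before train.py tries to save the visualization images (the path is taken from the error message; where exactly to place this in train.py is an assumption):

```python
import os

# Make sure the visualization output directory exists before any
# image-save call writes train_est_*.jpg into it.
out_dir = './output/mask_rcnn/est_imgs'
if not os.path.exists(out_dir):
    os.makedirs(out_dir)
```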