FastMaskRCNN

Mask looks very wrong

Open zhuwenzhen opened this issue 7 years ago • 32 comments

I trained for >100k iterations. The loss looks fine and the bounding boxes look plausible (if a bit too numerous), but the masks look very wrong. Is this a visualization bug, or do we need more training?

[Screenshots, Jul 02 2017: predicted bounding boxes and masks]

zhuwenzhen avatar Jul 02 '17 20:07 zhuwenzhen

Masks have to be in the form of 0s and 1s, so something is wrong with the mask code. I obtained very similar results.

Cangrc avatar Jul 02 '17 22:07 Cangrc

Can you share your evaluation code? How do you visualize the masks and bounding boxes?

HuangBo-Terraloupe avatar Jul 04 '17 16:07 HuangBo-Terraloupe

It seems like pred_mask contains essentially random values.

@zhuwenzhen Do you still get weird masks after more training iterations?

insikk avatar Jul 11 '17 03:07 insikk

Did anyone (@zhuwenzhen @souryuu @realwecan) manage to correct the predicted masks above? Thanks.

Cangrc avatar Jul 19 '17 20:07 Cangrc

I did not use TensorBoard to visualize the predicted masks. However, it seems that you forgot to pass the mask values through a final sigmoid function before visualizing them.

souryuu avatar Jul 20 '17 00:07 souryuu

I'm seeing this too. Looking at the code:

From https://github.com/CharlesShang/FastMaskRCNN/blob/a70dcdbb16b75f145cba4a5eda92400935ab863b/libs/nets/pyramid_network.py#L344

        ## mask head
        m = cropped_rois
        for _ in range(4):
            m = slim.conv2d(m, 256, [3, 3], stride=1, padding='SAME', activation_fn=tf.nn.relu)
        # to 28 x 28
        m = slim.conv2d_transpose(m, 256, 2, stride=2, padding='VALID', activation_fn=tf.nn.relu)
        tf.add_to_collection('__TRANSPOSED__', m)
        m = slim.conv2d(m, num_classes, [1, 1], stride=1, padding='VALID', activation_fn=None)
          
        # add a mask, given the predicted boxes and classes
        outputs['mask'] = {'mask':m, 'cls': classes, 'score': scores}
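
As a quick sanity check on the "# to 28 x 28" comment above, the spatial sizes in this head can be walked through by hand. This is a sketch that assumes the ROIs are cropped to 14x14 (the Mask R-CNN paper's value; the repo's crop size may differ):

```python
def conv_out(size, k, stride, padding):
    """Spatial output size of a convolution."""
    if padding == "SAME":
        return -(-size // stride)  # ceil(size / stride)
    return (size - k) // stride + 1

def deconv_out(size, k, stride, padding):
    """Spatial output size of a transposed convolution."""
    if padding == "SAME":
        return size * stride
    return size * stride + max(k - stride, 0)

s = 14                               # assumed ROI crop size
for _ in range(4):
    s = conv_out(s, 3, 1, "SAME")    # 3x3 SAME convs keep 14x14
s = deconv_out(s, 2, 2, "VALID")     # 2x2 stride-2 deconv: 14 -> 28
s = conv_out(s, 1, 1, "VALID")       # 1x1 conv keeps 28x28
```

So the per-class mask logits come out at 28x28, matching the comment in the code.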

From https://github.com/CharlesShang/FastMaskRCNN/blob/a70dcdbb16b75f145cba4a5eda92400935ab863b/libs/nets/pyramid_network.py#L606

    # final network visualization
    first_mask = outputs['mask']['mask'][:1]
    first_mask = tf.transpose(first_mask, [3, 1, 2, 0])

    visualize_final_predictions(outputs['final_boxes']["box"], end_points["input"], first_mask)

From https://github.com/CharlesShang/FastMaskRCNN/blob/a70dcdbb16b75f145cba4a5eda92400935ab863b/libs/visualization/summary_utils.py#L37

def visualize_final_predictions(boxes, image, masks):
    visualize_masks(masks, "pred_mask")
    visualize_bb(image, boxes, "final_bb_pred")

It seems like there isn't a sigmoid applied anywhere. I'm not exactly sure where to add it, though.
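
For intuition, here is a minimal NumPy sketch of the missing step: the 1x1 conv head outputs unbounded logits, which render as noise unless they are squashed first. In the graph the equivalent would be something like `tf.sigmoid(outputs['mask']['mask'])` before visualization (the array values below are made up for illustration):

```python
import numpy as np

def sigmoid(x):
    # squash raw mask logits into [0, 1] probabilities
    return 1.0 / (1.0 + np.exp(-x))

# raw logits from the conv head can be any real value
logits = np.array([[-3.0, 0.0], [2.0, 5.0]])
probs = sigmoid(logits)
binary_mask = (probs >= 0.5).astype(np.uint8)  # threshold for a hard 0/1 mask
```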

astromme avatar Jul 20 '17 15:07 astromme

Did anyone find a solution to the mask-accuracy issue above? Thanks, Tets

Tetsujinfr avatar Jul 27 '17 00:07 Tetsujinfr

I thought tf.sigmoid(outputs['mask']['mask']) would do the job. However, it did not work when I tried to visualize the masks on TensorBoard, so I wrote another visualization function that uses PIL to create .png files. It works fine for drawing predicted masks on images during training. If you are interested, you can look at my visualization fork. Be warned that I changed a lot of things in that fork and the code is a total mess.
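
For anyone who wants to do something similar without digging through the fork, here is a minimal sketch of blending a binary mask onto an image array; the function name and values are illustrative, not the fork's actual API, and the final PIL save step is shown commented out:

```python
import numpy as np

def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Alpha-blend a binary HxW mask onto an HxWx3 uint8 image array."""
    out = image.astype(np.float32)
    m = mask.astype(bool)
    out[m] = (1 - alpha) * out[m] + alpha * np.asarray(color, dtype=np.float32)
    return out.astype(np.uint8)

image = np.zeros((28, 28, 3), dtype=np.uint8)   # stand-in for an input image
mask = np.zeros((28, 28), dtype=np.uint8)
mask[4:10, 4:10] = 1                            # a fake 6x6 predicted mask
blended = overlay_mask(image, mask)
# PIL.Image.fromarray(blended).save("est.png")  # write to disk per step, as in the fork
```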

souryuu avatar Jul 27 '17 23:07 souryuu

Thanks @souryuu, I tried your code and it works fine. BTW, you really changed a lot :-)

lengly avatar Aug 08 '17 09:08 lengly

@lengly Just wondering if you were able to reproduce any of the experimental results from the paper (or obtain close performance)? Thanks!

realwecan avatar Aug 08 '17 12:08 realwecan

@realwecan You can try souryuu's code; it saves images that make sense. I've only trained for 60k iterations, so the results aren't as good as the paper's.

lengly avatar Aug 09 '17 02:08 lengly

@souryuu Your fork is truly helpful, thank you very much for your efforts. However, just from reading the code I don't understand why test.py and train.py differ so much; could you please explain? The images created by test.py predict only the person label, even when the objects aren't people. By contrast, the images created by train.py contain various labels, but with far more bounding boxes than test.py.

Cangrc avatar Aug 09 '17 23:08 Cangrc

@Cangrc Hi, does your test.py work without any modification? I ran train.py for 50k iterations, and when I run test.py there are no masks or bboxes. Also, see here: souryuu said that during the first 50k iterations the results are biased toward the person class, but after 600k iterations it works fine.

lengly avatar Aug 10 '17 02:08 lengly

Yes, it works without any modification. I think I am somewhere near 200k iterations.

Cangrc avatar Aug 10 '17 06:08 Cangrc

If you use the code in my fork without any modification, your batch normalization will not have been trained properly, because update_bn in config_v1.py was set to False. Change it to True and train the network again. At test time, set every is_training flag in test.py to False. You should see a significant improvement.
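
To see why that flag matters, here is a toy, framework-free sketch of batch norm (not the repo's slim implementation): the moving statistics used at test time are only updated during training when the update flag is on, so training with update_bn=False leaves them at their initial values and wrecks inference:

```python
import numpy as np

class BatchNorm:
    """Toy batch norm: test time uses moving statistics accumulated in training."""
    def __init__(self, momentum=0.99, eps=1e-5):
        self.momentum, self.eps = momentum, eps
        self.moving_mean, self.moving_var = 0.0, 1.0  # initial values

    def __call__(self, x, training, update_bn=True):
        if training:
            mean, var = x.mean(), x.var()
            if update_bn:  # analogous to update_bn in config_v1.py
                self.moving_mean = self.momentum * self.moving_mean + (1 - self.momentum) * mean
                self.moving_var = self.momentum * self.moving_var + (1 - self.momentum) * var
        else:
            # stale defaults if update_bn was never True during training
            mean, var = self.moving_mean, self.moving_var
        return (x - mean) / np.sqrt(var + self.eps)

x = np.array([10.0, 12.0])
bn_off = BatchNorm(); bn_off(x, training=True, update_bn=False)  # stats never move
bn_on = BatchNorm(); bn_on(x, training=True)                     # stats track the data
```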

souryuu avatar Aug 10 '17 09:08 souryuu

@souryuu Thanks heaps! Just to clarify: in your latest code you added an IS_TRAINING flag, and I can see that when IS_TRAINING==True, update_bn is set to True (and False otherwise). Am I correct to assume I can use this version of the code as is, as long as I set IS_TRAINING=True when training and IS_TRAINING=False during testing?

realwecan avatar Aug 10 '17 12:08 realwecan

@realwecan Yes, it should work like that. However, in the latest version on the "fix_testing" branch, all the is_training flags in train.py and test.py are already set to True and False respectively, so BN should work fine without any extra settings.

souryuu avatar Aug 10 '17 12:08 souryuu

@souryuu Thanks for your prompt reply! Just wondering if you have ever seen the regular_loss rise constantly. Could this be caused by improper settings of the learning rate and the loss weights? I have been experiencing this with a few versions of the code.

realwecan avatar Aug 10 '17 12:08 realwecan

@realwecan Yes, I encountered that in the early stage of training; afterward it gradually decreased. Setting the learning rate or loss weights too high can temporarily increase the regular loss. However, I am not sure it matters much. From what I understand, it is just the loss from a regularization term that helps you avoid overfitting to the training data.
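
For intuition, a minimal sketch of what a regularization loss like this typically computes (L2 weight decay: a coefficient times the summed squared weights; the repo's exact weighting may differ). It grows whenever the weights themselves grow, which is why an aggressive learning rate can push it up early in training:

```python
import numpy as np

def regular_loss(weights, weight_decay=1e-4):
    # L2 weight decay: penalizes large weights to discourage overfitting
    return weight_decay * sum(0.5 * np.sum(w ** 2) for w in weights)

small = [np.full((2, 2), 1.0)]
large = [np.full((2, 2), 3.0)]  # larger weights -> larger regularization loss
```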

souryuu avatar Aug 10 '17 13:08 souryuu

@souryuu When testing with the latest code in the fix_testing branch I met the following error:

    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/script_ops.py", line 82, in __call__
        ret = func(*args)
      File "/home/twang/work/segmentation/baselines/FastMaskRCNN/train/../libs/layers/sample.py", line 127, in sample_rpn_outputs_wrt_gt_boxes
        gt_argmax_overlaps = overlaps.argmax(axis=0)  # G
    ValueError: attempt to get argmax of an empty sequence

This error does not seem to exist in the code in your master branch. Do you have any idea what this might be coming from? Thanks!

realwecan avatar Aug 10 '17 13:08 realwecan

@souryuu I tried your latest code on the master branch. I set is_training=True and trained for 60k iterations. When I set is_training=False and run test.py, there are still no bboxes or masks. Why? Is it because I haven't trained for enough iterations?

lengly avatar Aug 11 '17 02:08 lengly

@realwecan In pyramid_network.py, line 272, setting the only_positive flag in sample_rpn_outputs to False should solve your issue. That flag filters out ROIs whose RPN confidence is < 0.5 to speed up training; since the RPN cannot provide reliable confidence values at the beginning of training, it should be turned off.
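
A sketch of what that flag does (the function body and numbers here are illustrative, not the repo's code): early in training every RPN score may fall below 0.5, so the filter can empty the ROI set, and a later overlaps.argmax(axis=0) on an empty array is exactly what raises "attempt to get argmax of an empty sequence":

```python
import numpy as np

def filter_rois(rois, scores, only_positive):
    """Keep only ROIs the RPN scores at >= 0.5 (the training speed-up filter)."""
    if not only_positive:
        return rois
    return rois[scores >= 0.5]

# early in training, RPN scores are unreliable and may all be low
rois = np.arange(8).reshape(4, 2)
scores = np.array([0.1, 0.2, 0.3, 0.4])

kept = filter_rois(rois, scores, only_positive=True)   # empty -> crashes downstream
safe = filter_rois(rois, scores, only_positive=False)  # all 4 ROIs survive
```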

@lengly Can you see masks and bounding boxes in the images created during training?

souryuu avatar Aug 11 '17 04:08 souryuu

@souryuu Yes, it works fine during training; the boxes and masks in the training images are just like Cangrc's results above.

lengly avatar Aug 11 '17 04:08 lengly

@lengly Check if update_bn in config_v1.py is set to True during training. If not, set it to True and train the network again.

souryuu avatar Aug 11 '17 04:08 souryuu

@souryuu Yeah, update_bn is True during training. I use your latest code on the master branch, and the only change is per_process_gpu_memory_fraction=0.95, which shouldn't affect the results. Does the latest master-branch test.py work fine on your side?

lengly avatar Aug 11 '17 05:08 lengly

It works fine for me. During training, did the "target" and "predicted" outputs match? If not, I think your RCNN did not learn well; you can keep the RPN and train just the RCNN and mask network by commenting out line 76 and uncommenting lines 80-125 in train.py. If they do match, I recommend using the code on the fix_testing branch instead. However, you will need to retrain the network, since the anchors in the two branches do not match.

souryuu avatar Aug 11 '17 05:08 souryuu

@souryuu Yeah, they nearly match. I'll try the fix_testing branch. I really appreciate your help!

lengly avatar Aug 11 '17 06:08 lengly

@Cangrc @realwecan @lengly It would be greatly appreciated if some of you could maybe share the trained weights. It would take a long time for my computer to reach 600k iterations. Thanks!

gengyixuan avatar Aug 16 '17 00:08 gengyixuan

Can you share the trained weights? Thanks very much!! @souryuu @gengyixuan @lengly @realwecan

Yc174 avatar Sep 11 '17 13:09 Yc174

@souryuu I set is_training=True and trained for 300k iterations. When I set is_training=False and run test.py, the results look bad: target and predicted do not match. What should I do? Comment out line 76 and uncomment lines 80-125 in train.py, then train again?

Sutong1115 avatar Sep 17 '17 06:09 Sutong1115

I trained until 400k iterations but the predictions look very bad. Has anyone succeeded? BTW, it seems there is a new extended version of Mask R-CNN on Caffe, AffordanceNet. The results look interesting to me.

trminh89 avatar Oct 05 '17 10:10 trminh89

@souryuu Thanks for your code! I followed the README.md but got an error: "IOError: [Errno 2] No such file or directory: './output/mask_rcnn/est_imgs/train_est_0.jpg'". From train.py I cannot see any code that creates the est_imgs directory. Can you tell me if I missed a step? I would appreciate it.

Yiman-GO avatar Dec 04 '17 06:12 Yiman-GO