
Can't understand the backward of mask_output.

Open zhixinwang opened this issue 8 years ago • 3 comments

After the sigmoid operator, the mask loss should be the binary cross-entropy loss, but with the loss of the background masks ignored. However, in the code, https://github.com/TuSimple/mx-maskrcnn/blob/c866544cb2ab93f868bce004e86af7a93451d977/rcnn/PY_OP/mask_output.py#L27 — normally the gradient would look like `(mask_prob - mask_target) / (mask_prob * (1 - mask_prob)) * mask_weight / float(n_rois_fg)`. I guess you have folded the sigmoid and the binary cross-entropy together when computing the backward. What's more, the factor 0.25 is also confusing. :)
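For reference, here is a minimal numpy sketch (illustrative only; `logit`, `target`, and the helper names are not from the repo) of why folding the sigmoid into the binary cross-entropy backward gives the simple `(mask_prob - mask_target)` form: the `1 / (p * (1 - p))` factor from the BCE derivative cancels the `p * (1 - p)` factor from the sigmoid derivative.

```python
import numpy as np

def combined_grad(logit, target):
    """Gradient of BCE(sigmoid(logit), target) w.r.t. the logit.

    d/dx [-t*log(p) - (1-t)*log(1-p)]  with  p = sigmoid(x)
      = (p - t) / (p * (1 - p)) * p * (1 - p)   # BCE term * sigmoid derivative
      = p - t
    """
    p = 1.0 / (1.0 + np.exp(-logit))
    return p - target

def bce_of_logit(logit, target, eps=1e-12):
    p = 1.0 / (1.0 + np.exp(-logit))
    return -(target * np.log(p + eps) + (1.0 - target) * np.log(1.0 - p + eps))

# Finite-difference check that the cancellation holds.
x, t, h = 0.3, 1.0, 1e-6
numeric = (bce_of_logit(x + h, t) - bce_of_logit(x - h, t)) / (2 * h)
print(numeric, combined_grad(x, t))  # both ~ -0.4256
```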

zhixinwang avatar Nov 15 '17 03:11 zhixinwang

Hi @zhixinwang, sorry for my late reply, and thank you for pointing this out. I did fold the sigmoid and the binary cross-entropy together when computing the backward. I think it will harm the gradient, since there are two sigmoids; I will check it soon. The factor is used to control the weight of the mask branch.
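One possible reading of the "two sigmoids" remark (an assumption, not a description of the actual code path): if the graph already contains a sigmoid op whose backward multiplies by `p * (1 - p)`, and the custom loss backward also returns `(mask_prob - mask_target)` as if that factor were folded in, then the propagated gradient picks up an extra `p * (1 - p)` and vanishes exactly where the prediction is confidently wrong. A small sketch:

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

x = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])   # hypothetical mask logits
p = sigmoid(x)
t = 1.0                                      # foreground target

folded_grad  = p - t                         # sigmoid + BCE folded together
doubled_grad = (p - t) * p * (1.0 - p)       # an extra sigmoid derivative applied on top

print(folded_grad)   # close to -1 where the prediction is confidently wrong (x = -6)
print(doubled_grad)  # close to 0 there, so learning stalls exactly where the error is largest
```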

Zehaos avatar Nov 16 '17 13:11 Zehaos

In the current code, the gradient of the mask loss is still `self.factor * mask_weight * (mask_prob - mask_target) / float(n_rois_fg)`, and the factor is 1.0/32. In my opinion, it can be viewed as the gradient of a least-squares loss, yes? @Zehaos @zhixinwang
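A short numeric sketch of that observation (illustrative only): `(mask_prob - mask_target)` is indeed the gradient of the squared loss `0.5 * (p - t)**2` with respect to the probability, but it is also the gradient of binary cross-entropy with respect to the logit once the sigmoid is folded in, so the same expression supports both readings; they just differentiate with respect to different variables.

```python
import numpy as np

p, t = 0.7, 1.0

# Gradient of the squared loss 0.5 * (p - t)**2 with respect to p:
grad_l2_wrt_p = p - t                              # -0.3

# Gradient of BCE(sigmoid(x), t) with respect to the logit x,
# evaluated at the x for which sigmoid(x) == p:
x = np.log(p / (1.0 - p))
grad_bce_wrt_logit = 1.0 / (1.0 + np.exp(-x)) - t  # also -0.3

print(grad_l2_wrt_p, grad_bce_wrt_logit)
```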

gaosanyuan avatar Feb 27 '18 11:02 gaosanyuan

@Zehaos, I tried to separate the sigmoid from the binary cross-entropy when computing the backward, which gives `grad = self.factor * (mask_prob - mask_target) / (mask_prob * (1 - mask_prob)) * mask_weight / float(n_rois_fg)`, but MaskLogLoss becomes NaN immediately. As I understand it, the gradient after this change is only a little larger than the original one you wrote, so I tried decreasing self.factor and the learning rate, but MaskLogLoss still goes to NaN. So what is the problem with it? And why did you fold the sigmoid and the binary cross-entropy together?
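A likely source of the NaN (a guess based on the formula alone, not on inspecting the actual run): the `mask_prob * (1 - mask_prob)` denominator underflows toward 0 as the predictions become confident, so the separated gradient overflows to inf and poisons MaskLogLoss, while the folded form stays bounded. A small sketch:

```python
import numpy as np

t = 0.0                                        # background target
p = np.array([0.5, 0.99, 1.0 - 1e-8, 1.0])     # increasingly confident (wrong) predictions

with np.errstate(divide="ignore"):
    separated = (p - t) / (p * (1.0 - p))      # sigmoid and BCE separated
folded = p - t                                 # sigmoid folded into the BCE backward

print(separated)  # [2.0, 100.0, ~1e8, inf]  -- unbounded; inf/NaN propagate into the loss
print(folded)     # [0.5, 0.99, ~1.0, 1.0]   -- always bounded in [-1, 1]
```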

HeroyiuWFY avatar May 03 '18 11:05 HeroyiuWFY