I can't understand the backward pass of mask_output.
After the sigmoid operator, the mask loss should be a binary cross-entropy loss that ignores the loss on background masks. But in the code,
https://github.com/TuSimple/mx-maskrcnn/blob/c866544cb2ab93f868bce004e86af7a93451d977/rcnn/PY_OP/mask_output.py#L27
Normally the gradient would look like
( mask_prob - mask_target ) / ( mask_prob * (1 - mask_prob) ) * mask_weight / float(n_rois_fg)
I guess you have fused the sigmoid and binary cross-entropy together when computing the backward pass. What's more, the factor 0.25 is also confusing. :)
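
For concreteness, here is a minimal NumPy sketch of the two gradient forms I am comparing (toy values; the variable names only mirror mask_output.py, this is not the actual code):

```python
import numpy as np

# toy per-pixel values; names mirror mask_output.py but the numbers are made up
mask_prob = np.array([0.9, 0.2, 0.6])     # sigmoid outputs
mask_target = np.array([1.0, 0.0, 1.0])   # binary ground-truth masks
mask_weight = np.array([1.0, 1.0, 0.0])   # 0 masks out background / ignored pixels
n_rois_fg = 2

# fused form: gradient of sigmoid + BCE taken w.r.t. the logits
grad_fused = (mask_prob - mask_target) * mask_weight / float(n_rois_fg)

# separated form: gradient of BCE taken w.r.t. the probabilities (what I wrote above)
grad_separated = ((mask_prob - mask_target)
                  / (mask_prob * (1 - mask_prob))
                  * mask_weight / float(n_rois_fg))

print(grad_fused)       # roughly [-0.05, 0.1, 0.0]
print(grad_separated)   # roughly [-0.56, 0.625, 0.0]
```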
Hi @zhixinwang, sorry for my late reply. Thank you for pointing this out. I do fuse the sigmoid and binary cross-entropy together when computing the backward pass. I think it will harm the gradient since there are two sigmoids; I will check it soon. The factor is used to control the weight of the mask branch.
In the current code, the gradient of the mask loss is still self.factor * mask_weight * (mask_prob - mask_target) / float(n_rois_fg), and the factor is 1.0/32. In my opinion, it can be viewed as the gradient of a least-squares loss, yes? @Zehaos @zhixinwang
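
Writing out the chain rule (standard sigmoid + BCE definitions, nothing repo-specific), the same expression shows up in both readings:

```latex
p = \sigma(z), \qquad
L_{\mathrm{BCE}} = -t\log p - (1 - t)\log(1 - p)
\;\Longrightarrow\;
\frac{\partial L_{\mathrm{BCE}}}{\partial z}
  = \frac{p - t}{p(1 - p)} \cdot p(1 - p)
  = p - t,
\qquad
\frac{\partial}{\partial p}\,\tfrac{1}{2}(p - t)^2 = p - t .
```

So p - t is simultaneously the sigmoid + BCE gradient with respect to the logit z and the squared-error gradient with respect to the probability p; the code can still be read as BCE with the sigmoid folded into the backward pass.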
@Zehaos, I tried to separate the sigmoid and binary cross-entropy when computing the backward pass, i.e. grad = self.factor * (mask_prob - mask_target) / (mask_prob * (1 - mask_prob)) * mask_weight / float(n_rois_fg), but MaskLogLoss became NaN immediately. As I understand it, the gradient after this change is only a little bigger than the original you wrote, so I tried to decrease self.factor or the learning rate, but MaskLogLoss was still NaN. What is the problem with it, and why did you fuse the sigmoid and binary cross-entropy together?
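
A small sketch of what I suspect is going on (hypothetical values, not from the actual training run; self.factor and mask_weight omitted for brevity): once a foreground pixel's prediction saturates, 1 / (mask_prob * (1 - mask_prob)) blows up and the resulting inf propagates into the loss.

```python
import numpy as np

# a single saturated, confidently wrong prediction (hypothetical value)
mask_prob = np.float32(1.0) - np.float32(1e-8)   # rounds to exactly 1.0 in float32
mask_target = np.float32(0.0)

denom = mask_prob * (np.float32(1.0) - mask_prob)    # 0.0 in float32
grad_separated = (mask_prob - mask_target) / denom   # inf (divide-by-zero warning)

# the fused gradient stays bounded for the same pixel
grad_fused = mask_prob - mask_target                 # 1.0

print(denom, grad_separated, grad_fused)
```

Once one such inf reaches a weight update, later forward passes produce NaN everywhere, which would explain why shrinking self.factor or the learning rate does not help: the blow-up comes from the division itself, not from the overall scale of the gradient.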