keras-yolo3
About the loss function: confidence_loss
Keras-yolo3/yolo3/model.py, line 401:
confidence_loss = object_mask * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True) + (1-object_mask) * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True) * ignore_mask
I want to know what the term (1-object_mask) * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True) * ignore_mask means.
I have also printed [loss, xy_loss, wh_loss, confidence_loss, class_loss, K.sum(ignore_mask)] and noticed that confidence_loss is very large, while xy_loss and wh_loss are relatively small. Can someone explain why?
Finally, how do I calculate K.sum(ignore_mask)?
Thanks!
Same issue here. I have the same question; I don't really understand confidence_loss.
As I understand it, the confidence loss is split into 2 parts:
- For object grids: object_mask * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True)
- For non-object grids: (1-object_mask) * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True) * ignore_mask
In the second part, ignore_mask = 0 for those grids whose predicted box overlaps a GT box with IOU larger than 0.5. Those grids are half positive, half negative, so they are ignored in the confidence loss calculation.
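To make the two parts concrete, here is a minimal NumPy sketch of the same expression. It is not the repository code: bce_from_logits is a hypothetical stand-in for K.binary_crossentropy(..., from_logits=True), and the three grid values are made up for illustration.

```python
import numpy as np

def bce_from_logits(target, logits):
    # Numerically stable sigmoid cross-entropy:
    # max(x, 0) - x * z + log(1 + exp(-|x|))
    return np.maximum(logits, 0) - logits * target + np.log1p(np.exp(-np.abs(logits)))

def confidence_loss(object_mask, raw_conf, ignore_mask):
    bce = bce_from_logits(object_mask, raw_conf)
    # Part 1: grids that contain an object (target confidence = 1)
    obj_term = object_mask * bce
    # Part 2: grids without an object (target confidence = 0),
    # dropped wherever ignore_mask is 0
    noobj_term = (1 - object_mask) * bce * ignore_mask
    return obj_term + noobj_term

# Three illustrative grids, all predicting the same raw confidence logit:
#   grid 0: contains an object
#   grid 1: background, poor overlap with every GT box (counted)
#   grid 2: background, high overlap with a GT box (ignored)
object_mask = np.array([1.0, 0.0, 0.0])
raw_conf = np.array([2.0, 2.0, 2.0])
ignore_mask = np.array([1.0, 1.0, 0.0])

loss = confidence_loss(object_mask, raw_conf, ignore_mask)
print(loss)
```

With the same confident logit everywhere, the object grid gets a small loss, the counted background grid gets a large one, and the ignored grid contributes nothing. Since most grids in an image are background, the second part can easily dominate the total.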
As for your observation that the confidence loss overwhelms the other losses, I don't think it causes a problem. First, your overall loss is still very high. You can continue training and see what happens. Second, I think the regression error is less important while the confidence loss is high. If your model cannot correctly discriminate objects from background, the regressed bbox is meaningless.
@taijizhao you are right, I see, thank you very much.
ignore_mask = 1 means the prediction at that location is not good yet (best_iou < ignore_thresh). Only in this case is the loss for negative samples calculated, which means the model should pay more attention to identifying whether each part of the image is a target or not.
Once ignore_mask = 0 (best_iou > ignore_thresh), the model can already identify the object well there, so now we want it to pay more attention to correcting the box, and the negative loss is no longer calculated.
This is just the author's strategy (different from Faster R-CNN or SSD), chosen on an experimental basis.
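On the question of K.sum(ignore_mask): it is simply the number of predictions whose best IoU with any GT box is below ignore_thresh. A rough NumPy sketch of that rule follows. The helper names, the corner box format (x1, y1, x2, y2), and the example boxes are assumptions for illustration, not the repository's exact code.

```python
import numpy as np

def box_iou(pred, gt):
    """IoU between every predicted box (N, 4) and every GT box (M, 4).

    Boxes are assumed to be in corner format: x1, y1, x2, y2.
    """
    pred = pred[:, None, :]  # (N, 1, 4)
    gt = gt[None, :, :]      # (1, M, 4)
    inter_min = np.maximum(pred[..., :2], gt[..., :2])
    inter_max = np.minimum(pred[..., 2:], gt[..., 2:])
    inter_wh = np.clip(inter_max - inter_min, 0, None)
    inter = inter_wh[..., 0] * inter_wh[..., 1]
    area_pred = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_gt = (gt[..., 2] - gt[..., 0]) * (gt[..., 3] - gt[..., 1])
    return inter / (area_pred + area_gt - inter)  # (N, M)

def compute_ignore_mask(pred_boxes, gt_boxes, ignore_thresh=0.5):
    # Best overlap of each prediction with any GT box
    best_iou = box_iou(pred_boxes, gt_boxes).max(axis=-1)
    # 1 = keep this prediction in the negative loss, 0 = ignore it
    return (best_iou < ignore_thresh).astype(np.float64)

gt_boxes = np.array([[0.0, 0.0, 10.0, 10.0]])
pred_boxes = np.array([
    [0.0, 0.0, 10.0, 10.0],    # IoU 1.0 with the GT box: ignored
    [0.0, 0.0, 10.0, 4.0],     # IoU 0.4: counted
    [20.0, 20.0, 30.0, 30.0],  # IoU 0.0: counted
])

ignore_mask = compute_ignore_mask(pred_boxes, gt_boxes)
num_counted = ignore_mask.sum()  # plays the role of K.sum(ignore_mask)
print(ignore_mask, num_counted)
```

As training progresses and more predictions overlap well with GT boxes, this sum should shrink, because fewer locations are treated as pure negatives.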