keras-yolo3

About the loss function: confidence_loss

Open jinbooooom opened this issue 6 years ago • 4 comments

Keras-yolo3/yolo3/model.py, line 401:

```python
confidence_loss = object_mask * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True) + \
    (1-object_mask) * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True) * ignore_mask
```

I want to know what (1-object_mask) * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True) * ignore_mask means. I also printed [loss, xy_loss, wh_loss, confidence_loss, class_loss, K.sum(ignore_mask)] and noticed that confidence_loss is very large, while xy_loss and wh_loss are relatively small. Can someone explain this? Finally, how is K.sum(ignore_mask) calculated? Thanks!
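To make the two terms concrete, here is a minimal pure-Python sketch of the same expression for a single predicted box. It is not the repository's code: it assumes that K.binary_crossentropy(..., from_logits=True) computes the standard sigmoid cross-entropy on the raw logit (with the TensorFlow backend it wraps tf.nn.sigmoid_cross_entropy_with_logits), and the names sigmoid_bce and confidence_loss_at are hypothetical. Note that in the second term the target is object_mask = 0, so it penalizes a high "object" score on background.

```python
import math

def sigmoid_bce(target: float, logit: float) -> float:
    """Cross-entropy of a raw logit against a 0/1 target, i.e.
    -t*log(sigmoid(x)) - (1-t)*log(1-sigmoid(x)), in numerically stable form."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def confidence_loss_at(object_mask: float, logit: float, ignore_mask: float) -> float:
    """Hypothetical per-box version of the confidence_loss expression above."""
    bce = sigmoid_bce(object_mask, logit)
    positive_term = object_mask * bce                      # box responsible for a GT object
    negative_term = (1 - object_mask) * bce * ignore_mask  # background box, unless ignored
    return positive_term + negative_term

# A background box (object_mask = 0) with a confident "object" logit is penalized...
print(confidence_loss_at(object_mask=0.0, logit=3.0, ignore_mask=1.0))
# ...but the same box contributes nothing once ignore_mask is 0.
print(confidence_loss_at(object_mask=0.0, logit=3.0, ignore_mask=0.0))
```

Because object_mask is 0 or 1, exactly one of the two terms can be non-zero for any given box, and ignore_mask only ever switches off the background term.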

jinbooooom, Nov 13 '18 09:11

Same issue. I don't really understand confidence_loss either.

LewX, Nov 15 '18 06:11

As I understand it, the confidence loss is split into two parts:

  1. For grids containing an object: object_mask * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True)
  2. For grids without an object: (1-object_mask) * K.binary_crossentropy(object_mask, raw_pred[...,4:5], from_logits=True) * ignore_mask

In part 2, ignore_mask = 0 for grids whose predicted box overlaps a ground-truth (GT) box with an IoU greater than 0.5. Those predictions are half positive, half negative, so they are ignored in the confidence loss calculation.
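How ignore_mask (and therefore K.sum(ignore_mask)) is obtained can be sketched as follows. This is a simplified pure-Python illustration for one image, assuming boxes given as (x_min, y_min, x_max, y_max) and ignore_thresh = 0.5 (the default of yolo_loss in keras-yolo3, which does the same computation on tensors, image by image). The helpers iou and compute_ignore_mask are hypothetical: for every predicted box, take its best IoU against all GT boxes in the image and keep its no-object loss only if that best IoU is below the threshold.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), an assumed format

def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def compute_ignore_mask(pred_boxes: List[Box], true_boxes: List[Box],
                        ignore_thresh: float = 0.5) -> List[int]:
    """1 = keep this prediction's no-object loss, 0 = ignore it."""
    mask = []
    for p in pred_boxes:
        # Best IoU of this prediction against any GT box (0 if the image has none).
        best_iou = max((iou(p, t) for t in true_boxes), default=0.0)
        mask.append(1 if best_iou < ignore_thresh else 0)
    return mask

true_boxes = [(10, 10, 50, 50)]
pred_boxes = [
    (12, 12, 52, 52),      # IoU about 0.82 with the GT box -> ignored
    (100, 100, 140, 140),  # no overlap -> kept as a negative
    (30, 30, 70, 70),      # IoU about 0.14 -> still kept as a negative
]
ignore_mask = compute_ignore_mask(pred_boxes, true_boxes)
print(ignore_mask)       # [0, 1, 1]
print(sum(ignore_mask))  # 2, the analogue of K.sum(ignore_mask)
```

So K.sum(ignore_mask) simply counts the predictions whose no-object confidence loss is still being computed.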

As for your observation that the confidence loss dominates the other losses, I don't think it causes a problem. First, your overall loss is still very high; you can continue training and see what happens. Second, I think the regression error matters less while the confidence loss is high: if your model cannot correctly discriminate objects from background, the regressed bboxes are meaningless.
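Another likely reason for the size difference is the number of terms being summed. In keras-yolo3, xy_loss and wh_loss are multiplied by object_mask, so only the few predictions assigned to a GT box contribute, while the no-object confidence term can receive a contribution from every other prediction that is not ignored. The arithmetic below assumes a 416x416 input, three output scales with strides 32, 16 and 8, and three anchors per scale (the usual YOLOv3 setup); count_predictions is a hypothetical helper.

```python
def count_predictions(input_size: int = 416, strides=(32, 16, 8),
                      anchors_per_scale: int = 3) -> int:
    """Total number of predicted boxes across all output scales."""
    total = 0
    for s in strides:
        grid = input_size // s  # 13, 26, 52 for a 416x416 input
        total += grid * grid * anchors_per_scale
    return total

num_preds = count_predictions()
num_objects = 2  # e.g. two GT boxes, each normally assigned to one anchor
box_terms = num_objects  # xy_loss / wh_loss: only object positions count
max_noobj_terms = num_preds - num_objects  # upper bound before ignore_mask removes some
print(num_preds, box_terms, max_noobj_terms)  # 10647 2 10645
```

Even if each background term is small, summing thousands of them against a handful of box-regression terms makes a large confidence_loss unsurprising early in training.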

taijizhao, Nov 15 '18 07:11

@taijizhao You're right, I see now. Thank you very much.

jinbooooom, Nov 20 '18 03:11

ignore_mask = 1 means the prediction in this image is not very good (because best_iou < ignore_thresh). Only in this case is the loss for negative samples calculated, which means the model should focus on identifying whether each part of the image is a target or not. Once ignore_mask = 0 (best_iou > ignore_thresh), the model can already identify the object well, so we want it to focus on correcting the box instead, and the negative loss is no longer calculated. This is simply the author's strategy (different from Faster R-CNN or SSD), chosen on an experimental basis.
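The cases described above can be summarized per predicted box. The function below is a hypothetical illustration of that decision, not code from the repository; it treats best_iou >= ignore_thresh as ignored, matching the best_iou < ignore_thresh test that sets ignore_mask = 1, and assumes ignore_thresh = 0.5.

```python
def confidence_role(object_mask: int, best_iou: float, ignore_thresh: float = 0.5) -> str:
    """Which part of confidence_loss a single predicted box falls into."""
    if object_mask == 1:
        return "positive"  # object term: confidence pushed toward 1
    ignore_mask = 1 if best_iou < ignore_thresh else 0
    if ignore_mask == 1:
        return "negative"  # no-object term: confidence pushed toward 0
    return "ignored"       # overlaps a GT box well; no confidence loss at all

for obj, best in [(1, 0.9), (0, 0.7), (0, 0.2)]:
    print(obj, best, confidence_role(obj, best))
```

A box responsible for an object is always a positive regardless of best_iou; ignore_mask only decides between "negative" and "ignored" for the remaining boxes.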

grapefruitL, Sep 04 '20 08:09