torchcv

Problem on Loss Function [division by zero]

Open ahkarami opened this issue 7 years ago • 4 comments

Dear @kuangliu, in some cases during training, num_pos is equal to 0 (in the ssd_loss.py script), and an error then occurs. To address the issue, I have added the code below:

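        ...
        if num_pos > 0:
            # Normalize both loss terms by the number of positive samples.
            print('loc_loss: %.3f | cls_loss: %.3f' % (loc_loss.item()/num_pos, cls_loss.item()/num_pos), end=' | ')
            loss = (loc_loss+cls_loss)/num_pos
        else:
            # No positive samples in this batch: skip the division.
            print('Number of Positive Samples is 0.')
            loss = (loc_loss+cls_loss)  # unnormalized; I don't know whether this is correct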
        return loss

My question: in this situation, should the loss be equal to zero (as the SSD paper states) or to a large number?

Related links for the above issue:
1- https://github.com/kuangliu/torchcv/issues/16
2- GluonCV-Advanced Notes about SSD Training

ahkarami avatar Jul 07 '18 05:07 ahkarami
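
For reference, the SSD paper normalizes the sum of the confidence and localization losses by N, the number of matched default boxes, and sets the loss to 0 when N = 0. The rule can be sketched roughly as below. `normalize_ssd_loss` is a hypothetical helper, not part of torchcv, and multiplying by 0.0 instead of returning a literal 0 is an assumption made here so that a tensor input stays attached to the autograd graph.

```python
def normalize_ssd_loss(loc_loss, cls_loss, num_pos):
    """Hypothetical helper: divide the summed loss by num_pos, or return 0 if num_pos is 0.

    Accepts plain floats or any objects that support +, / and * (for example tensors).
    """
    total = loc_loss + cls_loss
    if num_pos > 0:
        # Usual case: average over the positive samples.
        return total / num_pos
    # No positives: the loss is defined as zero. Scaling by 0.0 keeps the
    # result the same type as the inputs instead of replacing it with a constant.
    return total * 0.0


if __name__ == "__main__":
    print(normalize_ssd_loss(6.0, 2.0, 4))
    print(normalize_ssd_loss(6.0, 2.0, 0))
```

This sketch assumes the summed losses are finite; if they contain inf or NaN, scaling by 0.0 would yield NaN rather than zero.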

OK, I had the same problem. I didn't think of setting it to zero; I had set N to 1, but that clearly seems to be the mistake. I think the SSD paper makes this clear.

vaishnavm217 avatar Jul 07 '18 06:07 vaishnavm217

Moreover, if a zero loss is assigned, it cannot be backpropagated.

vaishnavm217 avatar Jul 07 '18 09:07 vaishnavm217
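
The backpropagation concern in the comment above depends on how the zero is produced. The sketch below, which uses a hypothetical parameter `w` in place of a real model, suggests that a constant zero tensor cannot be backpropagated, while a zero computed from the graph can and simply produces zero gradients.

```python
import torch

# Hypothetical stand-in for a model parameter.
w = torch.ones(3, requires_grad=True)

# Case 1: a constant zero has no connection to w, so backward() raises.
const_zero = torch.tensor(0.0)
try:
    const_zero.backward()
    constant_ok = True
except RuntimeError:
    constant_ok = False

# Case 2: a zero derived from the graph backpropagates without error
# and leaves w with all-zero gradients.
graph_zero = (w * 2.0).sum() * 0.0
graph_zero.backward()

if __name__ == "__main__":
    print("constant zero backpropagates:", constant_ok)
    print("gradient from graph zero:", w.grad)
```

Under these assumptions, a batch without positive samples can return a graph-attached zero so that the training loop calls `backward()` unconditionally, or the loop can skip the optimizer step for that batch.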

@vaishnavm217, I agree with you there.

ahkarami avatar Jul 08 '18 05:07 ahkarami

@ahkarami @vaishnavm217 what solution works for this?

getsanjeev avatar Jul 18 '19 07:07 getsanjeev