torchcv
Problem on Loss Function [division by zero]
Dear @kuangliu,
In some cases during training, num_pos is equal to 0 (in the ssd_loss.py script), and an error then occurs.
To address the issue, I added the code below:
...
if num_pos > 0:
    print('loc_loss: %.3f | cls_loss: %.3f' % (loc_loss.item()/num_pos, cls_loss.item()/num_pos), end=' | ')
    loss = (loc_loss+cls_loss)/num_pos
else:
    print('Number of Positive Samples is 0.')
    loss = (loc_loss+cls_loss)  # ??? I don't know whether this is correct
return loss
My question is: in this situation, should the loss be equal to zero (as the SSD paper mentions) or a large number?
Related links for the above issue:
1- https://github.com/kuangliu/torchcv/issues/16
2- GluonCV-Advanced Notes about SSD Training
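For reference, the overall objective in the SSD paper can be written as below, where N is the number of matched default boxes. Reading num_pos in the snippet above as N is an assumption based on the variable name; the snippet also has no explicit weight, which would correspond to α = 1.

$$
L(x, c, l, g) =
\begin{cases}
\dfrac{1}{N}\left(L_{conf}(x, c) + \alpha\, L_{loc}(x, l, g)\right), & N > 0 \\
0, & N = 0
\end{cases}
$$

Under this definition, a batch with no positive samples contributes a loss of zero rather than a large value.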
OK, I had the same problem. I hadn't thought of setting the loss to zero; I had set N to 1 instead, but that clearly seems to be the mistake. I think the SSD paper makes this clear.
Moreover, when the loss is assigned zero, it cannot be backpropagated.
@vaishnavm217, I agree with you there.
@ahkarami @vaishnavm217 what solution works for this?
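One possible way to handle this, sketched under assumptions rather than taken from torchcv: return a zero loss when there are no positive samples, but compute it as `total * 0` instead of a literal `0`. The helper name `normalize_ssd_loss` is hypothetical. It only uses `+`, `*` and `/`, so it runs on plain floats (as in the demo) and on PyTorch tensors alike. With tensors, the multiplied-by-zero result stays attached to the autograd graph, so `.backward()` still runs and produces zero gradients.

```python
def normalize_ssd_loss(loc_loss, cls_loss, num_pos):
    """Average the summed SSD losses over the number of positive samples.

    Hypothetical helper, not part of torchcv. Accepts plain floats or
    PyTorch tensors for loc_loss and cls_loss.
    """
    total = loc_loss + cls_loss
    if num_pos > 0:
        return total / num_pos
    # No positive samples: return a zero loss, as the SSD paper specifies.
    # Multiplying by 0 (rather than returning a literal 0) keeps the type
    # of `total`; for a tensor it also keeps the autograd connection.
    return total * 0


if __name__ == "__main__":
    print(normalize_ssd_loss(6.0, 2.0, 4))  # batch with 4 positive samples
    print(normalize_ssd_loss(6.0, 2.0, 0))  # batch with no positive samples
```

One caveat: if the summed loss is already inf or NaN, multiplying by 0 gives NaN, so this does not guard against a loss that has diverged. Skipping the optimizer step for such a batch is another option.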