SSD training loss
Hi Jasper,
On the Faster R-CNN model, the training loss starts at 4. On the same dataset, why does it start at 7 on the SSD model?
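The two architectures calculate the loss differently, so you can't compare the values between them. In the original papers both Faster R-CNN and SSD use a smooth L1 loss on the box offsets; what differs is which boxes contribute and how the sum is normalized. Faster R-CNN adds up the losses of the region proposal network and the detection head, while SSD sums over the default boxes on several feature maps and divides by the number of matched boxes. The total loss for both is a weighted combination of classification loss and localization loss; in SSD the weight on the localization term is a hyperparameter (alpha) that you can tune.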
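To make the "weighted combination" concrete, here is a minimal sketch in plain Python. It is not taken from either implementation: the function names, the `normalizer` argument, and the toy numbers are all made up for illustration, and it assumes smooth L1 as the box term. It only shows that the same predictions give a different scalar loss once the weighting changes.

```python
def smooth_l1(x, beta=1.0):
    """Smooth L1 penalty for one box-offset residual x.

    Quadratic for |x| < beta, linear otherwise.
    """
    ax = abs(x)
    if ax < beta:
        return 0.5 * ax * ax / beta
    return ax - 0.5 * beta


def total_loss(cls_loss, box_residuals, alpha=1.0, normalizer=1.0):
    """Hypothetical total loss: (classification + alpha * localization) / normalizer."""
    loc_loss = sum(smooth_l1(r) for r in box_residuals)
    return (cls_loss + alpha * loc_loss) / normalizer


# Toy values (assumptions, not measurements from any model).
residuals = [0.5, -2.0, 0.1, 1.5]
cls = 3.0

loss_a = total_loss(cls, residuals, alpha=1.0)
loss_b = total_loss(cls, residuals, alpha=2.0)
print(loss_a, loss_b)
```

The two printed values differ even though the "model output" is identical, which is why a starting loss of 4 on one architecture and 7 on another says little by itself.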
In my experience, SSD typically has a higher loss value than Faster R-CNN on the same dataset. I am not sure a lower loss translates directly into better performance. Your difference in losses is small; I have encountered much bigger differences, especially in the final values.