Standard ResNet gives bad results.
Hi authors, I have tried your network structure with the sphere loss and it gives results similar to those reported in your paper. However, when I switched to a standard ResNet-50 or ResNet-101, I only got an LFW score of about 97.xx%. Did you ever try any other network structures?
I thought the problem was caused by a small norm(fx), where fx is the 512-dim embedding. Since W is normalized to unit norm in SphereFace, fx*W becomes very small when norm(fx) is small, which shrinks the total loss value. Based on this, I tried two approaches:
- Normalize fx to a fixed larger norm, e.g. 32 (see the sketch below);
- Use PReLU everywhere, set the weight decay of the bias variable in the last FC layer to 0, and also remove the last BN layer.
Neither of them worked. Do you have any ideas or hints?
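To be concrete, here is roughly what I mean by the first approach, as a minimal PyTorch-style sketch (your repo is Caffe, so the names `scale_features`, `backbone`, and `W_hat` are illustrative, not from your code):

```python
import torch
import torch.nn.functional as F

def scale_features(fx: torch.Tensor, s: float = 32.0) -> torch.Tensor:
    """L2-normalize the 512-dim embeddings, then rescale them to a fixed norm s.

    After this, norm(fx) == s for every sample, so the logits fx @ W
    (with the columns of W unit-normalized, as in SphereFace) can no
    longer shrink just because the raw feature norm is small.
    """
    return s * F.normalize(fx, p=2, dim=1)

# Usage (illustrative names):
# fx = backbone(images)                # (batch, 512) embeddings
# logits = scale_features(fx) @ W_hat  # W_hat: (512, num_classes), unit-norm columns
```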
Maybe you can replace ReLU with PReLU; it will very likely help.
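A rough sketch of what I mean, assuming a PyTorch/torchvision model (`relu_to_prelu` is a hypothetical helper, not part of any library):

```python
import torch.nn as nn

def relu_to_prelu(module: nn.Module) -> nn.Module:
    """Recursively replace every nn.ReLU submodule with nn.PReLU.

    Note: torchvision's ResNet blocks reuse one self.relu several times
    in forward(), so the swapped-in PReLU shares its learnable slope
    within a block; that is usually fine in practice.
    """
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, nn.PReLU())
        else:
            relu_to_prelu(child)
    return module

# Usage:
# from torchvision.models import resnet50
# model = relu_to_prelu(resnet50(num_classes=512))
```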