opensphere
How to train SphereFace2 on >1M identities, e.g. WebFace42M?
Hi @wy1iu @ydwen, thank you for open-sourcing this repo. Section 2.4 of the SphereFace2 paper says:
"the gradient computations in SphereFace2 are class-independent and can be performed locally within one GPU. Thus no communication cost is needed."
But when you distribute classifiers $W_i$ to different GPUs, what if a GPU only gets a batch of negative features? The lines after one_hot cannot be executed.
Have you tried training on WebFace42M? How does SphereFace2 perform on it?
Happy to discuss the implementation details and contribute to this repo :)
It is not a problem for SF2 if a GPU gets a batch of only negative features. The final loss is averaged across all GPUs.
"The lines after one_hot cannot be executed." In this case, we will construct a zero matrix as labels.
I haven't tried WebFace42M yet, since it requires too many computational resources. I am looking forward to seeing how SF2 works on this dataset.
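To make the zero-label case from the reply concrete, here is a minimal NumPy sketch of how one classifier shard might build its local labels. It is not the opensphere implementation: `shard_one_hot`, `shard_binary_loss`, `shard_start`, `shard_size` and the value `lam=0.7` are hypothetical names and settings chosen for illustration, and the loss is a simplified weighted binary logistic loss without the margin, scale, bias, or similarity adjustment used in SphereFace2.

```python
import numpy as np

def shard_one_hot(labels, shard_start, shard_size):
    """Build the local {0, 1} target matrix for one classifier shard.

    labels holds global class indices. A sample whose label falls outside
    [shard_start, shard_start + shard_size) keeps an all-zero row, so a
    shard that sees only negatives simply gets a zero matrix.
    """
    labels = np.asarray(labels)
    one_hot = np.zeros((labels.shape[0], shard_size), dtype=np.float64)
    local = labels - shard_start                    # index inside this shard
    in_shard = (local >= 0) & (local < shard_size)  # positives owned by this shard
    one_hot[np.nonzero(in_shard)[0], local[in_shard]] = 1.0
    return one_hot

def shard_binary_loss(logits, one_hot, lam=0.7):
    """Simplified per-shard binary loss (assumption: no margin or scale)."""
    signs = 2.0 * one_hot - 1.0                     # +1 for positives, -1 for negatives
    weights = lam * one_hot + (1.0 - lam) * (1.0 - one_hot)
    per_entry = weights * np.logaddexp(0.0, -signs * logits)  # softplus(-sign * logit)
    return per_entry.sum(axis=1).mean()

# Two samples with global labels 5 and 7; 8 classes split over 2 shards of 4.
labels = [5, 7]
rng = np.random.default_rng(0)
shard_losses = []
for shard_start in (0, 4):
    targets = shard_one_hot(labels, shard_start, 4)
    logits = rng.standard_normal((len(labels), 4))  # stand-in for local cosine logits
    shard_losses.append(shard_binary_loss(logits, targets))

# Shard 0 owns classes 0..3 and therefore sees only negatives, yet its loss
# is still well defined. The shard losses are then averaged, as in the reply.
print(shard_losses, np.mean(shard_losses))
```

Because every class contributes its own binary term, the shard with an all-zero label matrix still produces a valid loss and gradients for its classifiers; only the scalar loss needs to be combined across shards.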