
How to train SphereFace2 on >1M identities, e.g. WebFace42M?

Open yiminglin-ai opened this issue 2 years ago • 1 comments

Hi @wy1iu @ydwen, thank you for open-sourcing this repo. Section 2.4 of the SphereFace2 paper says:

the gradient computations in SphereFace2 are class-independent and can be performed locally within one GPU. Thus no communication cost is needed.

But when you distribute classifiers $W_i$ to different GPUs, what if a GPU only gets a batch of negative features? The lines after one_hot cannot be executed.

Have you tried training on WebFace42M? How does SphereFace2 perform on it?

Happy to discuss the implementation details and contribute to this repo :)

yiminglin-ai, Jan 31 '23 11:01

It is not a problem for SF2 if a GPU gets a batch of only negative features. The final loss is averaged across all GPUs.

"The lines after one_hot cannot be executed." In this case, we will construct a zero matrix as labels.
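To make the zero-matrix case concrete, here is a minimal sketch in plain Python, not the repo's actual code. It assumes each GPU owns a contiguous range of class indices, and it uses a plain binary logistic loss as a stand-in for the full SphereFace2 loss (no margin, scale, or bias terms). The names `local_one_hot`, `local_binary_loss`, `shard_start`, and `shard_size` are hypothetical.

```python
import math

def local_one_hot(labels, shard_start, shard_size):
    """Build a (batch x shard_size) 0/1 label matrix for one shard's classes.

    Rows whose label belongs to another shard stay all zero, so a batch with
    no positives for this shard simply yields a zero matrix.
    """
    one_hot = [[0.0] * shard_size for _ in labels]
    for row, y in enumerate(labels):
        local = y - shard_start  # class index relative to this shard
        if 0 <= local < shard_size:
            one_hot[row][local] = 1.0
    return one_hot

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def local_binary_loss(logits, one_hot):
    """Simplified stand-in loss: one binary classifier per local class."""
    total = 0.0
    for z_row, t_row in zip(logits, one_hot):
        for z, t in zip(z_row, t_row):
            # Positive pair: push the logit up. Negative pair: push it down.
            total += softplus(-z) if t == 1.0 else softplus(z)
    return total / len(logits)

# Two samples with labels 0 and 1; shard 1 owns classes 2 and 3,
# so it sees only negatives and its label matrix is all zeros.
labels = [0, 1]
zero_logits = [[0.0, 0.0], [0.0, 0.0]]
loss_shard0 = local_binary_loss(zero_logits, local_one_hot(labels, 0, 2))
loss_shard1 = local_binary_loss(zero_logits, local_one_hot(labels, 2, 2))
final_loss = (loss_shard0 + loss_shard1) / 2  # averaged across shards
```

Because every class has its own binary classifier, the shard with only negatives still produces a well-defined loss and gradients for its local weights.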

I haven't tried WebFace42M yet, since it requires too many computational resources. I am looking forward to seeing how SF2 works on this dataset.

ydwen, Mar 09 '23 16:03