AdaFace

Are WebFace4M and WebFace12M the same as WebFace260M or WebFace42M, or are they different datasets? Is it worth trying WebFace42M?

martinenkoEduard opened this issue 1 year ago · 2 comments

Are WebFace4M and WebFace12M the same as WebFace260M or WebFace42M, or are they different datasets?

martinenkoEduard · Apr 28 '23 14:04

Yes, they come from the same collection. According to their paper, the authors gathered 260M images (~50 TB); however, the full WebFace260M dataset is not clean, the images are raw web crawls and noisy. They therefore created cleaned subsets: WebFace42M, which is the largest clean dataset for face recognition training as far as I know, plus WebFace12M and WebFace4M. These subsets are cleaned and ready to train on. The large noisy WebFace260M is an interesting research problem on its own.

Models trained on the different subsets give different performance; the best results are usually associated with more data, i.e. WebFace42M.
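To get a feel for what "more data" means in training cost, here is a minimal back-of-envelope sketch. The image counts are approximate figures reported for the WebFace subsets, and the batch size is an arbitrary example value, not a recommendation from this repo:

```python
# Rough comparison of per-epoch training cost across the WebFace subsets.
# Image counts are approximate; batch size is an example value.

WEBFACE_SUBSETS = {
    "WebFace4M": 4_200_000,    # ~4.2M images
    "WebFace12M": 12_000_000,  # ~12M images
    "WebFace42M": 42_000_000,  # ~42M images
}

BATCH_SIZE = 512  # hypothetical example, pick to fit your GPUs

for name, num_images in WEBFACE_SUBSETS.items():
    iters_per_epoch = num_images // BATCH_SIZE
    print(f"{name}: ~{num_images / 1e6:.0f}M images, "
          f"~{iters_per_epoch:,} iterations per epoch at batch size {BATCH_SIZE}")
```

So each epoch on WebFace42M costs roughly 10x the iterations of WebFace4M at the same batch size, which is the main trade-off against the accuracy gains from more data.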

xxiMiaxx · May 01 '23 11:05

So is it worth retraining AdaFace on WebFace42M, since it is bigger than WebFace12M?

martinenkoEduard · May 03 '23 05:05