
Training to maximise Face Clustering performance

Open atoaster opened this issue 3 years ago • 15 comments

Hi

I'm looking to do some unsupervised face clustering with the face vectors produced by insightface. I have trained on a custom dataset (ms1m + 10,000 of my own images) and have tried clustering with both HDBSCAN and DBSCAN (using cosine distance as the metric), but I seem to get much lower performance than dlib + Chinese whispers clustering.

I have trained insightface with embedding_size = 256 to reduce dimensionality (HDBSCAN performs poorly in high dimensions), but am still getting poor results (either 0 clusters, or hundreds of clusters when there should be around 10).

Has anyone tried training insightface specifically for face clustering? If so, are there any steps I can take to improve the performance?
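For concreteness, the sort of thing I'm running looks roughly like this (a sketch: the synthetic embeddings stand in for real insightface vectors, and eps/min_samples are illustrative values, not tuned ones):

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# Stand-in for real insightface embeddings: two tight synthetic "identities".
embeddings = np.concatenate([
    rng.normal(0.0, 0.05, size=(50, 256)) + 1.0,
    rng.normal(0.0, 0.05, size=(50, 256)) - 1.0,
])

# L2-normalize so cosine distance behaves consistently.
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

# eps is in cosine-distance units (1 - cosine similarity); tune per dataset.
labels = DBSCAN(eps=0.4, min_samples=5, metric="cosine").fit_predict(normed)
print(sorted(set(labels)))  # -1, if present, marks noise points
```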

atoaster avatar Jan 19 '22 04:01 atoaster

Have you tried Chinese whispers with the original insightface models? I tried it some time ago and I recall it working pretty well.

SthPhoenix avatar Jan 19 '22 16:01 SthPhoenix

Hi @SthPhoenix!

I will try it now and report back. My issue with Chinese whispers clustering is that the Python implementation is fairly slow, but I'll try it nonetheless.
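For reference, the pure-numpy Chinese whispers I mean looks roughly like this (a naive O(N²) sketch, not dlib's optimized C++ version; the threshold and iteration count are illustrative):

```python
import numpy as np

def chinese_whispers(embeddings, threshold=0.5, iterations=20, seed=0):
    """Naive Chinese whispers over a cosine-distance graph (O(N^2) memory)."""
    rng = np.random.default_rng(seed)
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    adjacency = (1.0 - normed @ normed.T) < threshold  # pairwise cosine distance
    np.fill_diagonal(adjacency, False)
    labels = np.arange(len(embeddings))  # every face starts in its own cluster
    order = np.arange(len(embeddings))
    for _ in range(iterations):
        rng.shuffle(order)
        for i in order:
            neighbour_labels = labels[adjacency[i]]
            if neighbour_labels.size:
                # adopt the most frequent label among graph neighbours
                labels[i] = np.bincount(neighbour_labels).argmax()
    return labels
```

The inner Python loop is what makes this slow on tens of thousands of faces; dlib does the same propagation in C++.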

atoaster avatar Jan 19 '22 22:01 atoaster

Well, I can't seem to get any reasonable clustering done on my dataset using insightface and Chinese whispers. It constantly gives everything its own cluster.

E.g. 26,161 images were put into 26,159 clusters (using dlib's Chinese whispers clustering).

Perhaps my dataset is just too difficult to cluster.

atoaster avatar Jan 20 '22 00:01 atoaster

Are you using normalized embeddings for clustering?

SthPhoenix avatar Jan 20 '22 03:01 SthPhoenix

Yes, I used sklearn for normalization (is there a better way?)

Here is the dataset if it helps (already aligned/resized to 112) - it includes masked people, but I've had some luck with other recognition methods. Was just hoping for an improvement.

atoaster avatar Jan 20 '22 04:01 atoaster

Link is broken

SthPhoenix avatar Jan 20 '22 05:01 SthPhoenix

For some reason it's working now ) I have tried your dataset with dlib Chinese whispers at threshold 0.85; it gives around 2,500 clusters with a lot of outliers (clusters containing 1-5 faces). If you filter your dataset by embedding norm > 20 and remove all clusters below 5 faces, it should be around 65 clusters. A slightly modified Python implementation from the FaceNet repo gives around 350 clusters without any filtering, and around 30 after, but takes about 40-50 minutes to run.
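The filtering I mean is roughly this (a sketch; 20 and 5 are the thresholds I used above, and the norms must be taken on the raw, un-normalized embeddings):

```python
import numpy as np
from collections import Counter

def filter_clusters(embeddings, labels, min_norm=20.0, min_cluster_size=5):
    """Drop low-norm (low-quality) faces, then drop tiny clusters (-1 = noise)."""
    norms = np.linalg.norm(embeddings, axis=1)
    keep = norms > min_norm                 # low norm ~ low-quality crop
    labels = np.where(keep, labels, -1)     # mark dropped faces as noise
    counts = Counter(labels[labels != -1])
    small = {c for c, n in counts.items() if n < min_cluster_size}
    return np.array([-1 if c in small else c for c in labels])
```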

I have tested both glintr100 and w600k_r50 embeddings; they perform almost identically, which is great since w600k_r50 is an almost twice as fast recognition model.

BTW, while looking through the images in the largest cluster, I noticed this one: [image] And I just can't stop thinking about it, what the hell happened there? )))

SthPhoenix avatar Jan 21 '22 16:01 SthPhoenix

@SthPhoenix

Link is broken

For some reason Dropbox sent me an email saying that they would take the link down. Good to hear it is now working!

Thanks so much for taking the time to look into this!

Ahahahaha, I'm unsure where the images are from; I just collected them by running RetinaFace over hours of CCTV footage.

What do you mean by "If you filter your dataset by embedding norm > 20"?

The clusters produced by FaceNet appear kind of noisy; I was trying to improve on those. Here is a UMAP projection of the FaceNet clusters from this dataset:

image

As you can see, the orange in the top-right is separated into 2-3 clusters, even though it is the same person. I will try projecting the Chinese whispers clusters as well and see what I get.

atoaster avatar Jan 23 '22 22:01 atoaster

What do you mean by "If you filter your dataset by embedding norm > 20"?

You can normalize an embedding the following way:

import numpy as np

embedding_norm = np.linalg.norm(embedding)
normed_embedding = embedding / embedding_norm

The embedding norm can also be used as an additional face-quality metric, since it's lower for crops with fewer meaningful features.

SthPhoenix avatar Jan 24 '22 04:01 SthPhoenix

@SthPhoenix Sorry for abandoning this post for so long!

I'm a little confused about the embedding norm, are you saying to filter with embedding_norm > 20, and then use normed_embedding for clustering? The issue with that is that I need to build a classifier on the clustered data, so I would need to store the embedding_norm and apply it to any input features (which might not be a real issue)

atoaster avatar May 13 '22 01:05 atoaster

@SthPhoenix Sorry for abandoning this post for so long!

I'm a little confused about the embedding norm, are you saying to filter with embedding_norm > 20, and then use normed_embedding for clustering? The issue with that is that I need to build a classifier on the clustered data, so I would need to store the embedding_norm and apply it to any input features (which might not be a real issue)

You can just store normed_embedding and use it for classifying instead of original

SthPhoenix avatar May 13 '22 03:05 SthPhoenix

@SthPhoenix Sorry for abandoning this post for so long! I'm a little confused about the embedding norm, are you saying to filter with embedding_norm > 20, and then use normed_embedding for clustering? The issue with that is that I need to build a classifier on the clustered data, so I would need to store the embedding_norm and apply it to any input features (which might not be a real issue)

You can just store normed_embedding and use it for classifying instead of original

But then how would you classify newly incoming points? Would you also have to apply np.linalg.norm to each new point before passing it into the trained classifier?

atoaster avatar May 13 '22 05:05 atoaster

But then how would you classify newly incoming points? Would you also have to use np.linalg.norm on new point before passing into a trained classifier?

Exactly! Though it's not as scary as you may think; it adds some extra time of course, but it's negligibly small.
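Something like this (a sketch; clf stands for whatever classifier you trained on the normalized embeddings):

```python
import numpy as np

def prepare_query(embedding):
    """Apply the same L2 normalization used on the training embeddings."""
    return embedding / np.linalg.norm(embedding)

# e.g. with any sklearn-style classifier trained on normed embeddings:
# predicted = clf.predict(prepare_query(new_embedding).reshape(1, -1))
```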

SthPhoenix avatar May 13 '22 14:05 SthPhoenix

I think I'm having some seemingly random issues with clustering insightface features. As you can see in the below image, sometimes the clustering is very good:

image

But then other times the clustering is very wrong:

image

My initial hypothesis is that there is an outlier in the feature vector space (*.npz here if you're interested) that is causing the normalization to skew all of the features, but I'm not sure if that's entirely a possibility (is that even how normalization works?)

EDIT: Nevermind, the issue persists even when I don't normalize...

atoaster avatar May 15 '22 23:05 atoaster

Normalization should not influence clustering. Have you tried manually inspecting those clusters? Possibly in such cases the input data is of lower quality.

SthPhoenix avatar May 16 '22 04:05 SthPhoenix

I think I'm having some seemingly random issues with clustering insightface features. As you can see in the below image, sometimes the clustering is very good:

image

But then other times the clustering is very wrong:

image

My initial hypothesis is that there is an outlier in the feature vector space (*.npz here if you're interested) that is causing the normalization to skew all of the features, but I'm not sure if that's entirely a possibility (is that even how normalization works?)

EDIT: Nevermind, the issue persists even when I don't normalize...

Hi @atoaster, have you fixed this problem? And the above-mentioned "The clusters produced by FaceNet": does it refer to the Chinese whispers algo? Thank you~

Jar7 avatar Oct 11 '22 09:10 Jar7

Hi @Jar7

Basically, I have resolved the problem by using regular L2 normalization and HDBSCAN for clustering. I have also dropped my custom 256-D model and instead just use the default glint360K 512-D model provided by insightface. It will occasionally still have issues, but all in all it is definitely far more accurate than what I had been trying above!

atoaster avatar Oct 20 '22 22:10 atoaster