tensorflow-triplet-loss

Single Class Multiple Clusters

Open TilakD opened this issue 6 years ago • 4 comments

Hi @omoindrot, I am using your code as a base on a custom dataset, and when I visualize the embeddings with t-SNE I get multiple clusters for the same class. My embeddings have 128 dimensions. Am I doing something wrong, or is there actually a single cluster per class in 128-D that gets split into 3 different clusters when the dimension is reduced?

image

TilakD, Feb 18 '19 08:02

Hi @TilakD

It's pretty weird indeed. Maybe this is because of your data? For instance, maybe you have data coming from three different sources (e.g. grayscale images, RGB images and another type), so the embeddings are naturally clustered by type before class.

It's also possible that the t-SNE projection is not faithfully representing the structure of the data. See this article for more on t-SNE: https://distill.pub/2016/misread-tsne/

I would plot the different images in each cluster for a single class to understand what differentiates them.
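One way to set up that inspection is to split the embeddings of a single class into sub-groups and list which image indices fall in each, so the corresponding images can be plotted side by side. The sketch below is only an illustration: the small k-means routine, the function name `images_by_subcluster`, the array names `embeddings` and `labels`, and the toy data are all assumptions, not part of the repository.

```python
import numpy as np

def kmeans(points, k, n_iter=20):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(k - 1):
        # Distance from every point to its closest already-chosen center.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.stack(centers)
    for _ in range(n_iter):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=-1)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = points[assign == j].mean(axis=0)
    return assign

def images_by_subcluster(embeddings, labels, target_class, k):
    """Map each sub-cluster id to the dataset indices of its images."""
    idx = np.flatnonzero(labels == target_class)
    assign = kmeans(embeddings[idx], k)
    return {j: idx[assign == j].tolist() for j in range(k)}

# Toy data (hypothetical): class 0 is made of 3 tight sub-groups, class 1 of one.
rng = np.random.default_rng(0)
sub_centers = rng.normal(size=(3, 128))
class0 = np.concatenate([c + 0.05 * rng.normal(size=(10, 128)) for c in sub_centers])
class1 = 5.0 + 0.05 * rng.normal(size=(30, 128))
embeddings = np.concatenate([class0, class1])
labels = np.array([0] * 30 + [1] * 30)

groups = images_by_subcluster(embeddings, labels, target_class=0, k=3)
for j, members in groups.items():
    print(f"sub-cluster {j}: image indices {members}")
```

The returned indices can then be used to load and display the images of each sub-cluster. Clustering in the original 128-D space rather than on the t-SNE output avoids depending on the projection.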

omoindrot, Feb 27 '19 22:02

Hi @omoindrot Thanks for the reply.

All the data come from the same source (RGB images). There are 7 classes, and each class contains images of 3 different subjects. The 3 clusters for each class correspond to the 3 subjects.

When I check the intra-cluster distance in 128 dimensions, I get a very low value for each class. When I do the same in 2D/3D after t-SNE, the intra-cluster distance is huge. I'm confused as to why t-SNE is considering features of the subjects along with features of the classes.
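A low intra-class distance is easier to interpret next to the distance between classes, since t-SNE does not preserve absolute distances and the raw numbers before and after projection are not directly comparable. A minimal sketch of such a check, assuming `embeddings` is an `(N, 128)` NumPy array and `labels` an `(N,)` integer array (the function names and the toy data are illustrative, not taken from the repository):

```python
import numpy as np

def pairwise_distances(a, b):
    """Euclidean distance between every row of a and every row of b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def intra_inter_distances(embeddings, labels):
    """Return (mean intra-class distance, mean inter-class distance)."""
    intra, inter = [], []
    for c in np.unique(labels):
        same = embeddings[labels == c]
        other = embeddings[labels != c]
        d_same = pairwise_distances(same, same)
        # Upper triangle only: skip zero self-distances and duplicate pairs.
        intra.append(d_same[np.triu_indices(len(same), k=1)])
        inter.append(pairwise_distances(same, other).ravel())
    return float(np.concatenate(intra).mean()), float(np.concatenate(inter).mean())

# Toy data standing in for real embeddings: 7 classes, 20 points each.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(7), 20)
centers = rng.normal(size=(7, 128))
embeddings = centers[labels] + 0.05 * rng.normal(size=(labels.size, 128))

intra, inter = intra_inter_distances(embeddings, labels)
print(f"intra={intra:.3f}  inter={inter:.3f}  ratio={intra / inter:.3f}")
```

The same function can be run on the 2D/3D t-SNE coordinates. Comparing the intra/inter ratio in both spaces is likely more informative than comparing the raw distances.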

Please let me know your thoughts.

TilakD, Mar 04 '19 10:03

I'm not sure what your exact data is, but consider this (related?) example: you have 3 people, and you ask them to take 7 different poses (standing up, sitting...).

Now you train embeddings with triplet loss according to the 7 poses.

Of course the embeddings will also reflect the 3 different people you use, because by default their embeddings will be different. So even if you train perfectly with triplet loss, each cluster will likely contain 3 different sub-clusters.

Even in face recognition, the cluster of a person can contain clusters (one where the person wears glasses, one where the person is older...).
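The pose/people example can be simulated with synthetic vectors. The sketch below assumes each embedding is a pose center plus a smaller per-person offset plus a little noise; the scales (1.0, 0.2, 0.02) and all variable names are made up for illustration and do not come from a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_poses, n_people, n_per_pair, dim = 7, 3, 10, 128

# Assumed structure: large pose separation, smaller person offset, tiny noise.
pose_centers = rng.normal(scale=1.0, size=(n_poses, dim))
person_offsets = rng.normal(scale=0.2, size=(n_people, dim))

poses = np.repeat(np.arange(n_poses), n_people * n_per_pair)
people = np.tile(np.repeat(np.arange(n_people), n_per_pair), n_poses)
noise = rng.normal(scale=0.02, size=(poses.size, dim))
embeddings = pose_centers[poses] + person_offsets[people] + noise

# Pairwise Euclidean distances via the Gram matrix.
sq = (embeddings ** 2).sum(axis=1)
dist = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2 * embeddings @ embeddings.T, 0))

same_pose = poses[:, None] == poses[None, :]
same_person = people[:, None] == people[None, :]
off_diag = ~np.eye(poses.size, dtype=bool)

d_same_pose_same_person = float(dist[same_pose & same_person & off_diag].mean())
d_same_pose_other_person = float(dist[same_pose & ~same_person].mean())
d_other_pose = float(dist[~same_pose].mean())

print("same pose, same person :", round(d_same_pose_same_person, 3))
print("same pose, other person:", round(d_same_pose_other_person, 3))
print("different pose         :", round(d_other_pose, 3))
```

Under these assumptions the three distances are ordered from smallest to largest: the poses are well separated, as triplet loss asks for, yet each pose cluster still contains one tight sub-cluster per person.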

omoindrot, Mar 04 '19 14:03

Thanks a lot @omoindrot. Got my doubts clarified!

TilakD, Mar 05 '19 10:03