[QUESTION] Chapter 9 Label propagation
I don't get these lines of code:
k = 50
y_representative_digits = np.array([4, 8, 0, 6, 8, 3, ..., 7, 6, 2, 3, 1, 1])
y_train_propagated = np.empty(len(X_train), dtype=np.int32)
for i in range(k):
    y_train_propagated[kmeans.labels_ == i] = y_representative_digits[i]
Suppose, for example, that i is 2 in the for loop. Then the entries of y_train_propagated selected by kmeans.labels_==i (that is, where kmeans.labels_==2 is true) will be set to 0, because y_representative_digits[2] is 0, right?
So the entries of y_train_propagated where kmeans.labels_ equals 2 will be set to 0. The label is 2, but it is set to 0. Wouldn't that be wrong?
Hi @FatihMercan61,
Thanks for your question! There are two types of labels here: class labels, and cluster labels. An image's class label corresponds to the digit that this image represents: it's a number from 0 to 9. An image's cluster label is the ID of the cluster that the image belongs to (in this case, a number from 0 to 49, since there are 50 clusters). Since there are multiple ways of writing any digit, there will be multiple clusters for each digit.
After grouping the images into 50 clusters, and finding the most representative image of each cluster, we manually look at each of these 50 most representative images and we write down their class labels. This gives us the array y_representative_digits, which contains 50 class labels. For example, the first cluster (at index 0) corresponds to a digit 4, the second (at index 1) corresponds to an 8, etc.
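To make the "most representative image" step a bit more concrete, here is a minimal NumPy-only sketch. It assumes that the representative of a cluster is the instance closest to that cluster's centroid, which is one reasonable way to pick it. The toy points, the centroids, and the names X_toy, centroids, dist and representative_idx are all made up for illustration.

```python
import numpy as np

# Toy data (made up): 6 points in 2D and 2 cluster centroids.
X_toy = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0],
                  [5.0, 5.0], [5.1, 4.9], [6.0, 6.0]])
centroids = np.array([[0.4, 0.4], [5.3, 5.3]])

# dist[n, c] = Euclidean distance from point n to centroid c.
diff = X_toy[:, np.newaxis, :] - centroids[np.newaxis, :, :]
dist = np.linalg.norm(diff, axis=2)

# For each cluster (each column), the index of the closest point.
representative_idx = np.argmin(dist, axis=0)
print(representative_idx)  # [1 3]
```

Here point 1 would be the representative of cluster 0 and point 3 the representative of cluster 1. These are the instances you would look at and label by hand.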
Then we want to propagate these class labels to every image in their corresponding clusters. So y_train_propagated will be an array of class labels, with one class label per image in the training set.
So when we iterate over range(k), we are iterating over cluster IDs, not classes.
Therefore kmeans.labels_==i finds all images in the ith cluster. For all the images in this cluster, we want to use the same class label as the representative image of that cluster: y_representative_digits[i].
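Here is a tiny sketch of that propagation step on toy data, in case it helps. The arrays are made up: cluster_labels stands in for kmeans.labels_, and y_representative stands in for y_representative_digits, with only 3 clusters instead of 50.

```python
import numpy as np

# Hypothetical toy setup: 8 images grouped into 3 clusters.
cluster_labels = np.array([2, 0, 1, 2, 0, 1, 2, 0])  # stands in for kmeans.labels_
# Class label of each cluster's representative image (written down by hand):
# cluster 0 -> digit 4, cluster 1 -> digit 8, cluster 2 -> digit 0.
y_representative = np.array([4, 8, 0])

y_propagated = np.empty(len(cluster_labels), dtype=np.int32)
for i in range(len(y_representative)):
    # cluster_labels == i is a boolean mask: True for every image in cluster i.
    y_propagated[cluster_labels == i] = y_representative[i]

print(y_propagated)  # [0 4 8 0 4 8 0 4]

# Equivalent without the loop: index the class labels by cluster ID.
y_vectorized = y_representative[cluster_labels]
assert np.array_equal(y_propagated, y_vectorized)
```

Every image in cluster 2 ends up with class label 0, because the representative image of cluster 2 was a 0. The cluster ID (2) and the class label (0) are unrelated numbers.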
Hope this helps!
Hi ageron,
Thank you very much! Now I get it
awesome explanation, thank you a lot!