SimiFeat
Score function implementation
Hi. Thanks for providing the code to this interesting method you proposed.
I have a question regarding the implementation of the Score function.
In the paper, the score function is defined as the cosine similarity between the soft labels and the one-hot encodings of label j.
However, in the code the following implementation is used for the `get_score` function:
https://github.com/UCSC-REAL/SimiFeat/blob/aa37fb5a06aa346daf64e9b180499638a6a86318/hoc.py#L465
Is there something that I am missing? Could you please provide more information about the reasoning behind this implementation? Thank you very much for your time and help.
Looking forward to hearing from you. Best regards.
Later edit: I have another question regarding the example provided on page 4. In this example we have the following information:
- three instances $n_1, n_2, n_3$ with clean labels 1, i.e. $y_{n_1} = y_{n_2} = y_{n_3} = 1$
- the noisy labels of these instances are $\tilde y_{n_1} = \tilde y_{n_3} = 1$ and $\tilde y_{n_2} = 2$. From this I understand that we have two classes in our toy dataset, i.e. a class with label 1 and a class with label 2. Moreover, since we have the clean labels, we can say that instance $n_2$ is corrupted because $\tilde y_{n_2} \neq y_{n_2}$.
- the soft labels $\hat y_{n_1} = \hat y_{n_2} = [0.6, 0.4, 0.0]^T$ and $\hat y_{n_3} = [0.34, 0.33, 0.33]^T$. However, on page 3 the following is mentioned: The i-th element $\hat y_{n}[i]$ can be interpreted as the estimated probability of predicting class-i. Therefore, if we have two classes, then $\hat y_n$ should have two values. In the example we have two classes (label 1 and label 2) but three values in each $\hat y_n$. I do not understand why this is the case. Shouldn't we have only two values for each $\hat y_n$? Could you please explain why there are three values instead (i.e. [0.6, 0.4, 0.0] and [0.34, 0.33, 0.33])? Thank you! (Later edit again: I think I understood this example: we assume that we have three classes, but the example uses instances from class 1 and class 2. However, I still don't understand the different score implementation.)
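For concreteness, here is how I would compute the paper's cosine-similarity score for this toy example (a plain NumPy sketch of my own; `cosine_score` is a hypothetical name, not a function from the repo):

```python
import numpy as np

def cosine_score(soft_label, j, num_classes):
    """Cosine similarity between a soft label and the one-hot encoding e_j.

    Since ||e_j||_2 = 1, this reduces to soft_label[j] / ||soft_label||_2.
    """
    e_j = np.zeros(num_classes)
    e_j[j] = 1.0
    return float(soft_label @ e_j / np.linalg.norm(soft_label))

# Toy example from page 4 (three classes; class labels 0-based here).
y_hat_n1 = np.array([0.6, 0.4, 0.0])    # soft label of n1 (and of n2)
y_hat_n3 = np.array([0.34, 0.33, 0.33])  # soft label of n3

# Noisy labels: 1 for n1 and n3, 2 for n2 (0-based: 0 and 1).
print(cosine_score(y_hat_n1, 0, 3))  # score of n1 w.r.t. noisy label 1, approx 0.832
print(cosine_score(y_hat_n1, 1, 3))  # score of n2 w.r.t. noisy label 2, approx 0.555
print(cosine_score(y_hat_n3, 0, 3))  # score of n3 w.r.t. noisy label 1, approx 0.589
```

Under this sketch, the corrupted instance $n_2$ receives the lowest score of the three.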
Hi @bakachan19,
Thank you for being so interested in our paper. For the score function, the first input argument of `get_score` is `knn_labels_cnt`, obtained from the function `count_knn_distribution`; it is a vector whose $\ell_2$-norm is 1. Then, in this line, we first take `log(knn_labels_cnt)` and project it onto $e_{label}$.
To exactly match the score function mentioned above, we would need to change `log(knn_labels_cnt)` to `knn_labels_cnt`. In that case, $\hat y_n$ = `knn_labels_cnt`, and `label` is used as its one-hot encoding by inputting it to `F.nll_loss`. The returned value is the negative similarity, according to the definition of NLL loss.
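If it helps, the two scores being discussed can be sketched in plain NumPy (my own simplification, mimicking the single-sample behavior of `F.nll_loss`; `nll_lookup`, `implemented_score`, and `paper_score` are hypothetical names, not functions from the repo):

```python
import numpy as np

def nll_lookup(log_probs, label):
    """Mimics F.nll_loss for a single example: returns -log_probs[label]."""
    return -float(log_probs[label])

def implemented_score(knn_labels_cnt, label):
    """Sketch of the current get_score: NLL of log(knn_labels_cnt) at `label`.

    Equivalent to projecting -log(knn_labels_cnt) onto the one-hot e_label.
    """
    return nll_lookup(np.log(knn_labels_cnt), label)

def paper_score(knn_labels_cnt, label):
    """Score as defined in the paper: cosine similarity with e_label.

    knn_labels_cnt already has unit l2-norm, so this is knn_labels_cnt[label].
    """
    return float(knn_labels_cnt[label])

# A toy l2-normalized kNN label distribution over two classes.
v = np.array([0.6, 0.8])          # ||v||_2 = 1
print(implemented_score(v, 0))    # -log(0.6), approx 0.511
print(paper_score(v, 0))          # 0.6
```

So dropping the `log` recovers (the negative of) the paper's cosine-similarity score, as described above.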
Please feel free to ask if you have more questions!
Best, Zhaowei
For the follow-up question: yes, there are three classes.
Hi @zwzhu-d.
Thank you for taking the time to get back to me.
I am sorry, but I am not sure I follow your explanation:

> To exactly match with the score function mentioned above, we need to change `log(knn_labels_cnt)` to `knn_labels_cnt`.
Does this mean that the current implementation does not follow exactly the score function described in the paper?
Could you try to rephrase the explanation?
Thank you, and sorry for bothering you!
Sorry for the late reply. I thought I had already replied to you.
Yes, if we take the log, it is not exactly the score function described in the paper. It can be seen as the cosine similarity of the log probability, whereas the paper only mentions the cosine similarity of the probability. But the performance should be similar. Thank you for your question!