recommenders
[Question] Is Retrieval with CategoricalCrossentropy really minimizing the affinity between the query and negative candidates?
In the tfrs.tasks.Retrieval documentation, the retrieval task is explained like this:
The main arguments are pairs of query and candidate embeddings: the first row of query_embeddings denotes a query for which the candidate from the first row of candidate_embeddings was selected by the user. The task will try to maximize the affinity of these query, candidate pairs while minimizing the affinity between the query and candidates belonging to other queries in the batch.
The default loss function of tfrs.tasks.Retrieval is tf.keras.losses.CategoricalCrossentropy. But in CategoricalCrossentropy, the loss terms for label-0 candidates are 0.
So,
- if I use CategoricalCrossentropy as the retrieval loss function, the label-1 candidate will affect the loss value and the embeddings, but the label-0 candidates will not. Is that right?
- if I set the num_hard_negatives argument with the CategoricalCrossentropy loss, the number of negative (label-0) candidates will decrease, but the loss value will not change. Is that right?
Yes, it is minimizing it. The cross-entropy loss works through two channels:
- The affinity score between the query and the positive item is the numerator of the softmax, and is maximized.
- The affinity score between the query and each negative item appears in the denominator, and is minimized.
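To make this concrete, here is a minimal numpy sketch (not the TFRS implementation itself) of in-batch softmax cross-entropy with identity labels. Even though the label-0 terms contribute 0 to the cross-entropy sum, the negative logits sit in the softmax denominator, so the gradient with respect to them is nonzero and strictly positive, meaning a gradient step pushes the query away from the in-batch negatives:

```python
import numpy as np

rng = np.random.default_rng(0)
B, d = 4, 8  # batch size, embedding dimension

# Toy query and candidate embeddings; rows are paired, so labels form the identity matrix.
Q = rng.normal(size=(B, d))
C = rng.normal(size=(B, d))

# In-batch logits: entry [i, j] is the affinity of query i with candidate j.
S = Q @ C.T

# Row-wise softmax over all candidates in the batch.
P = np.exp(S - S.max(axis=1, keepdims=True))
P /= P.sum(axis=1, keepdims=True)

# Categorical cross-entropy with identity labels: mean of -log P[i, i].
loss = -np.mean(np.log(np.diag(P)))

# Closed-form gradient of the loss w.r.t. the logits: (P - I) / B.
grad_S = (P - np.eye(B)) / B

# Off-diagonal (negative-candidate) gradients are strictly positive, so a
# gradient-descent step *lowers* the query/negative affinities even though
# their label-0 terms contribute 0 to the loss sum itself.
assert np.all(grad_S[~np.eye(B, dtype=bool)] > 0)
```

So the label-0 candidates do not appear in the loss value directly, but they do shape the softmax normalizer, and that is the channel through which their embeddings receive gradient.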
@maciejkula Can you elaborate a bit more here? Are we backpropagating through the negative items as well? So if a negative item is frequently selected alongside a user (i.e. in-batch negatives are biased towards popular items), do these items end up far away from the user representation after many epochs? Or am I missing something?
Thanks
@OmarMAmin,
This is true in the case where a log(q) correction is not applied.
It turns out that the bias introduced by using in-batch negative sampling can be accounted for by subtracting the natural logarithm of the candidate sampling probability from the output logits.
This is implemented in the Retrieval task through the call parameter candidate_sampling_probability.
You can read the theoretical details of sampled softmax loss in this paper.
There are also some useful discussions in this thread. https://github.com/tensorflow/recommenders/issues/257
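The correction itself is simple to sketch. Below is a numpy illustration with hypothetical per-candidate sampling probabilities (the actual values would come from your own candidate frequency estimates); it shows that subtracting log(q) boosts rare candidates' logits far more than popular ones', which offsets the popularity bias of in-batch negative sampling:

```python
import numpy as np

# Hypothetical sampling probabilities: candidate 0 is very popular, so it
# shows up as an in-batch negative far more often than candidate 3.
q = np.array([0.5, 0.3, 0.15, 0.05])

# Raw affinity logits for a single query against the four candidates.
logits = np.array([[2.0, 1.0, 0.5, 0.2]])

# log(q) correction: subtract the log sampling probability from the logits,
# which is conceptually what Retrieval does when you pass
# candidate_sampling_probability in the task's call.
corrected = logits - np.log(q)

# The boost each candidate receives is -log(q): large for rare candidates,
# small for popular ones, counteracting the in-batch sampling bias.
boost = corrected - logits
assert boost[0, 3] > boost[0, 0]
```

In TFRS this corresponds to passing the per-example sampling probabilities when calling the task, e.g. `task(query_embeddings, candidate_embeddings, candidate_sampling_probability=probs)`.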
@patrickorlando thanks for the info! I tried it, and it performs better now with the sampling correction :))
Thanks for your invaluable contributions; without these discussions I would have switched to another library. I'll summarize the learnings in another issue so others can benefit from them.
I'm glad I could help @OmarMAmin 😁
That sounds like a great idea.