Question regarding how preferences are modeled with BPR loss

Open julioasotodv opened this issue 4 years ago • 1 comment

Hi all!

First of all, thank you so much for the effort in this lib. Works really well :)

However, while using it with some implicit data (count data, in fact), I had a question about how training is done with the BPR loss. Let me explain:

As you already know, BPR models preferences with a pairwise loss: each training instance is a triplet of the form (user, item1, item2), where the model learns that the user prefers item1 over item2.
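Just to fix notation, here is a toy sketch of the pairwise objective I am referring to (plain dot-product scores, no biases or regularization; this is illustrative, not LightFM's actual implementation):

```python
import numpy as np

def bpr_loss(user_vec, item1_vec, item2_vec):
    # BPR loss for one (user, item1, item2) triplet:
    # -ln sigmoid(score(u, item1) - score(u, item2)),
    # which pushes the preferred item's score above the other's.
    score_diff = user_vec @ item1_vec - user_vec @ item2_vec
    return -np.log(1.0 / (1.0 + np.exp(-score_diff)))
```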

Imagine that I have an implicit dataset with play counts for movies (how many times each user has watched a specific movie). To make things easier, let's imagine that I only have a single user, and the following interaction data:

  • Movie A, watched 8 times
  • Movie B, watched 2 times
  • Movie C, watched 2 times
  • Movie D, watched 0 times (this is, no interaction)

I was wondering what the training triplets would look like for such a dataset. We know the preference for every possible pair of items except B vs. C (their play counts are identical, so no preference can be inferred between them).
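To make this concrete, here is a hypothetical enumeration of every triplet a strict count ordering would justify (plain Python, not LightFM code):

```python
from itertools import combinations

counts = {"A": 8, "B": 2, "C": 2, "D": 0}  # play counts from the example above

# Keep only pairs with a strict count ordering (B vs. C drops out),
# and put the more-watched movie first in each triplet.
triplets = [("user", i, j) if counts[i] > counts[j] else ("user", j, i)
            for i, j in combinations(counts, 2)
            if counts[i] != counts[j]]

print(triplets)
# [('user', 'A', 'B'), ('user', 'A', 'C'), ('user', 'A', 'D'),
#  ('user', 'B', 'D'), ('user', 'C', 'D')]
```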

However, skimming through the source code, I would say that the way negative sampling works in LightFM with the BPR loss yields only the following training triplets:

  • (user, Movie A, Movie D)
  • (user, Movie B, Movie D)
  • (user, Movie C, Movie D)

This is correct (BPR assumes all watched movies are preferred over unwatched ones). However, the model would be missing some valuable information: our user also prefers Movie A over B and C (having watched it far more times).

But from what I have seen in the source code, non-zero items are only ever compared against zero (unwatched) ones, so the "prefer A over B/C" training instances are missed.

Is that right? If so, would you be interested in a contribution that includes those additional preferences in the training procedure?

Also, does it work the same way for WARP loss?

Thanks a ton!

julioasotodv · Jun 25 '21 15:06

OK, some updates on this:

I realized that you can pass sample weights for the interaction data (and thus, effectively, for the sampled triplets, since weights for the zero-valued negative samples are ignored, for obvious reasons).
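For anyone landing here, this is roughly what I mean, using the toy dataset from above (the hyperparameters are arbitrary; LightFM's fit() accepts a sample_weight matrix with the same shape as the interactions):

```python
import numpy as np
from scipy.sparse import coo_matrix
from lightfm import LightFM

# One user, four movies (A, B, C, D) with play counts 8, 2, 2, 0.
rows = np.array([0, 0, 0])     # the single user
cols = np.array([0, 1, 2])     # movies A, B, C (D has no interaction)
counts = np.array([8.0, 2.0, 2.0], dtype=np.float32)

# Binary positives for BPR...
interactions = coo_matrix((np.ones_like(counts), (rows, cols)), shape=(1, 4))
# ...with the play counts as per-interaction sample weights.
sample_weight = coo_matrix((counts, (rows, cols)), shape=(1, 4))

model = LightFM(loss="bpr", no_components=16, random_state=42)
model.fit(interactions, sample_weight=sample_weight, epochs=30)
```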

To some extent, this allows the model to learn that the preference between a specific item (say, Movie A in the previous example) and the zero ones matters more than the preference between movies B/C and the zero ones; which is nice, since the weight acts as a proxy for interaction/preference strength.

I would say that in a large dataset this instance weighting should be enough to learn the relative preferences among items for a specific user. However, explicit training on preferences between non-zero items would still be valuable information for the model (especially in mid-sized datasets).

This last point is only a hypothesis from my side, as it cannot be demonstrated analytically and would need experimentation and testing. That is why I think this feature could be valuable.

Thank you!

julioasotodv · Jun 28 '21 08:06