LightFM sample_weight equivalent in Spotlight?

Open nikisix opened this issue 7 years ago • 4 comments

Hi @maciejkula, is there an equivalent of LightFM's sample_weight in Spotlight? In my scenario there are two types of ratings:

  1. Explicit (user has actually bought the item)
  2. Implicit (user has searched for item, but not purchased)

I believe this is a good case for using sample weights, but please correct me if I'm wrong. For folks stumbling upon this, sample_weight in LightFM is defined as:

sample_weight: np.float32 coo_matrix of shape [n_users, n_items], optional
     matrix with entries expressing weights of individual
     interactions from the interactions matrix.
     Its row and col arrays must be the same as
     those of the interactions matrix. For memory
     efficiency it's possible to use the same arrays
     for both weights and interaction matrices.
     Defaults to weight 1.0 for all interactions.
     Not implemented for the k-OS loss.
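
For illustration, here is a minimal sketch of how per-interaction weights for the two rating types above might be built. The event log, the `EVENT_WEIGHTS` mapping, and the values 1.0 and 0.3 are all assumptions made up for this example; nothing here comes from LightFM or Spotlight.

```python
import numpy as np

# Hypothetical event log: (user_id, item_id, event_type).
events = [
    (0, 10, "purchase"),
    (0, 11, "search"),
    (1, 10, "search"),
    (2, 12, "purchase"),
]

# Assumed weighting scheme: purchases count fully, searches count less.
# The numbers are illustrative, not recommended defaults.
EVENT_WEIGHTS = {"purchase": 1.0, "search": 0.3}

user_ids = np.array([u for u, _, _ in events], dtype=np.int32)
item_ids = np.array([i for _, i, _ in events], dtype=np.int32)
weights = np.array([EVENT_WEIGHTS[e] for _, _, e in events], dtype=np.float32)

# One weight per interaction, in the same order as the (user, item) pairs.
assert user_ids.shape == item_ids.shape == weights.shape
```

For LightFM, these arrays would then be wrapped in a coo_matrix that shares its row and col arrays with the interactions matrix, as the docstring above requires.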

Searching the Spotlight codebase, the only place I found interaction weights being used is cross_validation.py, for example: weights=_index_or_none(interactions.weights...

Whereas I was hoping to see something more like LightFM's _lightfm_fast.pyx.template: loss = weight * (prediction - y)

Is there a reason this is missing? How hard would it be to add to the ImplicitFactorization and ImplicitSequence models?

nikisix avatar Jul 10 '18 13:07 nikisix

Unfortunately, I think you're right: sample weights are currently not used. However, it should be relatively easy (if perhaps a little tedious) to add this functionality: we'd simply multiply the loss by the weight. Is this something you'd like to try out?

Roughly speaking, the following would be involved:

  1. If weights are present, make sure they are transformed into sequences when transforming Interactions into SequenceInteractions (roughly here: https://github.com/maciejkula/spotlight/blob/master/spotlight/interactions.py#L251)
  2. For both factorization and sequence models, iterate over weights and interactions in lockstep when doing minibatch iteration (https://github.com/maciejkula/spotlight/blob/master/spotlight/sequence/implicit.py#L225, https://github.com/maciejkula/spotlight/blob/master/spotlight/factorization/implicit.py#L223)
  3. Amend loss functions to take weights as optional arguments.
  4. Inside the loss functions, multiply losses with weights.
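
Steps 2 to 4 could look roughly like the sketch below. It uses NumPy rather than PyTorch to stay self-contained, and `minibatches`, `sigmoid`, and `weighted_pointwise_loss` are hypothetical names written for this example, not Spotlight's actual functions. The loss is a generic pointwise loss, so treat this as an outline of the idea rather than the implementation.

```python
import numpy as np


def minibatches(*arrays, batch_size=2):
    """Yield aligned slices of several equal-length arrays (step 2)."""
    n = len(arrays[0])
    for start in range(0, n, batch_size):
        yield tuple(a[start:start + batch_size] for a in arrays)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def weighted_pointwise_loss(positive_scores, negative_scores, weights=None):
    """Pointwise loss with optional per-sample weights (steps 3 and 4)."""
    per_sample = (1.0 - sigmoid(positive_scores)) + sigmoid(negative_scores)
    if weights is not None:
        # Multiply each sample's loss by its weight before averaging.
        per_sample = per_sample * weights
    return per_sample.mean()


# Made-up scores and weights, iterated in lockstep.
pos = np.array([2.0, 0.5, -1.0, 1.5])
neg = np.array([-1.0, 0.0, 0.5, -2.0])
w = np.array([1.0, 0.3, 0.3, 1.0])

batch_losses = [
    weighted_pointwise_loss(p, n, bw)
    for p, n, bw in minibatches(pos, neg, w, batch_size=2)
]
```

Averaging the weighted losses with a plain mean matches the "multiply the loss by the weight" description; dividing by the sum of the weights instead would be an alternative design choice that keeps the loss scale independent of the weights.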

maciejkula avatar Jul 11 '18 12:07 maciejkula

Ok, I'll take a crack at it; might need some pointers with the first commit or two.

nikisix avatar Jul 11 '18 20:07 nikisix

So, the weights are not applied in the current version of Spotlight (0.1.5)? What exactly is the role of weights in the Interactions class?

amirj avatar Oct 24 '18 09:10 amirj

@amirj You may use my PR if you'd like. Weights aren't used even if specified, until the PR is merged. https://github.com/maciejkula/spotlight/pull/122

nikisix avatar Oct 24 '18 16:10 nikisix