
Getting multiple predictions seems broken

Open elanmart opened this issue 7 years ago • 6 comments

I'm playing with implicit models using default BilinearNet as representation.

Given an interactions object test and a fitted model model, one would expect

model.predict(test.user_ids)

to work, but it raises

RuntimeError: The expanded size of the tensor (<num_users>) must match the existing size (<...>) at non-singleton dimension 0

I think fixing this would require changing the way spotlight generates predictions. Currently, when we want predictions for user 7 and items [1, 2, 3], we actually call

model._net([7, 7, 7], [1, 2, 3])
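To make the pairing concrete, here is a minimal sketch using NumPy arrays as stand-ins for the embedding tables. The function score_pairs is a hypothetical stand-in for model._net, not spotlight's actual code, and it ignores any bias terms the real network may add.

```python
import numpy as np

rng = np.random.default_rng(0)
num_users, num_items, dim = 10, 5, 4
user_emb = rng.normal(size=(num_users, dim))  # stand-in for the user embeddings
item_emb = rng.normal(size=(num_items, dim))  # stand-in for the item embeddings


def score_pairs(user_ids, item_ids):
    """Hypothetical stand-in for model._net: one score per (user, item) pair."""
    user_ids = np.asarray(user_ids)
    item_ids = np.asarray(item_ids)
    if user_ids.shape != item_ids.shape:
        raise ValueError(
            f"user_ids {user_ids.shape} and item_ids {item_ids.shape} must pair up"
        )
    # Elementwise product summed over the embedding dimension = per-pair dot product.
    return (user_emb[user_ids] * item_emb[item_ids]).sum(axis=1)


item_ids = np.array([1, 2, 3])

# Single user: the id is repeated once per item, i.e. [7, 7, 7] vs [1, 2, 3].
single = score_pairs(np.full_like(item_ids, 7), item_ids)

# Two users against three items: lengths 2 and 3 cannot be paired elementwise.
try:
    score_pairs([7, 8], item_ids)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

The length mismatch in the last call is the same kind of shape problem that surfaces as the RuntimeError above.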

To scale this to multiple users, e.g. users [7, 8] we could

  1. Generate a tensor [[7,7,7], [8,8,8]] for user_ids, and call _net() as usual (some unsqueeze on item_embeddings would be needed for broadcasting)
  2. Have a BilinearNet.predict_all method that would compute
x = th.LongTensor([7, 8])
y = th.LongTensor([1, 2, 3])

self.user_embeddings(x) @ self.item_embeddings(y).t()
  3. Use torch.bmm() which, depending on the shape of user_ids, computes the equivalent of either 1. or 2.

I believe 2. is the cleanest and should also be the fastest. @maciejkula what do you think?
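As a rough illustration of why options 1 and 2 should agree, here is a sketch with NumPy arrays standing in for the torch embedding tables. The name predict_all follows the proposal above and is hypothetical. Bias terms, which the real network is assumed to add, are left out.

```python
import numpy as np

rng = np.random.default_rng(0)
user_emb = rng.normal(size=(10, 4))  # stand-in for self.user_embeddings
item_emb = rng.normal(size=(5, 4))   # stand-in for self.item_embeddings


def predict_all(user_ids, item_ids):
    """Hypothetical option 2: scores[i, j] = dot(user_ids[i], item_ids[j])."""
    return user_emb[np.asarray(user_ids)] @ item_emb[np.asarray(item_ids)].T


x = np.array([7, 8])
y = np.array([1, 2, 3])
scores = predict_all(x, y)  # shape (2, 3): one row per user, one column per item

# Option 1: expand user ids to [[7, 7, 7], [8, 8, 8]] and broadcast the items.
u_grid = np.repeat(x[:, None], len(y), axis=1)
pairwise = (user_emb[u_grid] * item_emb[y][None, :, :]).sum(axis=-1)
```

Both routes give the same (2, 3) score matrix. The matrix product avoids materialising the repeated user embeddings.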

elanmart avatar Feb 02 '18 01:02 elanmart

I think this is basically a feature request, something that the library doesn't do at the moment (rather than a bug).

I'd be happy to have a think about doing this provided that

  1. You'd be happy to apply it consistently to all models.
  2. The result doesn't complicate the code much.

One pointer for opening issues: it may be nicer to the maintainer if you don't start by assuming something is broken when it doesn't behave exactly as you want.

maciejkula avatar Mar 08 '18 21:03 maciejkula

I'm sorry, I didn't intend this to sound offensive. It seemed broken, since doing something quite intuitive resulted in an obscure pytorch error, but I agree I'm at fault here.

I can try to add this feature in the near future, but for now perhaps it would be nice to add a helpful error message when user_ids is an array, but item_ids is None?
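A guard along these lines might be enough for the error message. This is only a sketch: check_predict_ids is a hypothetical helper, not an existing spotlight function, and the wording of the message is made up.

```python
import numpy as np


def check_predict_ids(user_ids, item_ids=None):
    """Hypothetical guard: reject several user ids when item_ids is None."""
    several_users = np.ndim(user_ids) > 0 and np.size(user_ids) != 1
    if item_ids is None and several_users:
        raise ValueError(
            "predict() got an array of user ids but no item_ids. "
            "Pass a single user id to score all items, or pass item_ids "
            "with one entry per user id."
        )


check_predict_ids(7)  # a single user id with item_ids=None is accepted

try:
    check_predict_ids(np.array([7, 8]))
    rejected = False
except ValueError:
    rejected = True
```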

elanmart avatar Mar 08 '18 23:03 elanmart

Hi @maciejkula and @elanmart, I would like to know if there is an update on this issue.

I am trying the same thing and got the same error.

E.g., I trained a factorization model using the following code:

from spotlight.cross_validation import user_based_train_test_split
from spotlight.datasets.movielens import get_movielens_dataset
from spotlight.evaluation import rmse_score
from spotlight.factorization.explicit import ExplicitFactorizationModel

dataset = get_movielens_dataset(variant='1M')

train, test = user_based_train_test_split(dataset, test_percentage=0.25)

model = ExplicitFactorizationModel(n_iter=1)
model.fit(train)

Then when I predict recommendation scores by using the test data with

model.predict(test.user_ids)

I got the error

RuntimeError: The expanded size of the tensor (3707) must match the existing size (258400) at non-singleton dimension 0. at /opt/conda/conda-bld/pytorch_1518241081361/work/torch/lib/TH/generic/THTensor.c:309

So I used one user id instead (e.g., the first one in the array)

model.predict(test.user_ids[0])

It worked and returned an array of item recommendation scores.
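Since the single-user call works, one possible workaround is to call predict once per distinct user and stack the results. The sketch below uses StubModel, a hypothetical stand-in for a fitted model that accepts only one user id, so that it runs without spotlight installed.

```python
import numpy as np


class StubModel:
    """Hypothetical stand-in for a fitted model; predict takes one user id."""

    def __init__(self, num_users, num_items, dim=4, seed=0):
        rng = np.random.default_rng(seed)
        self._user_emb = rng.normal(size=(num_users, dim))
        self._item_emb = rng.normal(size=(num_items, dim))

    def predict(self, user_id):
        if np.ndim(user_id) != 0:
            raise ValueError("only a single user id is supported")
        # One score per item for this user.
        return self._item_emb @ self._user_emb[user_id]


model = StubModel(num_users=6, num_items=5)
test_user_ids = np.array([3, 3, 1, 4, 1])  # made-up ids with repeats

# Score each distinct user once; row k belongs to unique_users[k].
unique_users = np.unique(test_user_ids)
scores = np.stack([model.predict(u) for u in unique_users])
```

This loops in Python, so it will be slower than a batched implementation.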

Another question is, if I use one user id for prediction, what are the item IDs corresponding to the scores?
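On the item ID question, a sketch under one assumption: that when no item_ids are passed, the scores come back ordered by item ID, so scores[i] belongs to item i. That assumption should be checked against the library documentation. The score values below are made up.

```python
import numpy as np

# Made-up scores for four items, standing in for the output of model.predict(user_id).
scores = np.array([0.1, 2.3, -0.4, 1.7])

# Assumption: position i in the scores array corresponds to item id i.
item_ids = np.arange(len(scores))

# Item ids sorted from highest to lowest score, truncated to the top k.
top_k = 2
ranked = item_ids[np.argsort(-scores)][:top_k]
```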

Any advice will be very much appreciated! :)

Best, Le

yueguoguo avatar Mar 26 '18 05:03 yueguoguo

I think I got some useful information from #30

yueguoguo avatar Mar 26 '18 08:03 yueguoguo

@yueguoguo @elanmart to document the current implementation better, how does the following sound? https://github.com/maciejkula/spotlight/pull/109

maciejkula avatar Apr 27 '18 09:04 maciejkula

Much better! Thanks @maciejkula

yueguoguo avatar Apr 27 '18 09:04 yueguoguo