spotlight
Getting multiple predictions seems broken
I'm playing with implicit models using the default `BilinearNet` as the representation.
Given interactions `test` and some model `model`, one would expect `model.predict(test.user_ids)` to work, but it raises
RuntimeError: The expanded size of the tensor (<num_users>) must match the existing size (<...>) at non-singleton dimension 0
I think fixing this would require changing the way spotlight generates predictions. Currently, when we want predictions for user `7` and items `[1, 2, 3]`, we actually call `model._net([7, 7, 7], [1, 2, 3])`.
To scale this to multiple users, e.g. users `[7, 8]`, we could

1. Generate a tensor `[[7, 7, 7], [8, 8, 8]]` for `user_ids`, and call `_net()` as usual (some `unsqueeze` on `item_embeddings` would be needed for broadcasting).
2. Have a `BilinearNet.predict_all` method that would compute

       x = th.LongTensor([7, 8])
       y = th.LongTensor([1, 2, 3])
       self.user_embeddings(x) @ self.item_embeddings(y).t()

3. Use `torch.bmm()` which, depending on the shape of `user_ids`, computes the equivalent of either 1. or 2.
I believe 2. is the cleanest and should also be the fastest. @maciejkula what do you think?
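To make the comparison between the first two options concrete, here is a minimal sketch. It is not spotlight code: NumPy arrays stand in for the embedding tables, the table sizes are made up, and any bias terms the network may have are ignored. It only checks that expanding the user ids and scoring pairwise gives the same numbers as a single matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding tables: 10 users, 5 items, 4 latent dimensions.
user_embeddings = rng.normal(size=(10, 4))
item_embeddings = rng.normal(size=(5, 4))

user_ids = np.array([7, 8])
item_ids = np.array([1, 2, 3])

# Option 1: expand the user ids to [[7, 7, 7], [8, 8, 8]] and take one
# dot product per (user, item) pair, broadcasting the item embeddings.
expanded_users = np.repeat(user_ids[:, None], len(item_ids), axis=1)  # shape (2, 3)
pairwise = (
    user_embeddings[expanded_users]            # shape (2, 3, 4)
    * item_embeddings[item_ids][None, :, :]    # shape (1, 3, 4)
).sum(axis=-1)                                 # shape (2, 3)

# Option 2: one matrix product between the selected user and item rows.
all_pairs = user_embeddings[user_ids] @ item_embeddings[item_ids].T  # shape (2, 3)
```

Both results have one row per user and one column per item. Option 2 avoids materialising the repeated user ids, which is presumably why it would be the cheaper of the two.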
I think this is basically a feature request, something that the library doesn't do at the moment (rather than a bug).
I'd be happy to have a think about doing this provided that
- You'd be happy to apply it consistently to all models.
- The result doesn't complicate the code much.
One pointer for opening issues: it may be nicer to the maintainer if you don't start by assuming something is broken when it doesn't behave exactly as you want.
I'm sorry, I didn't intend this to sound offensive. It seemed broken, since doing something quite intuitive resulted in an obscure pytorch error, but I agree I'm at fault here.
I can try to add this feature in the near future, but for now perhaps it would be nice to add a helpful error message when `user_ids` is an array but `item_ids` is `None`?
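Such a check could look roughly like the sketch below. The function name `check_predict_inputs` and the wording of the message are made up for illustration and are not part of spotlight; the check covers only the case described above.

```python
import numpy as np


def check_predict_inputs(user_ids, item_ids=None):
    """Hypothetical guard, not part of spotlight.

    Raises a readable error when several user ids are passed
    without an explicit list of item ids.
    """
    user_ids = np.asarray(user_ids)

    # A scalar user id has ndim == 0; anything with more than one
    # element is treated as "multiple users" here.
    if item_ids is None and user_ids.ndim > 0 and user_ids.size > 1:
        raise ValueError(
            "Got {} user ids but no item ids. Pass a single user id to "
            "score all items, or pass item_ids of matching length."
            .format(user_ids.size)
        )
```

Calling `check_predict_inputs([7, 8])` then fails with this message rather than with a tensor size mismatch.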
Hi @maciejkula and @elanmart, I would like to know if there is an update on this issue.
I am trying the same thing and got the same error.
E.g., I trained a factorization model using the following code:
from spotlight.cross_validation import user_based_train_test_split
from spotlight.datasets.movielens import get_movielens_dataset
from spotlight.evaluation import rmse_score
from spotlight.factorization.explicit import ExplicitFactorizationModel
dataset = get_movielens_dataset(variant='1M')
train, test = user_based_train_test_split(dataset, test_percentage=0.25)
model = ExplicitFactorizationModel(n_iter=1)
model.fit(train)
Then, when I predict recommendation scores on the test data with
model.predict(test.user_ids)
I get the following error:
RuntimeError: The expanded size of the tensor (3707) must match the existing size (258400) at non-singleton dimension 0. at /opt/conda/conda-bld/pytorch_1518241081361/work/torch/lib/TH/generic/THTensor.c:309
So I used a single user id instead (e.g., the first one in the array):
model.predict(test.user_ids[0])
This worked and returned an array of item recommendation scores.
Another question is, if I use one user id for prediction, what are the item IDs corresponding to the scores?
Any advice will be very much appreciated! :)
Best, Le
I think I get some useful information from #30
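On the item ID question, my understanding is that when `item_ids` is not passed, the scores cover all items and position `i` in the returned array is the score for item id `i`. Treat that as an assumption to check against the documentation. Under it, the top items can be read off as in this sketch, where the `scores` array is made up and stands in for the output of `model.predict(user_id)`:

```python
import numpy as np

# Made-up scores standing in for the output of model.predict(user_id).
scores = np.array([0.1, 2.5, -0.3, 1.7, 0.9])

# Assumption: position i in `scores` is the score for item id i.
item_ids = np.arange(len(scores))

# Item ids of the three highest-scoring items, best first.
top_k = 3
top_items = item_ids[np.argsort(-scores)[:top_k]]
```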
@yueguoguo @elanmart to document current implementation better, how does the following sound? https://github.com/maciejkula/spotlight/pull/109
Much better! Thanks @maciejkula