
Call predict() for every user, every item

Open lkurlandski opened this issue 3 years ago • 5 comments

Hello. I have been trying to use the LightFM predict function for every user and every item in the system. Essentially, my problems occur with supplying the input to predict, as I am unsure what kind of format user_ids and item_ids should take. My code is complex, so I'm going to supply simple examples.

Function definition: predict(user_ids, item_ids, item_features=None, user_features=None, num_threads=1)

Suppose I want to get predictions for 10 users with ids [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and 10 items with ids [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].

How would I use predict() to get a prediction for every user-item interaction? The following attempts do not work (and numerous others I have tried):

user_ids = np.arange(10)
item_ids = np.arange(10)
predict(user_ids, item_ids)

user_ids = np.arange(10)
item_ids = [np.arange(10) for i in range(10)]
predict(user_ids, item_ids)

I am able to use predict inside of a for loop, such as

user_ids = np.arange(10)
item_ids = np.arange(10)
for i in user_ids:
    predict(i, item_ids)

However, I would prefer to avoid this, as I can only imagine that calculating everything at once would be more efficient.

Once I figure this out, I will work on incorporating the features.

Thank you for your help!

lkurlandski avatar Oct 26 '20 13:10 lkurlandski

Hello, you can try this approach. Suppose we have 10 users with ids [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and 10 items with ids [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].

n_users = 10
n_items = 10

user_ids = np.concatenate([np.full((n_items, ), i) for i in range(0, n_users)])
output: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])

item_ids = np.concatenate([np.arange(n_items) for i in range(n_users)])
output: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

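Then compute the scores for all pairs in one call:

scores = model.predict(user_ids, item_ids)

Because the ids are laid out user by user, the first n_items scores belong to the user with id 0:

np.argsort(-scores[:n_items]) --> item ids for the user with id 0, ranked from highest to lowest score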
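
For reference, the same pairing can be built a little more compactly with np.repeat and np.tile, and the flat scores reshaped into one row per user. This is only a sketch: predict_all_pairs and top_k_items are hypothetical helper names, and model is assumed to be an already fitted LightFM model (anything with a compatible predict(user_ids, item_ids, ...) method would do).

```python
import numpy as np


def predict_all_pairs(model, n_users, n_items, **predict_kwargs):
    """Score every (user, item) pair with a single predict() call.

    `model` is assumed to be a fitted LightFM model. Extra keyword
    arguments (e.g. user_features, item_features, num_threads) are
    forwarded to predict() unchanged.
    Returns an array of shape (n_users, n_items).
    """
    # Each user id repeated n_items times: 0, 0, ..., 0, 1, 1, ...
    user_ids = np.repeat(np.arange(n_users, dtype=np.int32), n_items)
    # The full item range tiled once per user: 0, 1, ..., 0, 1, ...
    item_ids = np.tile(np.arange(n_items, dtype=np.int32), n_users)
    scores = model.predict(user_ids, item_ids, **predict_kwargs)
    # Row u holds the scores of user u for items 0 .. n_items - 1.
    return np.asarray(scores).reshape(n_users, n_items)


def top_k_items(score_matrix, k):
    """Ids of the k highest-scoring items for each user, best first."""
    return np.argsort(-score_matrix, axis=1)[:, :k]
```

Note that the two id arrays each hold n_users * n_items entries, so for a large catalogue it may be safer to call this on blocks of users rather than on everyone at once.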

Let me know whether it works.

Thanks

sam-ai avatar Oct 27 '20 10:10 sam-ai

@sam-ai, thank you for your help. I got it working with your solution.

Another question concerning the predict method. Should the user_features and item_features be passed for the entire system, or just for the user_ids and item_ids which are supplied to that predict call?

In other words, which of these would be correct?

Use all user features and item features.

scores = predict(
    user_ids = user_ids,
    item_ids = item_ids,
    item_features = get_item_features("ALL"),
    user_features = get_user_features("ALL")
)

Use the user and item features associated with the user_ids and item_ids.

scores = predict(
    user_ids = user_ids,
    item_ids = item_ids,
    item_features = get_item_features(item_ids),
    user_features = get_user_features(user_ids)
)

It seems very inefficient to have to repeatedly pass in every single feature every time you have to call predict.

lkurlandski avatar Oct 27 '20 13:10 lkurlandski

Cool. I am working on that part now and will let you know once I have figured it out. If you find any hints, please comment below.

sam-ai avatar Oct 28 '20 08:10 sam-ai

I would like to know this too: should I pass the entire user_features matrix when predicting on new data, or just the rows corresponding to the given user_ids?

igorkf avatar Nov 16 '20 21:11 igorkf

I do not claim to be an expert on this topic, but I will share what I have implemented which seems to work pretty well. I would recommend passing in the entire features matrix. If you build this using the Dataset class, this matrix should be a CSR matrix, so it is not going to take up a lot of memory. Just keep it stored in a variable the entire time. It can also be pickled and un-pickled easily. Pass it in and out as needed.
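
To make that concrete, here is a rough sketch of a wrapper that always hands predict() the full matrices and fails early if a sliced one is passed by mistake. It rests on the assumption, taken from the documented shapes [n_users, n_user_features] and [n_items, n_item_features], that the row for id i is looked up at row index i, so a matrix cut down to only the requested ids would be misaligned. predict_with_full_features is a hypothetical helper name, not part of LightFM.

```python
import numpy as np


def predict_with_full_features(model, user_ids, item_ids,
                               user_features=None, item_features=None):
    """Call model.predict() with the FULL feature matrices.

    Assumption: the feature row for id i lives at row index i, so each
    matrix needs at least max(id) + 1 rows. `model` is assumed to be a
    fitted LightFM model.
    """
    user_ids = np.asarray(user_ids, dtype=np.int32)
    item_ids = np.asarray(item_ids, dtype=np.int32)

    # Reject matrices that are too short to contain the largest id,
    # which is what happens when only the requested rows are passed.
    for name, ids, feats in (("user", user_ids, user_features),
                             ("item", item_ids, item_features)):
        if feats is not None and ids.size and feats.shape[0] <= ids.max():
            raise ValueError(
                f"{name}_features has {feats.shape[0]} rows but the "
                f"largest {name} id is {ids.max()}; pass the full matrix"
            )

    return model.predict(user_ids, item_ids,
                         item_features=item_features,
                         user_features=user_features)
```

With this layout the feature matrices are built once, kept in a variable (or pickled), and reused for every call, while only the id arrays change.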

lkurlandski avatar Nov 16 '20 21:11 lkurlandski