lightfm
Call predict() for every user, every item
Hello. I have been trying to use the LightFM predict function for every user and every item in the system. Essentially, my problems occur with supplying the input to predict, as I am unsure what kind of format user_ids and item_ids should take. My code is complex, so I'm going to supply simple examples.
Function definition:
predict(user_ids, item_ids, item_features=None, user_features=None, num_threads=1)
Suppose I want to get predictions for 10 users with ids [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and 10 items with ids [0, 1, 2, 3, 4, 5, 6, 7, 8, 9].
How would I use predict() to get a prediction for every user-item interaction? The following attempts do not work (and numerous others I have tried):
# Attempt 1
user_ids = np.arange(10)
item_ids = np.arange(10)
predict(user_ids, item_ids)
# Attempt 2
user_ids = np.arange(10)
item_ids = [np.arange(10) for i in range(10)]
predict(user_ids, item_ids)
I am able to use predict inside of a for loop, such as
user_ids = np.arange(10)
item_ids = np.arange(10)
for i in user_ids:
    predict(i, item_ids)
However, I would prefer to avoid this, as I can only imagine that calculating everything at once would be more efficient.
Once I figure this out, I will work on incorporating the features.
Thank you for your help!
Hello, you can try this approach. Suppose we have 10 users with ids [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] and 10 items with ids [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]. Build two flat arrays of equal length so that position k holds one (user, item) pair:
n_users = 10
n_items = 10
user_ids = np.concatenate([np.full((n_items, ), i) for i in range(0, n_users)])
Output: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9])
item_ids = np.concatenate([np.arange(n_items) for i in range(n_users)])
Output: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
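Then compute the scores for all pairs in one call:
scores = model.predict(user_ids, item_ids)
The first n_items entries of scores belong to user 0, so np.argsort(-scores[:n_items]) gives the item ids for the user with id 0, ranked from highest to lowest predicted score.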
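The same flat arrays can also be built with np.repeat and np.tile, and the flat scores reshaped into a (n_users, n_items) matrix so that every user's ranking comes out at once. The following is only a minimal sketch: all_pairs and fake_predict are hypothetical names, and fake_predict is a stand-in for model.predict so the example runs without a trained model.

```python
import numpy as np


def all_pairs(n_users, n_items):
    """Return flat user/item id arrays covering every (user, item) pair."""
    # Each user id repeated n_items times: 0,0,...,0,1,1,...
    user_ids = np.repeat(np.arange(n_users, dtype=np.int32), n_items)
    # The full item range repeated once per user: 0,1,...,9,0,1,...
    item_ids = np.tile(np.arange(n_items, dtype=np.int32), n_users)
    return user_ids, item_ids


def fake_predict(user_ids, item_ids):
    # Hypothetical stand-in for model.predict(user_ids, item_ids);
    # it returns one score per (user, item) pair, like the real call.
    return (user_ids * 100 + item_ids).astype(np.float32)


n_users, n_items = 10, 10
user_ids, item_ids = all_pairs(n_users, n_items)
scores = fake_predict(user_ids, item_ids)

# Row u of score_matrix holds the scores of user u for every item.
score_matrix = scores.reshape(n_users, n_items)

# Row u of ranked_items lists item ids from best to worst for user u.
ranked_items = np.argsort(-score_matrix, axis=1)
```

With a real model, replacing fake_predict with model.predict should be the only change needed. For large systems the flat arrays have n_users * n_items entries, so it may be worth processing users in chunks.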
Let me know if it is working or not.
Thanks
@sam-ai, thank you for your help. I got it working with your solution.
Another question concerning the predict method. Should the user_features and item_features be passed for the entire system, or just for the user_ids and item_ids which are supplied to that predict call?
In other words, which of these would be correct?
Use all user features and item_features:
scores = predict(user_ids=user_ids, item_ids=item_ids, item_features=get_item_features("ALL"), user_features=get_user_features("ALL"))
Use only the user and item features associated with the user_ids and item_ids:
scores = predict(user_ids=user_ids, item_ids=item_ids, item_features=get_item_features(item_ids), user_features=get_user_features(user_ids))
It seems very inefficient to have to repeatedly pass in every single feature every time you have to call predict.
Cool. I am working on that part now, and I will let you know once I have figured it out. If you find any hints, please comment below.
I would like to know this too: should I pass the entire user_features matrix when predicting on new data, or only the rows that correspond to the given user_ids?
I do not claim to be an expert on this topic, but I will share what I have implemented which seems to work pretty well. I would recommend passing in the entire features matrix. If you build this using the Dataset class, this matrix should be a CSR matrix, so it is not going to take up a lot of memory. Just keep it stored in a variable the entire time. It can also be pickled and un-pickled easily. Pass it in and out as needed.
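As a rough illustration of the advice above, here is a minimal sketch of keeping one full CSR feature matrix, pickling and restoring it, and seeing why a sliced matrix would not line up with the original ids. It assumes (as far as I understand LightFM) that the ids passed to predict are used as row indices into the feature matrices. The matrix here is random toy data, not the output of the Dataset class.

```python
import pickle

import numpy as np
from scipy.sparse import csr_matrix

# Toy stand-in for a full user feature matrix: one row per user id.
n_users, n_user_features = 10, 4
rng = np.random.default_rng(0)
dense = (rng.random((n_users, n_user_features)) < 0.3).astype(np.float32)
user_features = csr_matrix(dense)

# Round-trip through pickle; the sparse structure is preserved.
blob = pickle.dumps(user_features)
restored = pickle.loads(blob)

# With the full matrix, row k still corresponds to user id k.
batch_user_ids = np.array([7, 8, 9], dtype=np.int32)
rows_for_batch = restored[batch_user_ids]

# A matrix sliced down to the batch has only 3 rows, so the original
# ids 7, 8, 9 would no longer be valid row indices into it.
sliced = restored[batch_user_ids]
ids_fit_sliced = bool(batch_user_ids.max() < sliced.shape[0])  # False
```

If that assumption holds, passing the full matrix on every call is the simplest way to keep ids and rows aligned, and since it is sparse the cost of holding it in memory is usually small.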