torch-ppr icon indicating copy to clipboard operation
torch-ppr copied to clipboard

PPR for Bipartite graph in recommendation

Open riyaj8888 opened this issue 11 months ago • 3 comments

can u provide one simple example , how to use it for bipartite graph for recommending items to for users using this PPR code.

thanks

riyaj8888 avatar Aug 24 '23 05:08 riyaj8888

Hi @riyaj8888 ,

I am not very familiar with the latest research on recommender systems, so you might want to take a look at that first. One particular question would be whether you need to pre-process your data to account for the sparsity of user-item interactions (e.g., cluster users or items into groups, and then operate on a densified user-cluster-item-cluster interaction matrix).

That said, this library only gives you a (fast) way to compute the (personalized) page rank, which can be interpreted as a relevance score. A quick application I can think of is to calculate PPR scores for a given user, and then recommend items that have a high score. To do this, you need to

  1. construct the adjacency matrix; there are several possible formats, but the simplest might be an edge index, see https://torch-ppr.readthedocs.io/en/stable/usage.html#torch_ppr.api.personalized_page_rank
import torch
edge_index = torch.as_tensor(data=[
  (user_id, item_id + num_users)
  for user_id, item_id in user_item_interactions
]).t()

Note that you have to make sure that the node IDs are unique, i.e., a user with id i does not get the same node id as the item i.

  1. calculate the personalized page rank scores for a user of interest
from torch_ppr import personalized_page_rank
scores = personalized_page_rank(edge_index=edge_index, indices=[user_id_of_interest])

Note that this will be a vector of size num_users + num_items, i.e., you also get scores for other users and not just items.

  1. e.g., look at the top-k items
best_item_ids = scores[num_users:].topk(k=10, sorted=True)

mberr avatar Aug 24 '23 19:08 mberr

Thanks, in step 3. it should be scores[num_items:] ri8?

riyaj8888 avatar Aug 25 '23 04:08 riyaj8888

in step 3. it should be scores[num_items:] ri8?

No. With the code above, you use the node ids [0, ..., num_users) for user nodes, and [num_users, ..., num_users + num_items) for item nodes. Thus, we look at the scores for nodes starting at num_users until the end.

You could also switch that, but then you would need to adjust step 2, too, since indices has to be a node index, i.e., you need to translate the user index to a node index first.

mberr avatar Aug 25 '23 15:08 mberr