RecBole
[🐛BUG] Wrong (maybe) calculation of precision topk metric in sequential models.
Describe the bug I noticed that my model (SASRec) has a very good recall@10 but poor precision@10: recall@10 is 70% while precision@10 is only 7%. I could not believe this, because when I evaluated by hand, my model also performed very well on the precision metric.
I then tried to figure out how exactly those metrics are calculated and, as far as I can judge for now, there may be a bug.
I will try to explain. The first thing I noticed is here:
# recbole/evaluator/collector.py
if self.register.need("rec.topk"):
    _, topk_idx = torch.topk(
        scores_tensor, max(self.topk), dim=-1
    )  # n_users x k
    pos_matrix = torch.zeros_like(scores_tensor, dtype=torch.int)
    pos_matrix[positive_u, positive_i] = 1
    pos_len_list = pos_matrix.sum(dim=1, keepdim=True)
    pos_idx = torch.gather(pos_matrix, dim=1, index=topk_idx)
    result = torch.cat((pos_idx, pos_len_list), dim=1)
    self.data_struct.update_tensor("rec.topk", result)
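To see concretely what this collector logic produces, here is a minimal, self-contained sketch of the same tensor operations on toy inputs (the scores, item ids, and batch size below are made up for illustration; this is not RecBole's actual collector class):

```python
import torch

# Toy setup: 3 evaluation rows ("users"), 5 candidate items, k = 3.
scores_tensor = torch.tensor([
    [0.9, 0.1, 0.8, 0.2, 0.3],
    [0.2, 0.7, 0.1, 0.6, 0.5],
    [0.4, 0.3, 0.9, 0.1, 0.8],
])
topk = [3]

# One ground-truth item per row, as the sequential dataloader produces.
positive_u = torch.arange(3)           # row indices 0..2
positive_i = torch.tensor([0, 3, 4])   # the single positive item per row

_, topk_idx = torch.topk(scores_tensor, max(topk), dim=-1)
pos_matrix = torch.zeros_like(scores_tensor, dtype=torch.int)
pos_matrix[positive_u, positive_i] = 1
pos_len_list = pos_matrix.sum(dim=1, keepdim=True)
pos_idx = torch.gather(pos_matrix, dim=1, index=topk_idx)
result = torch.cat((pos_idx, pos_len_list), dim=1)

# Each row of pos_idx can sum to at most 1 here, because each row was
# assigned exactly one positive item.
print(result)
```

Note that every row of `pos_idx` contains at most a single 1, so any precision@k computed from these hit flags is capped at 1/k per row.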
Here I can see that result has shape (batch_size, max_topk + 1): the top-k hit flags plus the concatenated positive count. But in the case of the sequential dataloader, each row of this result will always contain at most one positive item, because positive_u is just a torch.arange over the batch size:
# recbole/data/dataloader/general_dataloader.py
interaction = self._dataset[index]
transformed_interaction = self.transform(self._dataset, interaction)
inter_num = len(transformed_interaction)
positive_u = torch.arange(inter_num)  # this is how positive_u is calculated for the sequential dataloader
positive_i = transformed_interaction[self.iid_field]
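The consequence of positive_u being an arange can be demonstrated in isolation (the batch size and item ids below are hypothetical):

```python
import torch

# Hypothetical mini-batch of 4 sequential-evaluation samples: each sample
# is one (sequence, next-item) pair, so each row gets exactly one positive.
inter_num = 4
positive_u = torch.arange(inter_num)          # tensor([0, 1, 2, 3])
positive_i = torch.tensor([10, 42, 7, 10])    # hypothetical next-item ids
num_items = 50                                # hypothetical catalog size

pos_matrix = torch.zeros(inter_num, num_items, dtype=torch.int)
pos_matrix[positive_u, positive_i] = 1

# Because every row index in positive_u is distinct, each row of
# pos_matrix receives exactly one 1 -> pos_len_list is all ones.
print(pos_matrix.sum(dim=1))
```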
All in all, this leads to poor precision results, because precision is then calculated from the sum of true positives, which here is always at most 1, whereas in a top-10 evaluation it should be able to reach 10.
I think for sequential recommendations it would be better to predict the top-K items by appending K mask tokens to the end of the sequence and evaluating against the K last interactions; right now the model is actually evaluated on one item at a time (mask one item, evaluate on one positive). When we evaluate on only one positive, precision@k for k > 1 can never reach 100%: if we predict 2 items but the user has only one positive, precision@2 is at most 50%.
Thank you for your work, and sorry if I misunderstood something in your code and made wrong assumptions.
@andvikt Thank you for your suggestions! Since sequential recommendation models in RecBole are formulated to predict only the next item in the sequence, the calculation of our precision@k metric may not perform well under this circumstance. We will discuss optimization methods in following updates.