LibRecommender

Detect unknown interaction(s) when using metrics like roc_auc/map on ALS and BPR algorithms.

Open faizanmuheeb opened this issue 1 year ago • 2 comments

Hello,

I tried simulating a 5-fold cross-validation procedure on the ALS and BPR algorithms. However, when I use metrics like roc_auc, map, etc., I see warnings such as "Detect 1 unknown interaction(s), position: [2592]". This does not happen when I use random_split or other built-in splitting methods. Is there any way to avoid these warnings, or is it safe to ignore them?


faizanmuheeb, Dec 17 '24 18:12

This warning appears because your test data contains users or items that do not exist in the training data, so they are "unknown" to the trained model. If your data is large, the warning can be ignored, or you can modify the _filter_unknown_user_item function in the library to filter out unknown users and items.

Here is the explanation of the function from Copilot:

The _filter_unknown_user_item function filters out rows from the test datasets that contain users or items not present in the training dataset.

Input: A list of datasets, with the first dataset treated as the training data.

Output: A list of datasets with unknown users and items removed from the test datasets.

Steps:

  1. Extract unique users and items from the training data.
  2. Iterate over the test datasets.
  3. Identify indices of rows with users or items not present in the training data.
  4. Remove those rows from the test datasets.
  5. Return the cleaned list of datasets.
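The steps above can be sketched in plain Python. This is only an illustration of the idea, not the library's actual _filter_unknown_user_item code: the function name filter_unknown and the (user, item, label) row layout are assumptions made for the example.

```python
from typing import List, Sequence, Tuple

# Hypothetical row layout for this sketch: (user, item, label).
Row = Tuple[int, int, float]


def filter_unknown(datasets: Sequence[List[Row]]) -> List[List[Row]]:
    """Drop test rows whose user or item never appears in the training data.

    datasets[0] is treated as the training data and is returned unchanged.
    """
    train = datasets[0]
    # Step 1: collect the users and items seen during training.
    known_users = {user for user, _, _ in train}
    known_items = {item for _, item, _ in train}

    cleaned = [list(train)]
    # Steps 2-4: keep only test rows where both the user and the item are known.
    for test in datasets[1:]:
        cleaned.append(
            [row for row in test if row[0] in known_users and row[1] in known_items]
        )
    # Step 5: return the training data plus the filtered test sets.
    return cleaned


train = [(1, 10, 1.0), (2, 11, 0.0), (3, 10, 1.0)]
test = [(1, 11, 1.0), (4, 10, 0.0), (2, 99, 1.0)]  # user 4 and item 99 are unseen
train_out, test_out = filter_unknown([train, test])
print(test_out)  # [(1, 11, 1.0)]
```

In a manual k-fold loop, the same filter would be applied to each (train, test) pair before evaluation, so that every test interaction refers to a user and an item the model has seen.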

massquantity, Dec 19 '24 11:12

These warnings only occur with some metrics, such as map, roc_auc and pr_auc. Others, like precision, recall and ndcg, do not produce them. Thanks for the explanation, I'll try that out.

faizanmuheeb, Dec 19 '24 16:12