Detect unknown interaction(s) when using metrics like roc_auc/map on ALS and BPR algorithms.
Hello,
I tried simulating a 5-fold cross-validation procedure on the ALS and BPR algorithms. However, when I use metrics like roc_auc, map, etc., I see warnings such as "Detect 1 unknown interaction(s), position: [2592]". This does not happen when I use random_split or the other built-in splitting methods. Is there any way to avoid these warnings, or is it safe to ignore them?
This warning appears because your test data contains users or items that do not exist in the training data, so they are "unknown" to the trained model. If your data is large, the warnings can be ignored. Alternatively, you can modify the _filter_unknown_user_item function in the library to filter out unknown users and items from your own folds.
Here is the explanation of the function from Copilot:
The _filter_unknown_user_item function filters out rows from the test datasets that contain users or items not present in the training dataset.
Input: A list of datasets, with the first dataset considered as the training data.
Output: A list of datasets with unknown users and items removed from the test datasets.
Steps:
- Extract unique users and items from the training data.
- Iterate over the test datasets.
- Identify indices of rows with users or items not present in the training data.
- Remove those rows from the test datasets.
- Return the cleaned list of datasets.
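The steps above can be sketched roughly as follows with pandas. This is only an illustration and not the library's actual implementation: the function name filter_unknown, the column names "user" and "item", and the toy data are all assumptions made for the example.

```python
import pandas as pd


def filter_unknown(datasets, user_col="user", item_col="item"):
    """Drop test rows whose user or item never appears in the training data.

    Hypothetical helper: `datasets` is a list of DataFrames and the first
    one is treated as the training set, mirroring the description above.
    """
    train = datasets[0]
    known_users = train[user_col].unique()
    known_items = train[item_col].unique()

    cleaned = [train]
    for test in datasets[1:]:
        # Keep only rows where both the user and the item are known.
        known = test[user_col].isin(known_users) & test[item_col].isin(known_items)
        cleaned.append(test[known].reset_index(drop=True))
    return cleaned


# Toy fold: user 4 and item 99 do not occur in the training data.
train_fold = pd.DataFrame(
    {"user": [1, 1, 2, 3], "item": [10, 11, 10, 12], "label": [1, 1, 1, 1]}
)
test_fold = pd.DataFrame(
    {"user": [1, 4, 2], "item": [12, 10, 99], "label": [1, 1, 1]}
)

train_clean, test_clean = filter_unknown([train_fold, test_fold])
print(len(test_fold) - len(test_clean), "unknown interaction(s) removed")
```

Applying something like this to each of the 5 folds before evaluation should leave only interactions the model has seen users and items for, at the cost of evaluating on slightly fewer rows.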
These warnings only occur with some metrics, such as map, roc_auc and pr_auc. Others, such as precision, recall and ndcg, do not produce them. Thanks for the explanation, I'll try that out.