
[ASK] For BiVAECF, items a user has interacted with in the trainset are not removed during the evaluation phase.

Open · georgeguo-cn opened this issue 2 years ago • 6 comments

Description

I checked recommender.rank() and bivaecf.score() and found that the items a user interacted with in the trainset are not removed during the evaluation phase.

When recommending the top-n items with the highest scores from the whole item space, is it necessary to remove the interacted items in the trainset, or to give these interacted items the smallest value (or zero)?

I give these interacted items a zero score by adding the following code in bivaecf.score():

if item_idx is None:
    ...
    # Zero out items the user already rated highly (>= 4.0) in the trainset,
    # so they are pushed out of the top-n recommendations.
    train_mat = self.train_set.csr_matrix
    csr_row = train_mat.getrow(user_idx)
    pos_items = [i for (i, rating) in zip(csr_row.indices, csr_row.data) if rating >= 4.0]
    known_item_scores[pos_items] = 0.0
    return known_item_scores
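To sanity-check the effect of the change (a hypothetical snippet, not part of the patch above: model, user_idx, and pos_items stand in for the fitted recommender, one test user, and that user's highly rated train items; it also assumes the model's scores are non-negative, so zeroed items sink in the ranking):

import numpy as np

scores = model.score(user_idx)            # full item-space scores, train items zeroed
top_n = np.argsort(-scores)[:50]          # top-50 item indices by descending score
assert not set(top_n) & set(pos_items)    # zeroed train items should drop out of the top-n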

From the results on ml-100k, there is a significant improvement on every metric:

Model                           NDCG@50   Precision@50   Recall@50
BiVAECF                         0.2378    0.0739         0.4288
BiVAECF (train items zeroed)    0.3392    0.1048         0.5233

Other Comments

georgeguo-cn · Jul 22 '22 08:07

Hi, thanks for raising this question. Ideally, items that a given user has interacted with should be ignored (filtered out of the ranking list) if we do not wish to recommend them. In some applications, it is fine to recommend previously consumed items.

Coming back to the evaluation: for rating evaluation, filtering such items out of the list of predicted scores has no impact, since the evaluation is performed on the test items only. Please refer to rating_eval() in base_method.py.

For ranking evaluation (rank_eval() in base_method.py), however, it seems that consumed items are not filtered out of each user's ranking list, which would penalise the ranking metrics since such items are outside the test set. We will consider adding an option to filter out such items for every user in rank_eval(); it should be done at this level, as it is not model specific.
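For illustration, such an option could be implemented with a small per-user helper, roughly like this (a sketch only, not Cornac's actual code; train_mat and val_mat are assumed to be user-by-item CSR matrices):

import numpy as np

def candidate_items(user_idx, num_items, train_mat, val_mat=None, filter_seen=True):
    # Items to rank for this user; optionally drop items already seen
    # in the training/validation sets.
    if not filter_seen:
        return np.arange(num_items)
    seen = set(train_mat.getrow(user_idx).indices)
    if val_mat is not None:
        seen.update(val_mat.getrow(user_idx).indices)
    return np.array([i for i in range(num_items) if i not in seen])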

@tqtg could you please double check this aspect?

saghiles · Jul 22 '22 14:07

  1. @georgeguo-cn what you did modifies the BiVAE model output. As @saghiles already explained, in some scenarios we might not want to remove recommendations of items that a user repeatedly purchases (e.g., groceries).
  2. What we do in the evaluation is make sure that true positive items appearing in the training and validation sets are not considered as negative items. In other words, these true positive items are filtered out during the evaluation on the test set (see the code); a brief illustration follows below.
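To illustrate point 2 (illustrative pseudocode only, not the exact Cornac implementation; num_items and the *_pos_items index lists for one user are assumptions):

import numpy as np

u_gt_pos = np.zeros(num_items)   # ground-truth positives = test positives only
u_gt_pos[test_pos_items] = 1
u_gt_neg = np.ones(num_items)    # start with everything as negative...
u_gt_neg[train_pos_items + val_pos_items + test_pos_items] = 0  # ...then exclude all known positives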

tqtg · Jul 22 '22 16:07

Thanks for your reply.

@saghiles @tqtg I agree that, in some scenarios, we might recommend items that a user repeatedly purchases. Therefore, it is necessary to add an option that chooses whether to filter out such items for every user in rank_eval() (rather than in a model's bivaecf.score()). In particular, item_indices in rank_eval() (code) could be modified to:

# 'consider_repeat' is an option, True/False
item_indices = None if consider_repeat else np.array(
    [i for i in range(test_set.num_items) if i not in (val_pos_items + train_pos_items)]
)
  1. @saghiles for rating prediction, rating_eval() only needs to predict the scores of the candidate items in the testset for each user, without considering the interacted items in the trainset. However, for ranking prediction over the whole item space in rank_eval(), this option needs to be considered.
  2. @tqtg I double-checked the code of rank_eval() and mt.compute for each metric. Although the true positive items appearing in the training and validation sets are collected (code), they are not used in the computation of each metric (except for AUC).

Hence, I hope you will consider adding an option in rank_eval() to filter out, for every user, the positive items appearing in the training and validation sets (a rough sketch of the per-user loop follows below).
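Since the positive items differ per user, the mask would have to be rebuilt inside rank_eval()'s user loop, e.g. (a sketch; train_pos and val_pos are hypothetical per-user index lists):

for user_idx in test_user_indices:
    if consider_repeat:
        item_indices = None  # rank over the whole item space
    else:
        seen = set(train_pos[user_idx]) | set(val_pos[user_idx])
        item_indices = np.array([i for i in range(test_set.num_items) if i not in seen])
    item_rank, item_scores = model.rank(user_idx, item_indices)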

georgeguo-cn · Jul 23 '22 11:07

In addition, during data processing, repeated interactions with the same item are only counted once, so most processed datasets do not contain repeated interactions anyway.

georgeguo-cn · Jul 23 '22 12:07

Hi @georgeguo-cn, rating_eval() already makes predictions for the held-out test items only:

    (u_indices, i_indices, r_values) = test_set.uir_tuple
    r_preds = np.fromiter(
        tqdm(
            (
                model.rate(user_idx, item_idx).item()
                for user_idx, item_idx in zip(u_indices, i_indices)
            ),
            desc="Rating",
            disable=not verbose,
            miniters=100,
            total=len(u_indices),
        ),
        dtype=np.float,
    )

My point was that, for the rating evaluation case, even if we score all items it will not have any impact on the results, since rating metric computation considers only test items.
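For example (illustrative, using the arrays from the snippet above), a rating metric such as MAE reduces to a comparison over those test pairs alone:

mae = np.mean(np.abs(r_preds - r_values))  # r_preds and r_values cover test (user, item) pairs only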

Regarding rank_eval(), you are right: currently, for most metrics, train and validation items are not filtered out of the ranking list. We will add an option to filter out such items.

Thank you!

saghiles · Jul 23 '22 13:07

Ok, thank you. That's great.

georgeguo-cn · Jul 23 '22 13:07