[FEA] Synthetic data generation without overlap of user-item interactions

Open oliverholworthy opened this issue 3 years ago • 0 comments

🚀 Feature request

Motivation

When building latent factor models (for example with LightFM), evaluation requires that the validation or test dataset has no user-item interaction overlap with the training data. LightFM raises an exception if this happens.

To simplify writing tests with these models and synthetic data, we can add some constraints to the dataset split logic to avoid any user-item interaction overlap, as well as any unseen item features or users in the test or validation splits.

Proposed constraints:

  • no item features in the test/valid datasets that were not present in the training data
  • no overlap of user-item interactions between splits
  • no users in the test/valid datasets that were not present in the training data
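
The constraints above could be enforced and checked roughly as follows. This is only a minimal sketch in plain Python, not the existing split logic: `split_without_overlap` and `check_split` are hypothetical helper names, interactions are assumed to be `(user_id, item_id)` tuples, and the item id stands in for item features (the same set-difference check would apply to each item feature column).

```python
import random


def split_without_overlap(interactions, valid_fraction=0.2, seed=0):
    """Split (user_id, item_id) pairs into train and valid lists.

    Hypothetical helper, not part of any library. Duplicates are dropped so
    no pair can land in both splits, and any pair that introduces a new user
    or a new item is pinned to train so valid holds no unseen users or items.
    """
    rng = random.Random(seed)
    unique_pairs = list(dict.fromkeys(interactions))  # dedupe, keep order
    rng.shuffle(unique_pairs)

    train, candidates = [], []
    seen_users, seen_items = set(), set()
    for user, item in unique_pairs:
        if user not in seen_users or item not in seen_items:
            train.append((user, item))
            seen_users.add(user)
            seen_items.add(item)
        else:
            # Both the user and the item are already covered by train.
            candidates.append((user, item))

    n_valid = min(len(candidates), int(len(unique_pairs) * valid_fraction))
    valid = candidates[:n_valid]
    train.extend(candidates[n_valid:])
    return train, valid


def check_split(train, valid):
    """Return the violated constraints (an empty list if the split is clean)."""
    problems = []
    if set(train) & set(valid):
        problems.append("overlapping user-item interactions")
    if {u for u, _ in valid} - {u for u, _ in train}:
        problems.append("users in valid not present in train")
    if {i for _, i in valid} - {i for _, i in train}:
        problems.append("items in valid not present in train")
    return problems


# Usage with made-up synthetic interactions (20 users, 30 items).
data_rng = random.Random(42)
interactions = [(data_rng.randrange(20), data_rng.randrange(30)) for _ in range(500)]
train, valid = split_without_overlap(interactions)
assert check_split(train, valid) == []
```

Pinning the first occurrence of each user and item to the training split means the validation split can end up smaller than `valid_fraction` on very sparse data; that trade-off seems acceptable for test fixtures.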

Example from tests added in #629

oliverholworthy, Aug 17 '22 09:08