[FEA] Synthetic data generation without overlap of user-item interactions
🚀 Feature request
Motivation
When building latent factor models (for example with LightFM), evaluation requires that the validation or test dataset has no user-item interaction overlap with the training dataset. LightFM raises an exception if such an overlap is detected.
To simplify writing tests with these models and synthetic data, we can add constraints to the dataset split logic that prevent any user-item interaction overlap between splits, as well as item features or users that appear in the test or validation splits but not in the training data.
Proposed constraints:
- no item features in the test/valid datasets that were not present in the training data
- no overlap of user-item interactions between splits
- no users in the test/valid datasets that were not present in the training data
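As an illustration of what these constraints mean in practice, here is a minimal sketch in plain Python. It is not the existing split logic of this library: the function name `split_without_overlap` and its parameters are hypothetical, and it assumes interactions are given as `(user, item)` pairs. It also assumes item features are a fixed property of each item, so keeping every test item in the training split is enough to ensure no unseen item features.

```python
import random


def split_without_overlap(interactions, test_fraction=0.2, seed=0):
    """Hypothetical helper: split (user, item) pairs into train/test lists."""
    # Deduplicate so the same (user, item) pair can never land in both splits.
    pairs = sorted(set(interactions))
    random.Random(seed).shuffle(pairs)

    train, candidates = [], []
    seen_users, seen_items = set(), set()
    for user, item in pairs:
        if user not in seen_users or item not in seen_items:
            # First sighting of this user or item: it must go to training.
            train.append((user, item))
            seen_users.add(user)
            seen_items.add(item)
        else:
            # Both user and item are already covered by the training split.
            candidates.append((user, item))

    # Move at most the requested share of interactions into the test split.
    n_test = min(len(candidates), round(test_fraction * len(pairs)))
    test = candidates[:n_test]
    train.extend(candidates[n_test:])
    return train, test


# Synthetic interactions: 20 users, 30 items, duplicates are possible.
rng = random.Random(42)
interactions = [(rng.randrange(20), rng.randrange(30)) for _ in range(500)]
train, test = split_without_overlap(interactions)
```

The test split can end up smaller than `test_fraction` when the data is sparse, because every first occurrence of a user or item is forced into training. A validation split could be produced the same way by applying the function again to the returned training pairs.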
Example from tests added in #629