models
[FEA] Split datasets in `datasets` package chronologically
🚀 Feature request
Split the datasets in the `datasets` package chronologically instead of at random.
Motivation
Splitting at random is known to be problematic, since it leaks future data into the training set: the model can be trained on interactions that happened after the ones it is evaluated on. A preferred approach is to split the dataset chronologically, so that the earliest 80% of the data is the train set and the latest 20% is the test set.
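To make the request concrete, here is a minimal sketch of a chronological 80/20 split in plain Python. It is only an illustration, not how the `datasets` package works: the `(user_id, item_id, timestamp)` row layout, the `chronological_split` name, and the `train_fraction` parameter are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical row layout: (user_id, item_id, timestamp)
Interaction = Tuple[int, int, int]


def chronological_split(
    interactions: Sequence[Interaction], train_fraction: float = 0.8
) -> Tuple[List[Interaction], List[Interaction]]:
    """Sort by timestamp, then cut: oldest rows go to train, newest to test."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # sorted() is stable, so rows with equal timestamps keep their input order
    ordered = sorted(interactions, key=lambda row: row[2])
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


rows = [(1, 10, 500), (2, 11, 100), (1, 12, 300), (3, 10, 400), (2, 13, 200)]
train, test = chronological_split(rows)

# No test interaction is older than any train interaction
assert max(r[2] for r in train) <= min(r[2] for r in test)
```

With this kind of split every test interaction is at least as recent as every train interaction, which is what a random split cannot guarantee. A real implementation would likely operate on dataframes and may also need to decide how to handle users or items that only appear in the test period.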
@karlhigley Yes, I think we can do that for the datasets that have timestamps.
@bschifferer @radekosmulski This would be a good example of how to prepare data for RecSys.