[FEA] Split datasets in `datasets` package chronologically

Open karlhigley opened this issue 3 years ago • 2 comments

🚀 Feature request

Split the datasets in the `datasets` package chronologically instead of at random.

Motivation

Random splitting is known to be problematic because it leaks future data into the train set: the model gets trained on interactions that happened after some of the ones it is evaluated on. A preferred approach is to split the dataset chronologically, so that the earliest 80% of the data is the train set and the latest 20% is the test set.
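
To make the request concrete, here is a minimal sketch of such a split using pandas. The helper name `chronological_split`, the `timestamp` column name, and the toy data are assumptions for illustration only; they are not part of the `datasets` package.

```python
import pandas as pd


def chronological_split(df, timestamp_col="timestamp", train_frac=0.8):
    """Hypothetical helper: earliest `train_frac` of rows -> train, the rest -> test."""
    # Order rows by time; a stable sort keeps the original order among equal timestamps.
    ordered = df.sort_values(timestamp_col, kind="stable").reset_index(drop=True)
    # Index of the first row that belongs to the test set.
    cutoff = int(len(ordered) * train_frac)
    return ordered.iloc[:cutoff], ordered.iloc[cutoff:]


# Toy interaction log (made-up values, not a real dataset).
interactions = pd.DataFrame(
    {
        "user_id": [1, 2, 1, 3, 2, 3, 1, 2, 3, 1],
        "item_id": [10, 11, 12, 10, 13, 14, 15, 10, 11, 13],
        "timestamp": [5, 1, 9, 3, 7, 2, 10, 4, 8, 6],
    }
)

train, test = chronological_split(interactions)
```

This cuts by row count, so rows sharing the timestamp at the boundary can end up on both sides. Cutting at a timestamp value instead would avoid that, at the cost of a split that is only approximately 80/20.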

karlhigley avatar May 20 '22 14:05 karlhigley

@karlhigley Yes, I think we can do that for the datasets that have a timestamp.

rnyak avatar May 31 '22 14:05 rnyak

@bschifferer @radekosmulski This would be a good example of how to prepare data for RecSys.

EvenOldridge avatar Jun 14 '22 16:06 EvenOldridge