Using Parfit with Custom CV split

Open thatlittleboy opened this issue 4 years ago • 0 comments

Currently, cross-validation in parfit can only be configured by specifying n_folds. Would it be possible to let the user specify the CV splits manually, as lists of train/validation indices? Or, even better, to accept general scikit-learn splitter objects?

Thanks!

Motivation

One possible use case is CV on a time-series dataset, where the usual K-fold split is unsuitable because of the temporal ordering of the data: a model should never be validated on observations that precede its training data. The common recommendation is an expanding-window split, like this:

assume we have data in 5 blocks: [1,2,3,4,5]
split 1: train: [1], val: [2]
split 2: train: [1,2], val: [3]
split 3: train: [1,2,3], val: [4]
split 4: train: [1,2,3,4], val: [5]

This is implemented in Sklearn as TimeSeriesSplit: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html
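
For reference, here is a minimal sketch of how the block scheme above maps onto TimeSeriesSplit. It only uses scikit-learn and does not touch parfit. The `blocks` list and the `splits` variable are illustrative names, and each "block" is treated as a single sample to keep the example small.

```python
from sklearn.model_selection import TimeSeriesSplit

# Illustrative data: five time-ordered blocks, one sample per block.
blocks = [1, 2, 3, 4, 5]

# Four splits over five samples gives an expanding training window
# with a single validation sample per split.
splitter = TimeSeriesSplit(n_splits=4)

# Materialise the splits as (train, val) pairs of block labels.
# splitter.split() yields index arrays, so map them back to the labels.
splits = [
    ([blocks[i] for i in train_idx], [blocks[i] for i in val_idx])
    for train_idx, val_idx in splitter.split(blocks)
]

for n, (train, val) in enumerate(splits, start=1):
    print(f"split {n}: train: {train}, val: {val}")
```

The index pairs yielded by `splitter.split(...)` are the kind of "manual" split specification asked about above, so supporting either form (explicit index pairs or a splitter object) would cover this use case.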

thatlittleboy, Sep 06 '19 08:09