verstack
Cross validation support during HPO
I used the hyperparameter optimisation and found it really useful - thanks! I was hoping to carry out the same process but with cross-validation (specifically GroupKFold). Support for scikit-learn CV would be very useful.
Hi Geethen. I presume you are referring to the LGBMTuner class within verstack, right?
How big is the dataset you are using for training, and what is the task type (regression/binary/multiclass)?
Yes, that's correct - LGBMTuner within verstack.
Dataset size: 1.3 GB when stored as a feather file, with roughly 450,000 rows and 707 columns. Task type: regression.
For regression tasks, every hyperparameter optimisation trial within LGBMTuner is carried out on a new random split, which makes the process very similar to the cross-validation approach you are seeking. If you are using LGBMTuner with the default parameters (200 trials), that means you will have 200 random train/valid splits during the tuning process. Moreover, your dataset is big enough not to worry about additional validation.
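Conceptually, each trial looks roughly like this (just an illustration of the idea with generic placeholders, not LGBMTuner's actual implementation):

```python
# Illustration only: a fresh random train/valid split for every optimisation trial
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def run_trial(trial_number, X, y, params):
    # using the trial number as the random_state gives a different split each trial
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.2, random_state=trial_number
    )
    model = LGBMRegressor(**params)
    model.fit(X_train, y_train)
    return mean_squared_error(y_valid, model.predict(X_valid))
```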
I've been planning to add a holdout-validation option into LGBMTuner for quite some time now, specifically for time series applications. I guess this will be the motivation. I will let you know when it is available.
I'm working with spatial data, so I use GroupKFold to limit spatial autocorrelation - hence my request. Thanks for looking more into this. I think having a tuner that can take any CV option as an argument would be the most flexible.
If I remember correctly, optuna did allow for a sklearn CV strategy to be specified. I could share an example, if that's helpful?
If you have something handy, please share - it will be a good starting point for me.
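Sure - here is roughly the kind of thing I had in mind (a quick sketch with dummy data; the parameter ranges, the `groups` array and the scoring metric are just placeholders to show the pattern):

```python
# Sketch: tuning LightGBM with Optuna while scoring each trial with a
# user-supplied sklearn CV splitter (GroupKFold here).
import numpy as np
import optuna
from lightgbm import LGBMRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))          # dummy features
y = rng.normal(size=1000)                # dummy regression target
groups = rng.integers(0, 10, size=1000)  # e.g. spatial block ids

cv = GroupKFold(n_splits=5)

def objective(trial):
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "learning_rate": trial.suggest_float("learning_rate", 0.01, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 15, 255),
        "min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
        "colsample_bytree": trial.suggest_float("colsample_bytree", 0.5, 1.0),
    }
    model = LGBMRegressor(**params, random_state=42)
    # groups are passed to the splitter so rows from one group never appear
    # in both the train and validation folds of the same split
    scores = cross_val_score(
        model, X, y, groups=groups, cv=cv,
        scoring="neg_mean_squared_error",
    )
    return scores.mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```

The key point is that the CV splitter is just an argument, so any sklearn-compatible strategy (GroupKFold, TimeSeriesSplit, etc.) could be plugged in the same way.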