
Automated estimation of the number of resamplings given the size of the training data

Open · ivan-marroquin opened this issue 3 years ago · 2 comments

Is your feature request related to a problem? Please describe. When defining the "cv" splitter with the Subsample class, both "n_resamplings" and "n_samples" must be provided. If "n_resamplings" is too small, the following warning message is raised:

"WARNING: at least one point of training set belongs to every resamplings. Increase the number of resamplings"

Describe the solution you'd like: It would be beneficial to have an automated way to estimate "n_resamplings" given "n_samples". For instance, a user could fix "n_samples" as follows: n_samples = int(0.25 * gral_train_inputs.shape[0])

Then "n_resamplings" would be determined according to the size of the training data.
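One possible formula, offered as a sketch: assume each resampling independently draws n_samples of the n training points uniformly at random (this independence assumption is mine, not taken from MAPIE). A fixed point lands in one resampling with probability p, where p = 1 - (1 - 1/n)^n_samples with replacement and p = n_samples/n without. It therefore lands in all B resamplings with probability p^B, and a union bound over the n points gives P(warning) <= n * p^B. Choosing B >= log(delta/n) / log(p) keeps that probability below delta. The names `inclusion_probability`, `estimate_n_resamplings` and the tolerance `delta` are hypothetical, not part of MAPIE.

```python
import math


def inclusion_probability(n_train: int, n_samples: int, replace: bool = True) -> float:
    """Probability that one fixed training point is drawn in a single resampling."""
    if replace:
        # With replacement: the point is missed on each of the n_samples
        # independent draws with probability (1 - 1/n_train).
        return 1.0 - (1.0 - 1.0 / n_train) ** n_samples
    # Without replacement: a uniform subset of size n_samples contains the
    # point with probability n_samples / n_train.
    return n_samples / n_train


def estimate_n_resamplings(
    n_train: int, n_samples: int, replace: bool = True, delta: float = 0.05
) -> int:
    """Hypothetical helper (not MAPIE API): smallest B such that the union
    bound n_train * p**B <= delta holds, i.e. with probability >= 1 - delta
    no training point belongs to every one of the B resamplings."""
    if n_train < 1 or n_samples < 1 or (not replace and n_samples > n_train):
        raise ValueError("invalid n_train / n_samples combination")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must be in (0, 1)")
    p = inclusion_probability(n_train, n_samples, replace)
    if p >= 1.0:
        raise ValueError("every point is in every resampling; lower n_samples")
    # n_train * p**B <= delta  <=>  B >= log(delta / n_train) / log(p)
    # (log(p) < 0, so dividing by it flips the inequality)
    return max(1, math.ceil(math.log(delta / n_train) / math.log(p)))


n_train = 1000                   # e.g. gral_train_inputs.shape[0]
n_samples = int(0.25 * n_train)  # the choice described above
print(estimate_n_resamplings(n_train, n_samples))
```

Because the union bound is conservative, the returned value is an upper estimate of the minimum that avoids the warning; a smaller delta trades more resamplings for a lower chance of the warning.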

Describe alternatives you've considered: In my case, I decided to fix "n_samples" as shown above. But now I have to find, by trial and error, the minimum "n_resamplings" that avoids the warning message and ensures good statistical results.

Kind regards, Ivan

ivan-marroquin · Nov 02 '22

Hi @ivan-marroquin, thank you for your issue. Indeed, automatically estimating the number of resamplings is a good idea. Do you have a theoretical formula for it? If so, feel free to share it with us, or even implement it in MAPIE and open a Pull Request.

vincentblot28 · Mar 02 '23

Hello everyone! I tried to reproduce the bug without success. How can I trigger the warning message? @vincentblot28

BaptisteCalot · Jun 19 '24
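To see when the warning can appear, the condition it describes (some training point belongs to every resampling, so it never falls out of bag) can be simulated without MAPIE. This sketch assumes each resampling is drawn independently and uniformly, with or without replacement; `any_point_in_all_resamplings` is a hypothetical name, not a MAPIE function. The condition should be easy to hit when "n_samples" is close to the training size and "n_resamplings" is small.

```python
import random


def any_point_in_all_resamplings(
    n_train: int,
    n_samples: int,
    n_resamplings: int,
    replace: bool = True,
    seed: int = 0,
) -> bool:
    """Hypothetical simulation (not MAPIE code): draw n_resamplings index
    sets and report whether some training point appears in all of them."""
    rng = random.Random(seed)
    in_all = set(range(n_train))  # points present in every resampling so far
    for _ in range(n_resamplings):
        if replace:
            drawn = {rng.randrange(n_train) for _ in range(n_samples)}
        else:
            drawn = set(rng.sample(range(n_train), n_samples))
        in_all &= drawn  # keep only points also drawn this time
        if not in_all:
            return False  # every point is out of bag at least once
    return True


# n_samples equal to the training size and only 2 resamplings:
# a point landing in both resamplings is very likely.
print(any_point_in_all_resamplings(100, 100, n_resamplings=2))
# Small n_samples and many resamplings: extremely unlikely.
print(any_point_in_all_resamplings(100, 10, n_resamplings=50))
```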