Automated estimation of the number of resamplings given the size of the training data
Is your feature request related to a problem? Please describe. When defining the "cv" splitter using the Subsample class, it is required to provide the "n_resamplings" and "n_samples". If the "n_resamplings" is not properly selected, the following warning message is raised:
"WARNING: at least one point of training set belongs to every resamplings. Increase the number of resamplings"
Describe the solution you'd like I think it would be beneficial to have an automated way to estimate "n_resamplings" given "n_samples". For instance, a user could fix "n_samples" as follows: n_samples = int(0.25 * gral_train_inputs.shape[0])
Then, "n_resamplings" would be determined according to the size of the training data.
Describe alternatives you've considered In my case, I decided to fix "n_samples" as shown above. But now I have to use trial and error to find the minimum "n_resamplings" that avoids the warning message and ensures good statistical results.
Kind regards, Ivan
Hi @ivan-marroquin, thank you for your issue. Indeed, automatically estimating the number of resamplings is a good idea. Do you have a theoretical formula for it? If so, feel free to share it with us, or even implement it in MAPIE and open a Pull Request.
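One possible starting point for such a formula, offered as a sketch rather than as MAPIE's method: assume each resampling draws "n_samples" indices uniformly at random from the training set. A given training point then lands in a single resampling with probability p = 1 - (1 - 1/n_train)^n_samples when drawing with replacement, or p = n_samples / n_train without replacement. The chance that it lands in all B resamplings is p^B, and a union bound over the training set gives n_train * p^B as an upper bound on the probability that any point belongs to every resampling. Solving n_train * p^B <= risk for B gives the estimate below. The function name `estimate_n_resamplings` and the `risk` parameter are hypothetical and not part of MAPIE.

```python
import math


def estimate_n_resamplings(n_train, n_samples, replace=True, risk=0.05):
    """Smallest B such that n_train * p**B <= risk (union bound).

    Hypothetical helper, not part of MAPIE. Assumes each resampling
    draws n_samples indices uniformly at random from n_train points.
    """
    if n_train < 1 or n_samples < 1:
        raise ValueError("n_train and n_samples must be positive")
    if not 0.0 < risk < 1.0:
        raise ValueError("risk must lie strictly between 0 and 1")

    if replace:
        # Probability that a given point is drawn at least once.
        p = 1.0 - (1.0 - 1.0 / n_train) ** n_samples
    else:
        if n_samples > n_train:
            raise ValueError("n_samples cannot exceed n_train without replacement")
        p = n_samples / n_train

    if p >= 1.0:
        # Every point is always selected, so no number of resamplings helps.
        raise ValueError("every point is selected in every resampling")

    # Solve n_train * p**B <= risk for the smallest integer B.
    return max(1, math.ceil(math.log(risk / n_train) / math.log(p)))


if __name__ == "__main__":
    n_train = 1000
    n_samples = int(0.25 * n_train)  # same rule as in the issue
    print(estimate_n_resamplings(n_train, n_samples, replace=True))
    print(estimate_n_resamplings(n_train, n_samples, replace=False))
```

Because the union bound is conservative, the returned value is an upper estimate of what is needed to avoid the situation described by the warning. It says nothing about how many resamplings are needed for good statistical results, which would call for a separate criterion.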
Hello everyone! I tried to reproduce the bug without success. How am I supposed to get the warning message raised? @vincentblot28
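The condition described by the warning text can be explored without MAPIE. The sketch below uses only the standard library and assumes the same uniform resampling scheme as the Subsample class is described to use in this thread. It does not call MAPIE, and MAPIE's internal check may differ. The function name `some_point_in_every_resampling` is hypothetical.

```python
import random


def some_point_in_every_resampling(
    n_train, n_samples, n_resamplings, replace=True, seed=0
):
    """Return True if at least one index appears in all resamplings.

    Hypothetical helper that mimics the situation named in the warning;
    it is not MAPIE code.
    """
    if n_resamplings < 1:
        raise ValueError("n_resamplings must be at least 1")

    rng = random.Random(seed)
    indices = range(n_train)
    # Start with every index and keep only those drawn each time.
    always_selected = set(indices)
    for _ in range(n_resamplings):
        if replace:
            drawn = rng.choices(indices, k=n_samples)
        else:
            drawn = rng.sample(indices, k=n_samples)
        always_selected &= set(drawn)
        if not always_selected:
            # No index has been present in every resampling so far.
            return False
    return True


if __name__ == "__main__":
    # A single resampling always contains at least one point "every" time.
    print(some_point_in_every_resampling(100, 25, n_resamplings=1))
    # With many resamplings the overlap is expected to vanish.
    print(some_point_in_every_resampling(100, 25, n_resamplings=200))
```

Under these assumptions, a very small "n_resamplings" combined with a sizeable "n_samples" is the setting most likely to trigger the condition, which suggests trying such values when attempting to reproduce the warning.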