skll

Consider parallelizing xval

Open mulhod opened this issue 6 years ago • 0 comments

Cross-validation runs serially (grid search cross-validation, however, does make use of threads). This is a considerable bottleneck for large datasets or large feature spaces. For example, in recent experiments with 15k samples and perhaps up to 100k features, 10-fold cross-validation can take upwards of two weeks. It would be a good idea to consider parallelizing at the cross-validation fold level, if possible. For example, perhaps each fold could be gridmapped individually, or folds could be run in threads (however, as mentioned, grid search cross-validation already spawns 3 threads, so that would have to be kept in mind).
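
To make the fold-level idea concrete, here is a rough sketch using only the Python standard library. It is not SKLL code: `make_folds`, `fold_workers`, `parallel_cross_validate`, and the `train_and_score` callback are hypothetical names made up for illustration. It assumes each fold runs an inner grid search that uses 3 threads, as described above, and caps the number of concurrent folds accordingly.

```python
from concurrent.futures import ThreadPoolExecutor


def make_folds(n_samples, n_folds):
    """Split indices 0..n_samples-1 into n_folds interleaved test sets."""
    return [list(range(i, n_samples, n_folds)) for i in range(n_folds)]


def fold_workers(cpu_count, threads_per_fold=3):
    """Cap concurrent folds so folds * inner threads fits in cpu_count.

    threads_per_fold=3 mirrors the grid search thread count mentioned
    in this issue; it is an assumption, not a value read from SKLL.
    """
    return max(1, cpu_count // threads_per_fold)


def parallel_cross_validate(n_samples, train_and_score, n_folds=10, cpu_count=8):
    """Run every fold concurrently and return one score per fold.

    train_and_score(train_idx, test_idx) is a hypothetical callback that
    fits a model on train_idx and returns its score on test_idx.
    """
    folds = make_folds(n_samples, n_folds)

    def run_fold(test_idx):
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        return train_and_score(train_idx, test_idx)

    with ThreadPoolExecutor(max_workers=fold_workers(cpu_count)) as pool:
        # map() preserves input order, so scores line up with folds.
        return list(pool.map(run_fold, folds))
```

Threads only help here if the training code releases the GIL; for CPU-bound pure-Python training, a process pool or one cluster job per fold would likely be the better fit.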

mulhod Sep 25 '19 15:09