Running lightgbm with doParallel fails

Open partyom opened this issue 5 years ago • 2 comments

Running `bayesOpt` (from ParBayesianOptimization) with `parallel = TRUE` on the R lightgbm package, on Windows with R version 4.0, results in `Error in unserialize(socklist[[n]]) : error reading from connection`.

It seems to be a problem with the lightgbm package itself, as `foreach(i = 1:2, .packages="lightgbm") %dopar%` crashes with the same error, whereas `foreach(i = 1:2, .packages="lightgbm") %do%` finishes without a problem. A related problem has also been reported previously for this package (see the links below).
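For anyone trying to reproduce this, here is a minimal sketch of the comparison described above. The two-worker cluster setup and the loop body (which simply returns `i`) are assumptions, since the original snippet does not show them; only the `foreach(...)` calls themselves come from the report.

```r
library(foreach)
library(doParallel)

# Assumed setup: two workers. makeCluster() defaults to a socket (PSOCK) cluster.
cl <- parallel::makeCluster(2)
registerDoParallel(cl)

# Sequential backend: reported above to finish without a problem.
# The loop body is a placeholder; it only returns the iteration index.
res_seq <- foreach(i = 1:2, .packages = "lightgbm") %do% {
  i
}

# Parallel backend: reported above to fail on Windows / R 4.0.
# tryCatch() captures the error message instead of aborting the script.
res_par <- tryCatch(
  foreach(i = 1:2, .packages = "lightgbm") %dopar% {
    i
  },
  error = function(e) conditionMessage(e)
)

# Workers may already be gone after a failure, so ignore shutdown errors.
try(parallel::stopCluster(cl), silent = TRUE)
```

If the parallel call fails, `res_par` holds the error message as a string; otherwise it holds the same list as `res_seq`.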

As a feature idea, it might be worth keeping the lightgbm (or any other model) computation in the main process, since most such models come with their own internal parallelization, while fitting the Gaussian process and running the optimization on multiple cores.
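Until something like that exists, one possible workaround is to leave `parallel = FALSE` so the scoring function runs in the main process, and let lightgbm use its own threads. The sketch below is only an illustration under assumptions: the `agaricus.train` example data, the tuned parameters and their bounds, the thread count, and the name `score_fun` are all placeholders, not anything taken from my actual setup.

```r
library(ParBayesianOptimization)
library(lightgbm)

# Example data shipped with lightgbm (placeholder for a real dataset).
data(agaricus.train, package = "lightgbm")
dtrain <- lgb.Dataset(agaricus.train$data, label = agaricus.train$label)

# Hypothetical scoring function: cross-validated AUC for one parameter set.
# lightgbm parallelizes internally via num_threads (2 is an arbitrary choice).
score_fun <- function(num_leaves, learning_rate) {
  cv <- lgb.cv(
    params = list(
      objective = "binary",
      metric = "auc",
      num_leaves = as.integer(num_leaves),
      learning_rate = learning_rate,
      num_threads = 2L,
      verbose = -1L
    ),
    data = dtrain,
    nrounds = 20L,
    nfold = 3L
  )
  # bayesOpt maximizes Score, so return the best mean validation AUC.
  list(Score = max(unlist(cv$record_evals$valid$auc$eval)))
}

# Illustrative search space; integer bounds make num_leaves an integer parameter.
bounds <- list(
  num_leaves = c(4L, 31L),
  learning_rate = c(0.01, 0.3)
)

# parallel = FALSE keeps every call to score_fun in the main R process.
opt <- bayesOpt(
  FUN = score_fun,
  bounds = bounds,
  initPoints = 4,
  iters.n = 2,
  iters.k = 1,
  parallel = FALSE,
  verbose = 0
)

best <- getBestPars(opt)
```

This avoids the socket workers entirely, at the cost of running the Gaussian process steps sequentially as well.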

Best regards, Artyom

  1. https://github.com/microsoft/LightGBM/issues/1238
  2. https://lightgbm.readthedocs.io/en/latest/FAQ.html#lightgbm-hangs-when-multithreading-openmp-and-using-forking-in-linux-at-the-same-time

partyom, Nov 06 '20 01:11

We definitely want to keep the ability to run the scoring function in parallel: not all users of the package are tuning models that support multithreading. One option would be to allow parallel processing separately for the scoring function and the GP optimization.

I think the best solution would be to automatically set up parallelization if `iters.k > 1`, and to have a second function parameter that allows the GP optimization to be run in parallel. I need to think it through, though.

AnotherSamWilson, Nov 09 '20 19:11

My datasets are relatively small, so lightgbm's internal parallelization doesn't bring any substantial gain, and most of the time is spent in the BayesOpt steps. Splitting the parallel processing could indeed be a way out.

partyom, Nov 18 '20 10:11