fairgbm
Improve FairGBM multi-threading
According to our perf and valgrind benchmarks, a large share of CPU time during training is spent on thread synchronization.
The net effect of multi-threading is still positive. However, with OMP_NUM_THREADS=4 our code consistently uses only 2 threads and appears unable to parallelize fully.
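One rough way to quantify the thread utilization described above, without perf or valgrind, is to compare the process CPU time with the wall-clock time of a single training call. The sketch below uses only the Python standard library. `effective_parallelism` and `busy_work` are hypothetical names introduced for illustration, and `busy_work` is a single-threaded stand-in, not FairGBM code. The ratio approximates the average number of busy cores, so a value near 2 with OMP_NUM_THREADS=4 would be consistent with the behavior reported here.

```python
import time


def effective_parallelism(fn, *args, **kwargs):
    """Run fn once and return (result, cpu_seconds / wall_seconds).

    The ratio approximates the average number of cores kept busy
    by this process while fn was running.
    """
    wall_start = time.perf_counter()
    # process_time() sums user + system CPU time over all threads of the process.
    cpu_start = time.process_time()
    result = fn(*args, **kwargs)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    # Guard against a zero wall-clock interval for extremely short calls.
    return result, cpu_used / max(wall_used, 1e-9)


def busy_work(n):
    # Hypothetical single-threaded stand-in for a training call.
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    _, ratio = effective_parallelism(busy_work, 2_000_000)
    print(f"effective parallelism: {ratio:.2f}")
```

To apply this to real training, one would pass the actual fit call in place of `busy_work`. The number is only a coarse indicator: CPU time may also include time that threads spend busy-waiting at synchronization points, so it can overstate useful work.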