agtboost

Add parallelisation with `OpenMP`

Open Blunde1 opened this issue 4 years ago • 2 comments

The `node::split_information()` function should be easy to parallelize.
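A minimal sketch of what a per-feature parallel split search could look like with OpenMP. This is not agtboost's actual `node::split_information()`: the `SplitInfo` struct, `best_split_for_feature`, `best_split`, the column-major layout, and the plain second-order gain (instead of agtboost's own split criterion) are all assumptions made for illustration. It also assumes strictly positive hessians.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical result type (not agtboost's data structure).
struct SplitInfo {
    int feature = -1;
    double threshold = 0.0;
    double gain = -std::numeric_limits<double>::infinity();
};

// Best split on one feature column x, given gradients g and hessians h.
// Uses the common gain GL^2/HL + GR^2/HR - G^2/H; assumes all h > 0.
SplitInfo best_split_for_feature(int j, const std::vector<double>& x,
                                 const std::vector<double>& g,
                                 const std::vector<double>& h) {
    const std::size_t n = x.size();
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::sort(idx.begin(), idx.end(),
              [&](std::size_t a, std::size_t b) { return x[a] < x[b]; });

    const double G = std::accumulate(g.begin(), g.end(), 0.0);
    const double H = std::accumulate(h.begin(), h.end(), 0.0);

    SplitInfo best;
    best.feature = j;
    double GL = 0.0, HL = 0.0;
    for (std::size_t k = 0; k + 1 < n; ++k) {
        GL += g[idx[k]];
        HL += h[idx[k]];
        if (x[idx[k]] == x[idx[k + 1]]) continue;  // no split between ties
        const double GR = G - GL;
        const double HR = H - HL;
        const double gain = GL * GL / HL + GR * GR / HR - G * G / H;
        if (gain > best.gain) {
            best.gain = gain;
            best.threshold = 0.5 * (x[idx[k]] + x[idx[k + 1]]);
        }
    }
    return best;
}

// Search all features; each feature is independent, so the loop parallelizes.
SplitInfo best_split(const std::vector<std::vector<double>>& columns,
                     const std::vector<double>& g,
                     const std::vector<double>& h) {
    const int m = static_cast<int>(columns.size());
    std::vector<SplitInfo> per_feature(m);

    // Each iteration writes only to its own slot, so no locking is needed.
    #pragma omp parallel for schedule(dynamic)
    for (int j = 0; j < m; ++j) {
        per_feature[j] = best_split_for_feature(j, columns[j], g, h);
    }

    // Serial reduction keeps the result deterministic (first best feature wins).
    SplitInfo best;
    for (const SplitInfo& s : per_feature) {
        if (s.gain > best.gain) best = s;
    }
    return best;
}
```

Compiled without `-fopenmp` the pragma is ignored and the loop runs serially, so the same code serves as a single-threaded fallback. Reducing over `per_feature` after the parallel loop, rather than inside it, avoids a critical section and keeps tie-breaking independent of thread scheduling.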

Blunde1, May 24 '20 14:05

Indeed, and with up to 4-8 threads/CPU cores it can give a very good speedup. However, based on experience with xgboost/lightgbm, scaling beyond 8 cores is difficult and shows strongly diminishing returns for dataset sizes commonly found in practice (100K-1M records):

[Image: Screen Shot 2020-09-04 at 2 37 31 AM]

(The panels are for different dataset sizes: 0.1M, 1M, and 10M rows.)
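Given the diminishing returns beyond roughly 8 cores, one option is to cap the number of OpenMP threads rather than use every available core. The sketch below is only an illustration: `choose_num_threads` is a hypothetical helper, and the default cap of 8 is an assumption taken from the observation above, not a tuned value.

```cpp
#include <algorithm>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: pick a thread count no larger than `cap`.
// `requested <= 0` means "use whatever OpenMP reports as available".
inline int choose_num_threads(int requested, int cap = 8) {
#ifdef _OPENMP
    const int available = omp_get_max_threads();
#else
    const int available = 1;  // built without OpenMP: serial only
#endif
    const int wanted = requested > 0 ? requested : available;
    return std::max(1, std::min({wanted, available, cap}));
}
```

The result can be passed to a parallel region through the `num_threads` clause, for example `#pragma omp parallel for num_threads(choose_num_threads(nthreads))`, where `nthreads` would be a user-supplied setting.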

szilard, Sep 04 '20 09:09

Also, there is an actual slowdown on multi-socket systems (even for very large datasets); for example, xgboost and lightgbm are not "NUMA optimized":

[Images: Screen Shot 2020-09-04 at 2 41 02 AM, Screen Shot 2020-09-04 at 2 41 25 AM]

More details are in this repo, https://github.com/szilard/GBM-perf#multi-socket-cpus, or in this talk: https://www.youtube.com/watch?v=qjuizRba3ZQ&t=31m00s

szilard, Sep 04 '20 09:09