agtboost
Add parallelisation with `OpenMP`
`node::split_information()` should be easy to parallelize.
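As a rough illustration of what this could look like, here is a minimal sketch that parallelises a split search over features with OpenMP. It is not agtboost's actual `node::split_information()`: the `SplitCandidate` struct, the function names, the column-wise data layout and the plain second-order gain formula are all assumptions made for the example, and the Hessians are assumed to be strictly positive.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical result type for this sketch (not agtboost's own structure).
struct SplitCandidate {
    int feature = -1;        // -1 means "no valid split found"
    double threshold = 0.0;
    double gain = 0.0;
};

// Deterministic ordering: higher gain wins, ties go to the lower feature index.
static bool better(const SplitCandidate& a, const SplitCandidate& b) {
    if (a.feature < 0) return false;
    if (b.feature < 0) return true;
    if (a.gain != b.gain) return a.gain > b.gain;
    return a.feature < b.feature;
}

// Best split on a single feature: sort by feature value, then scan with
// running sums of gradients g and Hessians h (assumed strictly positive).
SplitCandidate best_split_for_feature(const std::vector<double>& x,
                                      const std::vector<double>& g,
                                      const std::vector<double>& h,
                                      int feature) {
    const std::size_t n = x.size();
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::sort(idx.begin(), idx.end(),
              [&x](std::size_t a, std::size_t b) { return x[a] < x[b]; });

    const double G = std::accumulate(g.begin(), g.end(), 0.0);
    const double H = std::accumulate(h.begin(), h.end(), 0.0);

    SplitCandidate best;
    double gl = 0.0, hl = 0.0;
    for (std::size_t i = 0; i + 1 < n; ++i) {
        gl += g[idx[i]];
        hl += h[idx[i]];
        if (x[idx[i]] == x[idx[i + 1]]) continue;  // cannot split between equal values
        const double gr = G - gl, hr = H - hl;
        const double gain = 0.5 * (gl * gl / hl + gr * gr / hr - G * G / H);
        if (gain > best.gain) {
            best.feature = feature;
            best.threshold = 0.5 * (x[idx[i]] + x[idx[i + 1]]);
            best.gain = gain;
        }
    }
    return best;
}

// Features are independent, so the outer loop is the natural place to parallelise.
// Each thread keeps a private best candidate; results are merged in a critical section.
SplitCandidate best_split(const std::vector<std::vector<double>>& columns,
                          const std::vector<double>& g,
                          const std::vector<double>& h) {
    SplitCandidate best;
    const int m = static_cast<int>(columns.size());
    #pragma omp parallel
    {
        SplitCandidate local;
        #pragma omp for schedule(dynamic) nowait
        for (int j = 0; j < m; ++j) {
            const SplitCandidate c = best_split_for_feature(columns[j], g, h, j);
            if (better(c, local)) local = c;
        }
        #pragma omp critical
        {
            if (better(local, best)) best = local;
        }
    }
    return best;
}
```

With GCC or Clang this needs `-fopenmp` to run in parallel; without the flag the pragmas are ignored and the code runs serially with the same result. The number of threads can be capped through the `OMP_NUM_THREADS` environment variable.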
Indeed, and up to 4-8 threads/CPU cores it can bring a very good speedup. Based on experience with xgboost/lightgbm, though, scaling beyond 8 cores is difficult, with strongly diminishing returns for the dataset sizes commonly found in practice (100K-1M records):
(the panels are for various dataset sizes: 0.1M (million), 1M and 10M rows)
Also, there is an actual slowdown on systems with multiple CPU sockets (even for very large datasets); for example, xgboost and lightgbm are not "NUMA optimized":
More details in this repo https://github.com/szilard/GBM-perf#multi-socket-cpus or in this talk https://www.youtube.com/watch?v=qjuizRba3ZQ&t=31m00s