Voting Parallel Learner
LightGBM implements a voting-parallel tree learner to reduce the communication overhead between nodes for datasets with a large number of features. I'm currently working on a project that uses on the order of 2,000 features, and we've found that, even with NCCL, communication is a major component of the fitting time, especially when scaling beyond a single machine with 8 GPUs. Is there any plan to support the two-round voting scheme proposed in the paper?
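For context, here is a rough, self-contained sketch of the two-round voting idea as I understand it from the paper. Everything below (the in-process worker simulation, the gain proxy, the bin and worker counts) is purely illustrative and not taken from either codebase:

```python
# Minimal simulation of two-round voting for a single node split.
# Not XGBoost or LightGBM code; helper names and constants are made up.
import numpy as np

rng = np.random.default_rng(0)
n_workers, n_features, n_bins, k = 4, 2000, 256, 8

# Each worker holds gradient histograms for all features over its local rows.
local_hists = [rng.random((n_features, n_bins)) for _ in range(n_workers)]

def local_gain(hist):
    # Placeholder split-gain proxy; a real learner computes gain from
    # gradient/hessian sums on either side of each bin boundary.
    return hist.max(axis=1)

# Round 1: each worker votes for its top-k features by local gain.
# Communication cost per worker is just k feature ids.
votes = np.zeros(n_features, dtype=int)
for hist in local_hists:
    top_k = np.argsort(local_gain(hist))[-k:]
    votes[top_k] += 1

# Round 2: only the 2k most-voted features have their full histograms
# aggregated across workers, so the payload is 2k * n_bins values
# instead of n_features * n_bins.
candidates = np.argsort(votes)[-2 * k:]
global_hist = sum(hist[candidates] for hist in local_hists)
best_feature = candidates[np.argmax(local_gain(global_hist))]
print("split on feature", best_feature)
```

With ~2,000 features and 256 bins, that would cut the per-split histogram exchange from roughly 512k values to a few thousand, which is the saving we're after.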
Currently, XGBoost supports data-parallel and feature-parallel learning through the data_split_mode parameter of DMatrix.
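My understanding is that the split mode is chosen at DMatrix construction time, roughly like this (untested sketch for XGBoost 2.x; the exact import location of the DataSplitMode enum may differ between versions):

```python
import numpy as np
import xgboost as xgb
from xgboost.core import DataSplitMode  # assumed location of the enum

X = np.random.rand(1000, 2000)
y = np.random.rand(1000)

# ROW (the default) shards by samples; COL shards by features, and is meant
# to be used inside a distributed/federated job where each worker holds a
# column slice of the data.
dtrain_row = xgb.DMatrix(X, label=y, data_split_mode=DataSplitMode.ROW)
dtrain_col = xgb.DMatrix(X, label=y, data_split_mode=DataSplitMode.COL)
```

Neither mode currently does the voting step, as far as I can tell, which is why I'm asking.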
Any pointers to the code or a rough implementation plan would also be appreciated, as I'm not familiar with this codebase.
LightGBM-a-communication-efficient-parallel-algorithm-for-decision-tree.pdf
Thank you for opening the issue. Yeah, we thought about supporting it, but so far we have been focusing on scaling with the number of samples instead of features.
> Any pointers to the code or a rough implementation plan would also be appreciated, as I'm not familiar with this codebase.
I'm not familiar with the details of the algorithm yet. I will look into it and see how difficult it is.