
Optimization of data initialization for large sparse datasets

Open razdoburdin opened this issue 6 months ago • 6 comments

This PR speeds up data initialization for large sparse datasets on multi-core CPUs by parallelizing the execution. For the bosch dataset, this PR improves fitting time by 1.3x on a 2x56-core system.

To avoid a race condition, I have also switched the missing flags from bitfields to uint8_t.
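A minimal sketch of why the byte-per-flag layout helps, not the actual XGBoost code: with bit-packed flags, several flags share one byte, so clearing one flag is a read-modify-write of that byte, and two threads updating neighboring flags can overwrite each other's result. With one uint8_t per flag, each flag is a separate memory location, so threads writing different indices do not conflict. The function name `BuildMissingFlags`, its parameters, and the strided work split are hypothetical and only serve the illustration; the sketch also assumes the indices in `present` are unique.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper (illustration only, not XGBoost's implementation).
// Returns one flag per value: 1 = missing, 0 = present.
// Assumes the indices in `present` are unique, so no two threads ever
// write the same element.
std::vector<std::uint8_t> BuildMissingFlags(
    std::size_t n_values, const std::vector<std::size_t>& present,
    unsigned n_threads) {
  // Everything starts as missing; workers clear the flag for present values.
  std::vector<std::uint8_t> missing(n_values, 1);

  std::vector<std::thread> workers;
  for (unsigned t = 0; t < n_threads; ++t) {
    workers.emplace_back([&missing, &present, n_values, n_threads, t] {
      // Strided split: thread t handles entries t, t + n_threads, ...
      for (std::size_t i = t; i < present.size(); i += n_threads) {
        assert(present[i] < n_values);
        // Plain store to a distinct byte. With bit-packed flags this would
        // be `word &= ~mask`, a read-modify-write on a byte shared with
        // neighboring flags, which races unless it is made atomic.
        missing[present[i]] = 0;
      }
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  return missing;
}
```

The trade-off is memory: one byte per flag takes eight times the space of a bitfield, in exchange for writes that need no atomics or locks.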

razdoburdin · Apr 07 '25 12:04