xgboost
Use quantised gradients in gpu_hist histograms
Floating point addition and subtraction cause small numerical errors in the learning process, sometimes causing the algorithm to create splits on 'phantom' gradients. We can solve this by consistently working with gradients in a high-precision quantised format, only converting back to floating point for the final gain or weight calculations.
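A minimal sketch of the idea (hypothetical names and scale choice, not xgboost's actual internals): pick one shared scale factor, convert each gradient into a fixed-point integer, accumulate in integers — where addition is exact and associative — and convert back to floating point only at the end.

```python
# Hypothetical sketch of fixed-point gradient quantisation; the names and
# the choice of scale are illustrative, not xgboost's implementation.

def make_quantiser(max_abs_gradient, bits=64):
    # Pick a scale so the largest possible gradient still fits in the
    # integer range with headroom left for summation.
    scale = 2.0 ** (bits - 2) / max_abs_gradient

    def to_int(g):
        return round(g * scale)   # quantise to a fixed-point integer

    def to_float(q):
        return q / scale          # convert back for gain/weight maths

    return to_int, to_float


to_int, to_float = make_quantiser(max_abs_gradient=8.0)
grads = [0.1, 0.2, 0.3, -0.25]

# Integer addition is exact and associative, so the histogram sum is
# deterministic regardless of accumulation order.
forward = sum(to_int(g) for g in grads)
backward = sum(to_int(g) for g in reversed(grads))
assert forward == backward
```

With plain floats, summing in a different order can give a slightly different histogram; with integers the two orders above are bit-identical.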
This PR is the first of 2 PRs, converting histogram related calculations to use quantised gradients.
A second PR will address the split evaluation part.
Benchmarks look good:
dataset | master | integer |
---|---|---|
airline | 58.75668815 | 59.16649409 |
bosch | 12.61275904 | 10.81264534 |
covtype | 14.17920381 | 14.0353592 |
epsilon | 37.92378811 | 34.50982775 |
fraud | 1.07923842 | 1.037245154 |
higgs | 10.65482221 | 10.36926403 |
newsgroups | 179.9059501 | 182.5651175 |
year | 4.084809801 | 3.969981108 |
Thank you for working on resolving this issue. What's the current status of the floating point error?
This does not fix anything yet; the second PR, which affects the calculations in split evaluation, is still needed.
This spark test failure looks unrelated to this PR.
I restarted the CI.
The pyspark error seems to be related.
I think I need to reduce over the distributed workers in the histogram rounding bound calculation.
hmm .. why?
Allreduce now happens on quantised gradients. We would get the wrong results if each worker provided integer gradients quantised at a different scale.
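A toy two-worker illustration of the point (hypothetical names, with a plain `max` standing in for the distributed allreduce): the local gradient bounds must be max-reduced across workers first, so every worker quantises with the same scale before the integer sums are combined.

```python
# Illustrative only: why the rounding bound must be reduced across workers
# before quantising. `max(local_bounds)` stands in for an allreduce(max).

def scale_for(bound, bits=64):
    return 2.0 ** (bits - 2) / bound

worker_grads = [[0.5, -0.25], [4.0, 1.5]]   # toy gradients on two workers
local_bounds = [max(abs(g) for g in grads) for grads in worker_grads]

# Step 1: agree on one global bound, hence one shared scale.
shared_scale = scale_for(max(local_bounds))

# Step 2: each worker quantises locally with the shared scale, then the
# integer partial sums are allreduce-summed (plain sum here).
local_int_sums = [sum(round(g * shared_scale) for g in grads)
                  for grads in worker_grads]
total = sum(local_int_sums) / shared_scale  # dequantise once at the end
```

Had each worker used its own local bound, the integers would be in incompatible units and summing them across workers would be meaningless.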
@trivialfis can I please get another review?