Use quantised gradients in gpu_hist histograms

Open RAMitchell opened this issue 2 years ago • 7 comments

Floating point addition and subtraction are causing small numerical errors in the learning process, sometimes causing the algorithm to create splits on 'phantom' gradients. We can solve this by consistently working with gradients in a high-precision quantised format, converting back to floating point only for the final gain or weight calculations.
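
To illustrate the idea, here is a minimal Python sketch, not the actual gpu_hist code. It accumulates the same three gradients in two different orders, as parallel histogram adds might, and then subtracts one sum from the other the way a sibling histogram is derived from its parent. The `SCALE` constant and the helper names are assumptions made up for this example.

```python
def float_sum(values):
    """Plain left-to-right floating point accumulation."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def quantise(values, scale):
    """Convert each gradient to a fixed-point integer once, up front."""
    return [round(v * scale) for v in values]


gradients = [0.1, 0.2, 0.3]
SCALE = 2 ** 40  # hypothetical fixed-point scale, chosen for illustration only

# The parent bin and the left-child bin hold the same gradients,
# but they were accumulated in a different order.
parent_f = float_sum(gradients)
left_f = float_sum(reversed(gradients))
right_f = parent_f - left_f  # the right child is empty, so this should be 0

# Same computation on quantised gradients.
q = quantise(gradients, SCALE)
parent_q = sum(q)
left_q = sum(reversed(q))
right_q = parent_q - left_q  # integer addition is associative: exactly 0

# Convert back to floating point only at the very end.
right_from_q = right_q / SCALE

print("float residual:    ", right_f)
print("quantised residual:", right_from_q)
```

In floating point the empty right child ends up with a tiny nonzero gradient sum, which is the kind of 'phantom' value a split could be built on. The integer version gives exactly zero regardless of the summation order.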

This PR is the first of two. It converts the histogram-related calculations to use quantised gradients.

A second PR will address the split evaluation part.

RAMitchell avatar Sep 14 '22 15:09 RAMitchell

Benchmarks look good:

| dataset | master | integer |
|---|---|---|
| airline | 58.75668815 | 59.16649409 |
| bosch | 12.61275904 | 10.81264534 |
| covtype | 14.17920381 | 14.0353592 |
| epsilon | 37.92378811 | 34.50982775 |
| fraud | 1.07923842 | 1.037245154 |
| higgs | 10.65482221 | 10.36926403 |
| newsgroups | 179.9059501 | 182.5651175 |
| year | 4.084809801 | 3.969981108 |

RAMitchell avatar Sep 19 '22 13:09 RAMitchell

Thank you for working on resolving this issue. What's the current status of the floating point error?

trivialfis avatar Sep 20 '22 13:09 trivialfis

This does not fix anything yet. I still need to do the second part, which affects the calculations in split evaluation.

This spark test failure looks unrelated to this PR.

RAMitchell avatar Sep 20 '22 14:09 RAMitchell

I restarted the CI.

trivialfis avatar Sep 21 '22 10:09 trivialfis

The pyspark error seems to be related.

trivialfis avatar Sep 21 '22 13:09 trivialfis

I think I need to reduce over the distributed workers in the histogram rounding bound calculation.

RAMitchell avatar Sep 21 '22 14:09 RAMitchell

> I think I need to reduce over the distributed workers in the histogram rounding bound calculation.

hmm .. why?

trivialfis avatar Sep 21 '22 14:09 trivialfis

The allreduce now operates on quantised gradients, so we would get wrong results if each worker supplied integer gradients quantised at a different scale.
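
A rough Python sketch of the problem, simulating two workers in one process. It is not the real communicator code: `allreduce_sum`, `scale_from_bound`, the 40-bit target, and the use of a sum of absolute gradients as the rounding bound are all assumptions for illustration.

```python
def allreduce_sum(per_worker_values):
    """Stand-in for a distributed allreduce: returns the total over workers."""
    return sum(per_worker_values)


def local_bound(grads):
    """Hypothetical rounding bound: the sum of absolute gradients."""
    return sum(abs(g) for g in grads)


def scale_from_bound(bound, bits=40):
    """Pick a scale so that values up to `bound` fit in roughly `bits` bits."""
    return (2 ** bits) / bound


def quantised_sum(grads, scale):
    """Quantise each gradient with `scale` and add the integers."""
    return sum(round(g * scale) for g in grads)


# Gradients held by two workers with very different magnitudes.
workers = [[0.5, 0.25], [100.0, 50.0]]
true_total = 150.75

# Wrong: each worker derives its own scale from its local bound.
local_scales = [scale_from_bound(local_bound(w)) for w in workers]
mixed_int = allreduce_sum(
    quantised_sum(w, s) for w, s in zip(workers, local_scales)
)
# No single scale can decode this; worker 0's scale is used here.
mixed_total = mixed_int / local_scales[0]

# Right: reduce the bound over all workers first, then share one scale.
global_bound = allreduce_sum(local_bound(w) for w in workers)
shared_scale = scale_from_bound(global_bound)
shared_int = allreduce_sum(quantised_sum(w, shared_scale) for w in workers)
shared_total = shared_int / shared_scale

print("per-worker scales:", mixed_total)
print("shared scale:     ", shared_total)
```

With per-worker scales the integers being added are in different units, so the reduced histogram cannot be converted back to a meaningful gradient sum. Reducing the bound first costs one extra small allreduce, but every worker then quantises in the same units.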

RAMitchell avatar Sep 23 '22 08:09 RAMitchell

@trivialfis can I please get another review?

RAMitchell avatar Sep 23 '22 15:09 RAMitchell