xgboost
Use quantised gradients in gpu_hist histograms
Floating point addition and subtraction cause small numerical errors in the learning process, sometimes causing the algorithm to create splits on 'phantom' gradients. We can solve this by consistently working with gradients in a high-precision quantised format, only converting back to floating point for the final gain or weight calculations.
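A minimal sketch of the idea (hypothetical names and scale choice, not xgboost's actual internals): pick one shared scale factor, convert each gradient into a fixed-point integer, accumulate in integers — where addition is exact and associative — and convert back to floating point only at the end.

```python
# Hypothetical sketch of fixed-point gradient quantisation; the names and
# the choice of scale are illustrative, not xgboost's implementation.

def make_quantiser(max_abs_gradient, bits=64):
    # Pick a scale so the largest possible gradient still fits in the
    # integer range with headroom left for summation.
    scale = 2.0 ** (bits - 2) / max_abs_gradient

    def to_int(g):
        return round(g * scale)   # quantise to a fixed-point integer

    def to_float(q):
        return q / scale          # convert back for gain/weight maths

    return to_int, to_float


to_int, to_float = make_quantiser(max_abs_gradient=8.0)
grads = [0.1, 0.2, 0.3, -0.25]

# Integer addition is exact and associative, so the histogram sum is
# deterministic regardless of accumulation order.
forward = sum(to_int(g) for g in grads)
backward = sum(to_int(g) for g in reversed(grads))
assert forward == backward
```

With plain floats, summing in a different order can give a slightly different histogram; with integers the two orders above are bit-identical.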
This PR is the first of 2 PRs, converting histogram related calculations to use quantised gradients.
A second PR will address the split evaluation part.
Benchmarks look good:
dataset | master | integer |
---|---|---|
airline | 58.75668815 | 59.16649409 |
bosch | 12.61275904 | 10.81264534 |
covtype | 14.17920381 | 14.0353592 |
epsilon | 37.92378811 | 34.50982775 |
fraud | 1.07923842 | 1.037245154 |
higgs | 10.65482221 | 10.36926403 |
newsgroups | 179.9059501 | 182.5651175 |
year | 4.084809801 | 3.969981108 |
Thank you for working on resolving this issue. What's the current status of the floating point error?
This does not fix anything yet; the second PR, which affects the calculations in split evaluation, is still needed.
This spark test failure looks unrelated to this PR.
I restarted the CI.
The pyspark error seems to be related.
I think I need to reduce over the distributed workers in the histogram rounding bound calculation.
hmm .. why?
Allreduce now happens on quantised gradients. We would get the wrong results if each worker provided integer gradients quantised at a different scale.
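A toy two-worker illustration of the point (hypothetical names, with a plain `max` standing in for the distributed allreduce): the local gradient bounds must be max-reduced across workers first, so every worker quantises with the same scale before the integer sums are combined.

```python
# Illustrative only: why the rounding bound must be reduced across workers
# before quantising. `max(local_bounds)` stands in for an allreduce(max).

def scale_for(bound, bits=64):
    return 2.0 ** (bits - 2) / bound

worker_grads = [[0.5, -0.25], [4.0, 1.5]]   # toy gradients on two workers
local_bounds = [max(abs(g) for g in grads) for grads in worker_grads]

# Step 1: agree on one global bound, hence one shared scale.
shared_scale = scale_for(max(local_bounds))

# Step 2: each worker quantises locally with the shared scale, then the
# integer partial sums are allreduce-summed (plain sum here).
local_int_sums = [sum(round(g * shared_scale) for g in grads)
                  for grads in worker_grads]
total = sum(local_int_sums) / shared_scale  # dequantise once at the end
```

Had each worker used its own local bound, the integers would be in incompatible units and summing them across workers would be meaningless.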
@trivialfis can I please get another review?