xgboost
xgboost copied to clipboard
Partition optimization
it's part of #7192
Current implementation of partition allow to read bin matrix data with less random access
full train | airline, ~100m | higgs, ~10m |
---|---|---|
master, s | 403 | 53.7 |
this PR, s | 170s | 32.5 |
speedup | 2.37 | 1.65 |
Codecov Report
Merging #7208 (137183e) into master (3290a4f) will not change coverage. The diff coverage is
n/a
.
@@ Coverage Diff @@
## master #7208 +/- ##
=======================================
Coverage 82.67% 82.67%
=======================================
Files 13 13
Lines 4017 4017
=======================================
Hits 3321 3321
Misses 696 696
Impacted Files | Coverage Δ | |
---|---|---|
python-package/xgboost/core.py | 84.35% <0.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 3290a4f...137183e. Read the comment docs.
@trivialfis Can we start detailed review of this PR as this part is related to partition optimizations only? Optimizations of histogram, evaluate and sync kernels are prepared in separate branches
@trivialfis failed steps Jenkins Linux: Test
and Jenkins Linux: Deploy
have similar logs as latest commit in master branch^ https://xgboost-ci.net/blue/organizations/jenkins/xgboost/detail/release_1.5.0/1/pipeline
seems there is no any failed checks/tests affected by current changes.
Yup, I'm aware of the failure. It's blocking the release.
@trivialfis - are we considering this PR to be integrated prior to release?
No, I don't think it's a good idea to integrate this amount of changes. I looked briefly before and I think it can use some cleanups. I'm on holiday right now. Will provide detailed review and possible changes once I come back.
@trivialfis - ok, thanks. Have good holidays!
similar fails in master branch here: docker: Error response from daemon: OCI runtime create failed @trivialfis should I try to restart tests again?
Restarted.
@hcho3 The issue seems to be lingering.
t will affect the precision, otherwise we need to support new
Got it.
For some reason, I can't reply inline. I will take a deeper dive into the optimization and see if we can work on smaller subsets first. Thanks for the hard work!
Hi, I took another look, but only at the histogram building kernel this time. Can we start by working on this function first? Merging smaller functions makes review and test easier, also has the benefit of decoupling different code sections. What do you think?
@trivialfis Thanks a lot for provided review! I'd love to follow your proposal, but main changes in histogram builder (handling of node ids, mapping, memory allocation etc.) are introduced to support changes in partition builder and not applicable without it. Seems only the movement of build hist kernels to header file to avoid template specialization can be decoupled. But I'll try to simplify it anyway.
I'm happy to schedule a call if needed. I think we have some different points of view and would be great to share more.
I'm happy to schedule a call if needed. I think we have some different points of view and would be great to share more.
That's great! We could discuss the delivery plan for these optimizations. I'm free for a call at any time of 6'th and 7'th November ([email protected] - for direct connection). Or we can arrange a discussion some time later, after applying comments at least related to:
- code simplicity, number of class members, input arguments etc.
- possibly strategy pattern implementation in partition builder
I'm free for a call at any time of 6'th and 7'th November ([email protected] - for direct connection).
Thanks, sent an email.
Or we can arrange a discussion some time later, after applying comments at least related to:
Happy to make the schedule whenever you see fit. Let's continue the discussion offline. ;-)
@trivialfis is there a way to handle this problem: "Public key for epel-release-7-14.noarch.rpm is not installed"
@trivialfis, similar fails were observed here
Opened a PR: https://github.com/dmlc/xgboost/pull/7884 to see if we can fix it.
Thank you for continuing the work on optimization. But looking at the code I think my comment will be similar to previous ones. Can we work on smaller chunks of code first and merge the rest incrementally?
Thank you for continuing the work on optimization. But looking at the code I think my comment will be similar to previous ones. Can we work on smaller chunks of code first and merge the rest incrementally?
All changes in this PR are related to each other: new partition builder required to introduce changes into histogram building kernels, and each part doesn't work separetly.
As common histogram building kernels are used in 'approx' and 'hist' methods, I had to introduce changes in 'approx' builder also. Now new optimized partition builder is applicable for both methods and even common RowPartitioner
was introduced to avoid existed code duplication :)
@trivialfis Do you think it's better to split changes in 'approx' and 'hist' methods into separate PR's?
Now I see only three not addressed comments related only to 'opt_partition_builder.h' about number of fileds and parameters, sorry for not addressing it yet. I'll try to simplify the code.
Hello, everyone. I have started decomposition of this PR into the smaller parts. The first one is presented here: https://github.com/dmlc/xgboost/pull/7991