xgboost icon indicating copy to clipboard operation
xgboost copied to clipboard

Partition optimization

Open ShvetsKS opened this issue 3 years ago • 20 comments

it's part of #7192

Current implementation of partition allow to read bin matrix data with less random access

full train airline, ~100m higgs, ~10m
master, s 403 53.7
this PR, s 170s 32.5
speedup 2.37 1.65

ShvetsKS avatar Sep 02 '21 20:09 ShvetsKS

Codecov Report

Merging #7208 (137183e) into master (3290a4f) will not change coverage. The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #7208   +/-   ##
=======================================
  Coverage   82.67%   82.67%           
=======================================
  Files          13       13           
  Lines        4017     4017           
=======================================
  Hits         3321     3321           
  Misses        696      696           
Impacted Files Coverage Δ
python-package/xgboost/core.py 84.35% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 3290a4f...137183e. Read the comment docs.

codecov-commenter avatar Sep 03 '21 06:09 codecov-commenter

@trivialfis Can we start detailed review of this PR as this part is related to partition optimizations only? Optimizations of histogram, evaluate and sync kernels are prepared in separate branches

ShvetsKS avatar Sep 22 '21 14:09 ShvetsKS

@trivialfis failed steps Jenkins Linux: Test and Jenkins Linux: Deploy have similar logs as latest commit in master branch^ https://xgboost-ci.net/blue/organizations/jenkins/xgboost/detail/release_1.5.0/1/pipeline seems there is no any failed checks/tests affected by current changes.

ShvetsKS avatar Sep 26 '21 15:09 ShvetsKS

Yup, I'm aware of the failure. It's blocking the release.

trivialfis avatar Sep 26 '21 16:09 trivialfis

@trivialfis - are we considering this PR to be integrated prior to release?

napetrov avatar Oct 01 '21 12:10 napetrov

No, I don't think it's a good idea to integrate this amount of changes. I looked briefly before and I think it can use some cleanups. I'm on holiday right now. Will provide detailed review and possible changes once I come back.

trivialfis avatar Oct 01 '21 13:10 trivialfis

@trivialfis - ok, thanks. Have good holidays!

napetrov avatar Oct 01 '21 13:10 napetrov

similar fails in master branch here: docker: Error response from daemon: OCI runtime create failed @trivialfis should I try to restart tests again?

ShvetsKS avatar Oct 03 '21 10:10 ShvetsKS

Restarted.

@hcho3 The issue seems to be lingering.

trivialfis avatar Oct 03 '21 10:10 trivialfis

t will affect the precision, otherwise we need to support new

Got it.

For some reason, I can't reply inline. I will take a deeper dive into the optimization and see if we can work on smaller subsets first. Thanks for the hard work!

trivialfis avatar Oct 29 '21 17:10 trivialfis

Hi, I took another look, but only at the histogram building kernel this time. Can we start by working on this function first? Merging smaller functions makes review and test easier, also has the benefit of decoupling different code sections. What do you think?

@trivialfis Thanks a lot for provided review! I'd love to follow your proposal, but main changes in histogram builder (handling of node ids, mapping, memory allocation etc.) are introduced to support changes in partition builder and not applicable without it. Seems only the movement of build hist kernels to header file to avoid template specialization can be decoupled. But I'll try to simplify it anyway.

ShvetsKS avatar Oct 31 '21 15:10 ShvetsKS

I'm happy to schedule a call if needed. I think we have some different points of view and would be great to share more.

trivialfis avatar Nov 03 '21 20:11 trivialfis

I'm happy to schedule a call if needed. I think we have some different points of view and would be great to share more.

That's great! We could discuss the delivery plan for these optimizations. I'm free for a call at any time of 6'th and 7'th November ([email protected] - for direct connection). Or we can arrange a discussion some time later, after applying comments at least related to:

  1. code simplicity, number of class members, input arguments etc.
  2. possibly strategy pattern implementation in partition builder

ShvetsKS avatar Nov 05 '21 12:11 ShvetsKS

I'm free for a call at any time of 6'th and 7'th November ([email protected] - for direct connection).

Thanks, sent an email.

Or we can arrange a discussion some time later, after applying comments at least related to:

Happy to make the schedule whenever you see fit. Let's continue the discussion offline. ;-)

trivialfis avatar Nov 05 '21 19:11 trivialfis

@trivialfis is there a way to handle this problem: "Public key for epel-release-7-14.noarch.rpm is not installed"

ShvetsKS avatar May 09 '22 07:05 ShvetsKS

@trivialfis, similar fails were observed here

ShvetsKS avatar May 10 '22 06:05 ShvetsKS

Opened a PR: https://github.com/dmlc/xgboost/pull/7884 to see if we can fix it.

trivialfis avatar May 11 '22 10:05 trivialfis

Thank you for continuing the work on optimization. But looking at the code I think my comment will be similar to previous ones. Can we work on smaller chunks of code first and merge the rest incrementally?

trivialfis avatar May 13 '22 18:05 trivialfis

Thank you for continuing the work on optimization. But looking at the code I think my comment will be similar to previous ones. Can we work on smaller chunks of code first and merge the rest incrementally?

All changes in this PR are related to each other: new partition builder required to introduce changes into histogram building kernels, and each part doesn't work separetly. As common histogram building kernels are used in 'approx' and 'hist' methods, I had to introduce changes in 'approx' builder also. Now new optimized partition builder is applicable for both methods and even common RowPartitioner was introduced to avoid existed code duplication :) @trivialfis Do you think it's better to split changes in 'approx' and 'hist' methods into separate PR's? Now I see only three not addressed comments related only to 'opt_partition_builder.h' about number of fileds and parameters, sorry for not addressing it yet. I'll try to simplify the code.

ShvetsKS avatar May 15 '22 20:05 ShvetsKS

Hello, everyone. I have started decomposition of this PR into the smaller parts. The first one is presented here: https://github.com/dmlc/xgboost/pull/7991

razdoburdin avatar Jun 17 '22 07:06 razdoburdin