DecisionTree.jl icon indicating copy to clipboard operation
DecisionTree.jl copied to clipboard

Multithreaded support for apply_forest_proba (Issue #209)

Open salbert83 opened this issue 2 years ago • 8 comments

I think regression uses the functions in ../classification/main.jl for applying forests to a set of features, so no new development required for this.

Fixes #209

salbert83 avatar Jan 15 '23 14:01 salbert83

Codecov Report

Merging #210 (686a44b) into dev (835f3cd) will increase coverage by 0.03%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev     #210      +/-   ##
==========================================
+ Coverage   87.99%   88.02%   +0.03%     
==========================================
  Files          10       10              
  Lines        1249     1253       +4     
==========================================
+ Hits         1099     1103       +4     
  Misses        150      150              
Impacted Files Coverage Δ
src/classification/main.jl 96.16% <100.00%> (+0.05%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov-commenter avatar Jan 15 '23 14:01 codecov-commenter

@salbert83 Would you have some time soon to respond to the review?

ablaom avatar Jan 25 '23 23:01 ablaom

@rikhuijzer I'm not getting a response here. Are you willing and able to fishish this?

ablaom avatar Feb 06 '23 21:02 ablaom

It looks like this would result in a nested @threads call. One time in stack_function_results and one time inside the row_fun that is passed into stack_function_results. Nested @threads should be possible with the :dynamic scheduler, which appears to have only been added in Julia 1.8 (https://github.com/JuliaLang/julia/blob/master/HISTORY.md). Also, I don't know how to benchmark whether adding multithreading actually saves time or not.

So, let's leave this open until the lower bound is set to Julia 1.8 or until someone who really needs this implements and shows benchmarks?

rikhuijzer avatar Feb 07 '23 07:02 rikhuijzer

Thanks @rikhuijzer for looking into this.

It looks like this would result in a nested @threads call.

So, does this also apply to the existing implementation added in https://github.com/JuliaAI/DecisionTree.jl/pull/188/files that therefore needs attention?

I also notice that the existing implementation is buy-in (use_multithreading=false is the default) whereas the present addition is buy-out.

ablaom avatar Feb 07 '23 22:02 ablaom

So, does this also apply to the existing implementation added in https://github.com/JuliaAI/DecisionTree.jl/pull/188/files that therefore needs attention?

Maybe that explains https://github.com/JuliaAI/DecisionTree.jl/pull/188#issuecomment-1202086361. I'm afraid, I don't know and also I never need multithreading so I'm not the right person to ask unfortunately.

Maybe figuring out the right multithreading for this package is something you would like, @ExpandingMan?

rikhuijzer avatar Feb 08 '23 09:02 rikhuijzer

@OkonSamuel Do you see obvious issues with the way multithreading is currently implemented in prediction? It's here: https://github.com/JuliaAI/DecisionTree.jl/blob/f57a15633f5aadadfc408a8d5e42836e1f011c3f/src/classification/main.jl#L468

ablaom avatar Feb 08 '23 20:02 ablaom

@rikhuijzer I don't believe nested multithreading is an issue. This has been tested before in MLJTuning where optimization multithreading has within it resampling multithreading. My interpretation of the 1.9 changes cited is only that nested threading will typically be more efficient with new default settings for the scheduler.

I don't see anything obvious wrong about the proposed implementation (or the existing one), provided user must buy-in, but will wait for the pinged experts to hopefully weigh in.

ablaom avatar Feb 10 '23 01:02 ablaom