DecisionTree.jl
DecisionTree.jl copied to clipboard
Multithreaded support for apply_forest_proba (Issue #209)
I think regression uses the functions in ../classification/main.jl for applying forests to a set of features, so no new development required for this.
Fixes #209
Codecov Report
Merging #210 (686a44b) into dev (835f3cd) will increase coverage by
0.03%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## dev #210 +/- ##
==========================================
+ Coverage 87.99% 88.02% +0.03%
==========================================
Files 10 10
Lines 1249 1253 +4
==========================================
+ Hits 1099 1103 +4
Misses 150 150
Impacted Files | Coverage Δ | |
---|---|---|
src/classification/main.jl | 96.16% <100.00%> (+0.05%) |
:arrow_up: |
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
@salbert83 Would you have some time soon to respond to the review?
@rikhuijzer I'm not getting a response here. Are you willing and able to fishish this?
It looks like this would result in a nested @threads
call. One time in stack_function_results
and one time inside the row_fun
that is passed into stack_function_results
. Nested @threads
should be possible with the :dynamic
scheduler, which appears to have only been added in Julia 1.8 (https://github.com/JuliaLang/julia/blob/master/HISTORY.md). Also, I don't know how to benchmark whether adding multithreading actually saves time or not.
So, let's leave this open until the lower bound is set to Julia 1.8 or until someone who really needs this implements and shows benchmarks?
Thanks @rikhuijzer for looking into this.
It looks like this would result in a nested @threads call.
So, does this also apply to the existing implementation added in https://github.com/JuliaAI/DecisionTree.jl/pull/188/files that therefore needs attention?
I also notice that the existing implementation is buy-in (use_multithreading=false
is the default) whereas the present addition is buy-out.
So, does this also apply to the existing implementation added in https://github.com/JuliaAI/DecisionTree.jl/pull/188/files that therefore needs attention?
Maybe that explains https://github.com/JuliaAI/DecisionTree.jl/pull/188#issuecomment-1202086361. I'm afraid, I don't know and also I never need multithreading so I'm not the right person to ask unfortunately.
Maybe figuring out the right multithreading for this package is something you would like, @ExpandingMan?
@OkonSamuel Do you see obvious issues with the way multithreading is currently implemented in prediction? It's here: https://github.com/JuliaAI/DecisionTree.jl/blob/f57a15633f5aadadfc408a8d5e42836e1f011c3f/src/classification/main.jl#L468
@rikhuijzer I don't believe nested multithreading is an issue. This has been tested before in MLJTuning where optimization multithreading has within it resampling multithreading. My interpretation of the 1.9 changes cited is only that nested threading will typically be more efficient with new default settings for the scheduler.
I don't see anything obvious wrong about the proposed implementation (or the existing one), provided user must buy-in, but will wait for the pinged experts to hopefully weigh in.