
Problem with adaboost

Open andreasnoack opened this issue 7 years ago • 2 comments

For some reason, boosting doesn't seem to work. I don't think the issue here is the same as #42. I tried the example from The Elements of Statistical Learning and compared the results to fastAdaboost in R:

julia> using Distributions, DecisionTree, RCall, DataFrames

julia> # Boosting example from EoSL
       X = randn(1000, 10);

julia> y = Vector{Int64}(vec(sum(abs2, X, 2) .> quantile(Chisq(10), 0.5)));

julia> # Use DecisionTree
       ada1 = DecisionTree.build_adaboost_stumps(y, X, 5);

julia> mean(apply_adaboost_stumps(ada1..., X) .== y)
0.579

julia> # Use fastAdaboost
       R"library(fastAdaboost)";

julia> df = DataFrame(X);

julia> df[:y] = y;

julia> ada2 = R"adaboost(y ~ x1 + x2 + x3 + x4 + x5 + x6 +x7 + x8 + x9 + x10, data = $df, 5)";

julia> rcopy(R"predict($ada2, newdata = $df)$error")
0.021

Furthermore, build_adaboost_stumps is much slower than adaboost from fastAdaboost. It looks like build_adaboost_stumps might not use the same optimizations as build_tree.
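For reference, this is roughly the weight-update logic I'd expect a stump booster to implement. It's a minimal AdaBoost.M1 sketch in plain Julia; `Stump`, `fit_stump`, `adaboost_stumps`, and `predict_ensemble` are illustrative names, not DecisionTree.jl's API, and the coarse quantile grid for thresholds is a simplification.

```julia
using Statistics

struct Stump
    feature::Int
    threshold::Float64
    left::Int    # prediction when x[feature] <= threshold
    right::Int   # prediction when x[feature] >  threshold
end

predict(s::Stump, x::AbstractVector) = x[s.feature] <= s.threshold ? s.left : s.right

# Exhaustively search a coarse grid of candidate thresholds for the stump
# with the smallest weighted 0/1 error.
function fit_stump(X, y, w)
    n, p = size(X)
    best, best_err = Stump(1, 0.0, 0, 1), Inf
    for j in 1:p, t in quantile(view(X, :, j), 0.1:0.1:0.9)
        for (l, r) in ((0, 1), (1, 0))          # which side predicts class 1
            pred = ifelse.(view(X, :, j) .<= t, l, r)
            err = sum(w .* (pred .!= y))
            if err < best_err
                best, best_err = Stump(j, t, l, r), err
            end
        end
    end
    return best, best_err
end

function adaboost_stumps(X, y, niter)
    n = size(X, 1)
    w = fill(1 / n, n)
    stumps, alphas = Stump[], Float64[]
    for _ in 1:niter
        stump, err = fit_stump(X, y, w)
        err = clamp(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * log((1 - err) / err)
        miss = [predict(stump, view(X, i, :)) != y[i] for i in 1:n]
        # Up-weight misclassified points, down-weight the rest, renormalize.
        w .*= exp.(ifelse.(miss, alpha, -alpha))
        w ./= sum(w)
        push!(stumps, stump)
        push!(alphas, alpha)
    end
    return stumps, alphas
end

# Weighted ±1 vote of the stumps; returns a label in {0, 1}.
function predict_ensemble(stumps, alphas, x)
    score = sum(a * (predict(s, x) == 1 ? 1.0 : -1.0) for (s, a) in zip(stumps, alphas))
    return score > 0 ? 1 : 0
end
```

On the data above that would be `stumps, alphas = adaboost_stumps(X, y, 5)` followed by `mean(predict_ensemble(stumps, alphas, X[i, :]) == y[i] for i in 1:length(y))`.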

andreasnoack avatar Dec 13 '17 22:12 andreasnoack

Yeah, build_adaboost_stumps has always had issues, and it uses a different optimization technique than build_tree, which is why it's quite slow. Not sure what to do here; fixing it requires significant work. I've been wondering if we should remove it from the package altogether.

Any thoughts, ideas, advice?

bensadeghi avatar Dec 22 '17 05:12 bensadeghi

I might try to take a look at it and see if I can figure out what is going on. If not, it might be better to disable it.

andreasnoack avatar Dec 22 '17 09:12 andreasnoack