DecisionTree.jl

AdaBoostStumpClassifier MethodError: zero(::Type{Symbol})

Open: mmikhasenko opened this issue 5 months ago • 1 comment

The function `fit!` fails when the number of iterations is greater than 5. This call:

```julia
# X_train: feature matrix, y_train: vector of Symbol labels (built as in the MWE)
bdt = let
    _model = AdaBoostStumpClassifier(; n_iterations = 10)
    fit!(_model, X_train, y_train)
end
```

fails with the following error:

```
MethodError: no method matching zero(::Type{Symbol})

The function `zero` exists, but no method is defined for this combination of argument types.

Closest candidates are:
  zero(::Type{Union{}}, Any...)
   @ Base number.jl:310
  zero(::Type{Dates.DateTime})
   @ Dates ~/.julia/juliaup/julia-1.11.5+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Dates/src/types.jl:458
  zero(::Type{Pkg.Resolve.VersionWeight})
   @ Pkg ~/.julia/juliaup/julia-1.11.5+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/Pkg/src/Resolve/versionweights.jl:15
  ...
```
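For context, in Julia `zero(T)` returns the additive identity of a numeric type `T`, and `Symbol` has no such method, so whatever code inside `fit!` calls `zero` on the label type `Symbol` ends in this `MethodError`. The check below uses plain Julia only; it illustrates the error and does not reproduce or diagnose the DecisionTree.jl code path itself.

```julia
# zero(T) returns the additive identity of a numeric type T
@assert zero(Int) == 0
@assert zero(Float64) == 0.0

# Symbol has no zero method, so this call throws a MethodError
err = try
    zero(Symbol)
catch e
    e
end
println(typeof(err))  # MethodError
```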

Whether the error occurs depends on the training data: in the MWE below, fitting succeeds with labels from one decision boundary and fails with labels from the other.


MWE

```julia
begin
    using Random
    using DataFrames
    using DecisionTree
    Random.seed!(1234)
end

function classify_signal_background(x, y)
    # Alternative sinusoidal boundary; training on labels from this one works:
    # if sin(2.5π * (x - 0.55)) / 5 + 0.3 + 0.4x < y < 0.7 + 0.4x
    # Two circular signal regions; training on labels from this one fails:
    if (x-0.25)^2 + (y-0.25)^2 < 0.05 || (x-0.65)^2 + (y-0.65)^2 < 0.05
        return :signal
    else
        return :background
    end
end

const features = [:f1, :f2];

df = let
    _df = DataFrame(rand(500, 2), features)
    transform!(_df, features => ByRow(classify_signal_background) => :y)
end

bdt = let
    _model = AdaBoostStumpClassifier(; n_iterations = 40)
    X_train = df[:, features] |> Matrix
    y_train = df[:, :y]
    fit!(_model, X_train, y_train)
end
```
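To reproduce without DataFrames.jl, the same inputs can be built with base Julia only. This is a sketch. It assumes no other random numbers are drawn between `Random.seed!(1234)` and `rand(500, 2)`, in which case it should produce the same `X_train` and `y_train` as the MWE. The labelling function repeats the active (two-circle) branch of `classify_signal_background`.

```julia
using Random

# Same labelling rule as the active branch of classify_signal_background in the MWE
function classify_signal_background(x, y)
    in_circle_1 = (x - 0.25)^2 + (y - 0.25)^2 < 0.05
    in_circle_2 = (x - 0.65)^2 + (y - 0.65)^2 < 0.05
    return (in_circle_1 || in_circle_2) ? :signal : :background
end

Random.seed!(1234)
X_train = rand(500, 2)  # column 1 plays the role of :f1, column 2 of :f2
y_train = [classify_signal_background(X_train[i, 1], X_train[i, 2]) for i in axes(X_train, 1)]
```

Passing these to `fit!(AdaBoostStumpClassifier(; n_iterations = 40), X_train, y_train)` is expected to raise the same `MethodError`.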

mmikhasenko, Jul 29 '25 08:07

@mmikhasenko Thanks for reporting and providing an MWE, which I have been able to reproduce. I am not a regular maintainer, but I agree that the documentation suggests labels can be arbitrarily encoded in classification, so this is indeed a bug. (If you use the MLJ interface, you will be required to encode the target y_train as a CategoricalVector, and internally the target will be integer-encoded, in which case I would not expect to see this error.)

I have not diagnosed the precise cause, but the following workaround appears to work: recode the labels as integers.

```julia
y_train = map(y_train) do y
    y == :signal ? 1 : 0
end
```
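The integer workaround can be generalised to any number of classes with a pair of small helpers that also map codes back to the original labels. `encode_labels` and `decode_labels` below are hypothetical names for illustration, not part of DecisionTree.jl; this is a sketch that has not been checked against the package internals.

```julia
# Hypothetical helpers (not part of DecisionTree.jl): integer-encode arbitrary labels
function encode_labels(y::AbstractVector)
    classes = unique(y)                                    # classes in order of first appearance
    index = Dict(c => i for (i, c) in enumerate(classes))  # class => integer code
    codes = [index[v] for v in y]
    return codes, classes
end

# Map integer codes (e.g. predictions) back to the original labels
decode_labels(codes::AbstractVector{<:Integer}, classes) = classes[codes]

y = [:signal, :background, :signal]
codes, classes = encode_labels(y)
println(codes)    # [1, 2, 1]
println(classes)  # [:signal, :background]
```

With this encoding, the model is trained with `fit!(model, X_train, codes)`, and integer predictions from `predict(model, X)` can be turned back into `Symbol` labels with `decode_labels(preds, classes)`.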

Curiously, recoding the labels as strings also works, and no `zero(::Type{String})` error is thrown:

```julia
y_train = string.(y_train)
```
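The string recoding is lossless, so `Symbol` labels can be recovered afterwards. Assuming a model trained on `String` labels also returns `String` predictions (not checked here), a broadcast `Symbol.(...)` restores the original encoding. The labels below are made up for illustration.

```julia
y_train = [:signal, :background, :signal]  # example Symbol labels (made up)
y_str = string.(y_train)                   # String labels, as in the workaround
y_back = Symbol.(y_str)                    # convert strings (e.g. predictions) back to Symbols
println(y_back == y_train)  # true
```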

If I have some more time, I may take a deeper look at this. In the meantime, perhaps another maintainer can take a look.

@mmikhasenko If you do diagnose this yourself, I can promise a timely review of any PR.

ablaom, Aug 01 '25 04:08