MLJBase.jl

Weights in pipelines

Open · olivierlabayle opened this issue 5 months ago · 2 comments

Hello,

I am trying to use sample weights in MLJ and was surprised by how a pipeline behaves, which I think is a bug. In the example below, the supervised component (`LogisticClassifier`) does not support weights, yet no error is thrown and no warning is displayed.

```julia
using MLJBase
using MLJModels
using MLJLinearModels

n = 100
X = (A=categorical(rand([0, 1], n)), B=categorical(rand([0, 1], n)))
y = categorical(rand([0, 1], n))
weights = [y_ == 1 ? 0.1 : 0.8 for y_ in y]  # 0.1 for class 1, 0.8 for class 0
pipe = Pipeline(MLJModels.ContinuousEncoder(), LogisticClassifier())

# Train without weights: all good
unweighted_mach = machine(pipe, X, y)
fit!(unweighted_mach, verbosity=2)
preds_unweighted = predict(unweighted_mach)

# Train with weights: no warning, no error, and the weights are silently ignored
weighted_mach = machine(pipe, X, y, weights)
fit!(weighted_mach, verbosity=2)
preds_weighted = predict(weighted_mach)

[x.prob_given_ref[1] for x in preds_unweighted] == [x.prob_given_ref[1] for x in preds_weighted] # returns true
```

More generally, outside a pipeline, passing weights to a model that does not support them only prints a warning, and fitting proceeds. I expected an error:

```julia
n = 100
X = (A=rand(n), B=rand(n))
y = categorical(rand([0, 1], n))
weights = [y_ == 1 ? 0.1 : 0.8 for y_ in y]  # 0.1 for class 1, 0.8 for class 0

weighted_mach = machine(LogisticClassifier(), X, y, weights) # a warning is printed here
fit!(weighted_mach)
predict(weighted_mach)
```
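The reporter expected fitting to fail rather than warn. As a minimal sketch of that stricter policy, assuming a hypothetical helper `check_weights` (it is not part of MLJBase or MLJModelInterface, and MLJ currently only warns in this situation), the check could look like this:

```julia
# Hypothetical stricter check; NOT MLJ's actual behaviour (MLJ warns and
# continues). Passing `w = nothing` stands for "no weights were supplied".
function check_weights(model_supports_weights::Bool, w)
    if w !== nothing && !model_supports_weights
        throw(ArgumentError("model does not support sample weights; refusing to ignore them silently"))
    end
    return nothing
end

check_weights(true, [0.1, 0.8])   # fine: the model supports weights
check_weights(false, nothing)     # fine: no weights were passed
# check_weights(false, [0.1, 0.8]) would throw an ArgumentError
```

Throwing instead of warning trades convenience for safety: a user can no longer fit a weighted machine by accident and then compare predictions that were never actually weighted.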

olivierlabayle · Jul 31 '25 10:07

@olivierlabayle Thank you for reporting.

I agree that this behavior is unexpected.

Do you think it suffices to "forward" the `supports_weights` trait from the pipeline's supervised component to the pipeline itself? What if there is an unsupervised component that supports weights, but the supervised component does not?

ablaom · Sep 05 '25 01:09

We currently take that approach, i.e. checking the supervised component of a pipeline. But I suppose you are right: perhaps a pipeline should report `true` if any of its components supports weights. Then, to know whether the final model supports weights, the user would first call `MLJBase.supervised_component`. At least that makes sense to me; it probably just needs to be documented.
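The two rules under discussion can be compared on ablaom's scenario (an unsupervised step that accepts weights, followed by a supervised model that does not). This is a minimal, dependency-free sketch: the `Component` type and both functions are hypothetical stand-ins, not MLJ types or MLJ's trait implementation.

```julia
# Hypothetical stand-in for a pipeline component; this is NOT an MLJ type.
struct Component
    name::Symbol
    supervised::Bool        # is this the pipeline's supervised (final) model?
    supports_weights::Bool  # does the component accept sample weights?
end

# Rule 1 (current approach): the pipeline reports the trait of its single
# supervised component.
forward_supervised(components) =
    only(c for c in components if c.supervised).supports_weights

# Rule 2 (proposed): the pipeline reports `true` if ANY component supports weights.
any_component(components) = any(c -> c.supports_weights, components)

# ablaom's scenario, with hypothetical components.
pipe = [Component(:encoder, false, true), Component(:classifier, true, false)]

forward_supervised(pipe)  # false: the classifier would ignore the weights
any_component(pipe)       # true: weights are used somewhere in the pipeline
```

Under rule 2, a user who needs to know whether the final model itself uses the weights would still inspect the supervised component (in MLJ, via `MLJBase.supervised_component`, as suggested above), which is what `forward_supervised` mirrors here.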

olivierlabayle · Sep 18 '25 17:09