Weights in pipelines
Hello,
I am trying to use weights in MLJ and am surprised by the behaviour in a pipeline, which I think is a bug. In the following example, the supervised component does not support weights, yet no error is thrown and no warning is displayed.
using MLJBase
using MLJModels
using MLJLinearModels
n = 100
X = (A=categorical(rand([0, 1], n)), B=categorical(rand([0, 1], n)))
y = categorical(rand([0, 1], n))
weights = [y_ == true ? 0.1 : 0.8 for y_ in y]
pipe = Pipeline(MLJModels.ContinuousEncoder(), LogisticClassifier())
# Train without weights: all good
unweighted_mach = machine(pipe, X, y)
fit!(unweighted_mach, verbosity=2)
preds_unweighted = predict(unweighted_mach)
# Train with weights: no warning, no error and weights are ignored
weighted_mach = machine(pipe, X, y, weights)
fit!(weighted_mach, verbosity=2)
preds_weighted = predict(weighted_mach)
[x.prob_given_ref[1] for x in preds_unweighted] == [x.prob_given_ref[1] for x in preds_weighted] # returns true
More generally, outside a pipeline, passing weights to a model that does not support them only prints a warning, and the fit proceeds with the weights ignored. I expected an error to be thrown instead:
n = 100
X = (A=rand(n), B=rand(n))
y = categorical(rand([0, 1], n))
weights = [y_ == true ? 0.1 : 0.8 for y_ in y]
weighted_mach = machine(LogisticClassifier(), X, y, weights) # warning here
fit!(weighted_mach)
predict(weighted_mach)
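Until the behaviour changes, a caller who wants a hard failure could wrap machine construction in a small guard. The following is only a sketch: `strict_machine` is a hypothetical helper name, not part of MLJ, and it assumes that the `supports_weights` trait is the right thing to consult for per-observation weights.

```julia
using MLJBase
using MLJLinearModels

# Hypothetical helper (not part of MLJ): refuse to build a machine when
# per-observation weights are passed to a model that cannot use them.
function strict_machine(model, X, y, w)
    MLJBase.supports_weights(model) ||
        throw(ArgumentError("$(typeof(model)) does not support weights"))
    return machine(model, X, y, w)
end
```

With the data from the snippet above, `strict_machine(LogisticClassifier(), X, y, weights)` would be expected to throw rather than warn, since that model reports no weight support. Applied to a pipeline, the guard only sees whatever trait value the pipeline itself reports.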
@olivierlabayle Thank you for reporting.
I agree that this behavior is unexpected.
Do you think it suffices to "forward" the supports_weights trait from any supervised component in the pipeline to the pipeline? What if there is an unsupervised component that supports weights, but a supervised component that does not?
That is the approach we currently take: the pipeline's trait is determined by checking its supervised component. But I suppose you are correct, and maybe a pipeline should return true if any of its components supports weights. Then, to know whether the final model supports weights, the user should first extract it with MLJBase.supervised_component. At least this makes sense to me; it probably just needs to be documented.
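The check described here could look roughly like the sketch below. It assumes `MLJBase.supervised_component` is callable on a pipeline as named in this thread (it is taken to be unexported, hence the qualified call), and it makes no claim about what the pipeline-level trait returns in any given MLJBase version.

```julia
using MLJBase
using MLJModels
using MLJLinearModels

pipe = Pipeline(MLJModels.ContinuousEncoder(), LogisticClassifier())

# Trait as reported by the composite model itself
pipe_supports = MLJBase.supports_weights(pipe)

# Trait of the supervised component alone. `supervised_component` is the
# function mentioned in this thread; its exact signature is an assumption.
final_model = MLJBase.supervised_component(pipe)
final_supports = MLJBase.supports_weights(final_model)

@info "supports_weights" pipeline=pipe_supports supervised_component=final_supports
```

Comparing the two values makes it explicit whether weights passed to the pipeline can actually reach the final model.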