
Weights per datapoint

Open baggepinnen opened this issue 4 years ago • 2 comments

Hey and thanks for this package!

Do you have any idea how to incorporate a different weight for each datapoint, in the sense that datapoints with low weights should matter less to the model fit than those with high weights?

For instance, by specifying the loss as WeightedLogitLogLoss(weights), etc.

LossFunctions.jl has some support for this, e.g.

value(LogitLogLoss(), Y, Y, AggMode.WeightedSum(ones(length(Y))))
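To make the idea concrete, here is a minimal sketch (not using LossFunctions.jl itself) of what a weighted-sum aggregation computes: the per-sample logit log-loss, scaled by a per-sample weight, then summed. The function names here are illustrative, not part of any package's API.

```julia
# Per-sample logit log-loss for a label y ∈ {0, 1} and a raw score ŷ:
#   -(y*log(σ(ŷ)) + (1-y)*log(1-σ(ŷ)))  simplifies to  log(1 + exp(ŷ)) - y*ŷ
logitlogloss(y, ŷ) = log(1 + exp(ŷ)) - y * ŷ

# Weighted aggregation: each sample's loss is scaled by its weight
# before summing, so low-weight points contribute less to the total.
weighted_loss(y, ŷ, w) = sum(w .* logitlogloss.(y, ŷ))

y      = [0.0, 1.0, 1.0]
scores = [-0.5, 1.2, 0.3]
w      = [0.1, 1.0, 2.0]
weighted_loss(y, scores, w)
```

With `w = ones(length(y))` this reduces to the plain unweighted sum, which is what the `AggMode.WeightedSum(ones(length(Y)))` call above expresses.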

baggepinnen avatar Feb 06 '20 08:02 baggepinnen

Thanks for the question. I had a think about this and went through the code. Because of how the XGBoost algorithm works, weights cannot be supplied through the loss function as you suggest; instead, the jlboost function needs to accept a weight parameter directly.

See the linked discussion on scaling the gradient and hessian by the weights; based on my understanding, that would be the approach here too. https://github.com/dmlc/xgboost/issues/144#issuecomment-70431635
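A minimal sketch of what the linked discussion proposes, assuming the logit log-loss (these are illustrative functions, not JLBoost's actual internals): the per-sample weight simply multiplies both the gradient g and the hessian h of the loss with respect to the raw score.

```julia
# Logistic sigmoid.
σ(x) = 1 / (1 + exp(-x))

# Gradient and hessian of the logit log-loss wrt the raw score ŷ,
# each scaled by the sample weight w:
g(y, ŷ, w) = w * (σ(ŷ) - y)           # weighted first derivative
h(y, ŷ, w) = w * σ(ŷ) * (1 - σ(ŷ))    # weighted second derivative
```

A weight of 0 removes a point's influence on the split statistics entirely, and a weight of 2 is equivalent to duplicating the row, which is exactly the behaviour one would expect from per-datapoint weights.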

I will add support for weights once I am done with my other open-source responsibilities.

For now, unfortunately, it's not possible unless you hack the code: specifically, by adding weights to the g and h functions and by passing weights through to wherever g and h are called.

xiaodaigh avatar Feb 06 '20 11:02 xiaodaigh

I took the approach you suggested in the DecisionTree.jl package (https://github.com/bensadeghi/DecisionTree.jl/pull/113), and it seems to be working alright.

baggepinnen avatar Feb 06 '20 12:02 baggepinnen