spark-libFM
Use 1/0 labels for binary classification instead of 1/-1
The loss function this library uses for binary classification is marked as a hinge loss and assumes labels of +1 or -1:
case 1 =>
  1 - Math.signum(pred * label)
However, predictions are passed through a sigmoid, so they fall in the range 0 to 1:
case 1 =>
  1.0 / (1.0 + Math.exp(-pred))
The 1/0 convention used for predictions should be preferred over the 1/-1 convention expected by the loss function. In spark.mllib the negative label is represented by 0 instead of −1, to be consistent with multiclass labeling.
The loss function should be changed to follow the same 1/0 label convention that Spark uses.
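As a rough illustration of what a loss written directly against 1/0 labels could look like, here is a minimal sketch of the logistic loss and its derivative with respect to the raw score. The object and method names (`ZeroOneLogLoss`, `loss`, `gradient`, `log1pExp`) are made up for this example and are not part of spark-libFM or Spark.

```scala
// Hypothetical helper, not part of spark-libFM: logistic loss for labels in {0, 1}.
object ZeroOneLogLoss {
  // Numerically stable log(1 + exp(x)).
  def log1pExp(x: Double): Double =
    if (x > 0) x + Math.log1p(Math.exp(-x)) else Math.log1p(Math.exp(x))

  // Logistic loss for a raw (pre-sigmoid) score `pred` and a label in {0, 1}.
  def loss(pred: Double, label: Double): Double =
    if (label > 0) log1pExp(-pred) else log1pExp(pred)

  // Derivative of the loss with respect to `pred`: sigmoid(pred) - label.
  def gradient(pred: Double, label: Double): Double =
    1.0 / (1.0 + Math.exp(-pred)) - label
}
```

With this form, the same sigmoid that produces the 0 to 1 prediction also appears in the gradient, so no label remapping is needed.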
Ahh, looks like it does transform the labels to +1/-1 before training. I still think this is a very non-standard way of doing things, since the goal is to upstream this library and have it merged into Spark's MLlib. I believe Spark uses the 1/0 representation internally, and we shouldn't change that.
val data = task match {
  case 0 =>
    input.map(l => (l.label, l.features)).persist()
  case 1 =>
    input.map(l => (if (l.label > 0) 1.0 else -1.0, l.features)).persist()
}
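To make the transform above concrete, here is a small sketch of the label mapping in isolation, together with the inverse mapping and a 0.5 threshold on the sigmoid output. `LabelTransform` and its methods are hypothetical names used only for illustration; only the `if (label > 0) 1.0 else -1.0` rule comes from the snippet above.

```scala
// Hypothetical helper, not part of spark-libFM.
object LabelTransform {
  // Map a {0, 1} label to {-1, +1}, mirroring the task == 1 branch above.
  def toSigned(label: Double): Double = if (label > 0) 1.0 else -1.0

  // Map a {-1, +1} label back to the {0, 1} convention.
  def toZeroOne(signed: Double): Double = if (signed > 0) 1.0 else 0.0

  // Predicted class in the {0, 1} convention: sigmoid of the raw score, thresholded at 0.5.
  def predictClass(rawScore: Double): Double =
    if (1.0 / (1.0 + Math.exp(-rawScore)) > 0.5) 1.0 else 0.0
}
```

Under this reading, callers only ever see 1/0 labels, and the +1/-1 form stays an internal detail of training.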
Was there any conclusion on this?
For the classification task, why is the gradient computed from the logistic loss while the reported loss uses the hinge loss? The gradient is computed as follows:
val mult = task match {
  case 0 =>
    pred - label
  case 1 =>
    -label * (1.0 - 1.0 / (1.0 + Math.exp(-label * pred)))
}
The loss is computed as follows:
task match {
  case 0 =>
    (pred - label) * (pred - label)
  case 1 =>
    1 - Math.signum(pred * label) // hinge loss
}
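One way to check the mismatch raised here is to compare the `task == 1` multiplier against a numerical derivative of the logistic loss log(1 + exp(-label * pred)). The sketch below does that, and also puts the reported loss expression next to the textbook hinge loss max(0, 1 - label * pred). The object `LossGradientCheck` and its method names are invented for this example; only the two expressions marked as quoted come from the code above.

```scala
// Hypothetical check, not part of spark-libFM. Labels are assumed to be in {-1, +1}.
object LossGradientCheck {
  // Logistic loss for a raw score `pred`.
  def logLoss(pred: Double, label: Double): Double =
    Math.log1p(Math.exp(-label * pred))

  // Multiplier quoted from the task == 1 gradient branch above.
  def mult(pred: Double, label: Double): Double =
    -label * (1.0 - 1.0 / (1.0 + Math.exp(-label * pred)))

  // Expression quoted from the task == 1 loss branch above.
  def signLoss(pred: Double, label: Double): Double =
    1 - Math.signum(pred * label)

  // Textbook hinge loss, for comparison only.
  def hingeLoss(pred: Double, label: Double): Double =
    Math.max(0.0, 1.0 - label * pred)

  // Central finite-difference derivative of logLoss with respect to pred.
  def numericGrad(pred: Double, label: Double, eps: Double = 1e-6): Double =
    (logLoss(pred + eps, label) - logLoss(pred - eps, label)) / (2 * eps)
}
```

In this sketch `mult` agrees with `numericGrad` to within finite-difference error, which supports reading the gradient as that of the logistic loss. `signLoss` takes only the values 0 (signs agree), 1 (product is zero), or 2 (signs disagree), so it behaves like a scaled misclassification indicator and does not match `hingeLoss`, which depends on the margin.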