spark-libFM

Use 1/0 labels for binary classification instead of 1/-1

Open benmccann opened this issue 8 years ago • 3 comments

The loss function used in this library for binary classification is a hinge-loss function assuming labels +1 or -1:

case 1 =>
  1 - Math.signum(pred * label)

However, the predictions being made are in the range 0-1:

case 1 =>
  1.0 / (1.0 + Math.exp(-pred))

The 1/0 convention used in predictions should be preferred over the 1/-1 convention the loss function expects: spark.mllib represents the negative label as 0 rather than −1, to be consistent with multiclass labeling.
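
To make the mismatch concrete, here is a minimal, self-contained Scala sketch with no Spark dependency. The object and helper names (`LabelConventions`, `toSigned`, `toZeroOne`, `sigmoid`, `signLoss`) are hypothetical and not part of spark-libFM; only the two expressions quoted above come from the library.

```scala
// Hypothetical helpers for illustration; not part of spark-libFM.
object LabelConventions {
  // Map a {0, 1} label (the spark.mllib convention) to {-1, +1}.
  def toSigned(label: Double): Double = if (label > 0) 1.0 else -1.0

  // Map a {-1, +1} label back to {0, 1}.
  def toZeroOne(label: Double): Double = if (label > 0) 1.0 else 0.0

  // Logistic sigmoid, as in the quoted prediction code: output lies in (0, 1).
  def sigmoid(score: Double): Double = 1.0 / (1.0 + math.exp(-score))

  // The quoted loss expression, evaluated on a score and a label.
  def signLoss(score: Double, label: Double): Double =
    1 - math.signum(score * label)
}
```

With these definitions, `signLoss(sigmoid(-3.0), -1.0)` evaluates to 2.0, because a sigmoid output is always positive and so never agrees in sign with a -1 label. With a 0 label the expression is 1.0 for any score, since the product is 0. The expression therefore only distinguishes right from wrong when it receives a raw score and a label in {-1, +1}.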

The loss function should be changed to be more like the way Spark does it.

benmccann, May 14 '16 01:05

Ahh, it looks like the library transforms the labels before training (code below). I still think this is a very non-standard way of doing things, especially since the goal is to upstream this and have it merged into Spark's mllib. I believe Spark uses the 1/0 representation internally, and we shouldn't change that.

val data = task match {
  case 0 =>
    input.map(l => (l.label, l.features)).persist()
  case 1 =>
    input.map(l => (if (l.label > 0) 1.0 else -1.0, l.features)).persist()
}
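
If the labels stayed in the 1/0 form instead of being remapped, the loss and gradient could be written against them directly. The following is only a sketch of that idea, assuming a logistic (log) loss on the raw score. `ZeroOneLogistic` and its methods are hypothetical names, and this is neither spark-libFM's code nor Spark's.

```scala
// Hypothetical sketch: logistic loss written directly for labels in {0, 1}.
object ZeroOneLogistic {
  // Logistic sigmoid: maps a raw score to a probability in (0, 1).
  def sigmoid(score: Double): Double = 1.0 / (1.0 + math.exp(-score))

  // Negative log-likelihood of a {0, 1} label given a raw score:
  //   label 1 -> log(1 + exp(-score)),  label 0 -> log(1 + exp(score)).
  def loss(score: Double, label: Double): Double =
    if (label > 0) math.log1p(math.exp(-score))
    else math.log1p(math.exp(score))

  // Derivative of the loss with respect to the raw score.
  def gradMultiplier(score: Double, label: Double): Double =
    sigmoid(score) - label
}
```

In this form the gradient multiplier is simply the predicted probability minus the label, so the same 1/0 values serve for training, loss reporting, and prediction.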

benmccann, May 16 '16 18:05

Was there any conclusion on this?

zdx, Feb 10 '17 06:02

In the classification case, why is the gradient computed from the logistic loss while the loss itself is computed as a hinge loss? The gradient is computed as follows:

val mult = task match {
  case 0 =>
    pred - label
  case 1 =>
    -label * (1.0 - 1.0 / (1.0 + Math.exp(-label * pred)))
}

The loss is computed as follows:

task match {
  case 0 =>
    (pred - label) * (pred - label)
  case 1 =>
    1 - Math.signum(pred * label) // hinge loss
}
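
One way to check this is to compare the quoted gradient multiplier against a numerical derivative of the logistic loss. The sketch below does that for labels in {-1, +1}, and also puts the quoted loss expression next to the textbook hinge loss max(0, 1 - label * pred). All object and method names are hypothetical; only the two `case 1` expressions are taken from the quoted code.

```scala
// Hypothetical comparison harness; only mult and signLoss mirror the quoted code.
object LossComparison {
  // Logistic loss for a signed label in {-1, +1}.
  def logisticLoss(pred: Double, label: Double): Double =
    math.log1p(math.exp(-label * pred))

  // Gradient multiplier quoted above (case 1).
  def mult(pred: Double, label: Double): Double =
    -label * (1.0 - 1.0 / (1.0 + math.exp(-label * pred)))

  // Central finite difference of logisticLoss with respect to pred.
  def numericGrad(pred: Double, label: Double, h: Double = 1e-6): Double =
    (logisticLoss(pred + h, label) - logisticLoss(pred - h, label)) / (2 * h)

  // Loss expression quoted above (case 1).
  def signLoss(pred: Double, label: Double): Double =
    1 - math.signum(pred * label)

  // Textbook hinge loss, for comparison only.
  def hingeLoss(pred: Double, label: Double): Double =
    math.max(0.0, 1.0 - label * pred)
}
```

Under these definitions `mult` agrees with the finite-difference gradient of the logistic loss. The quoted loss expression only ever takes the values 0, 1, or 2, so it behaves like a misclassification indicator rather than a hinge loss: for `pred = 0.3` and `label = 1.0` it gives 0, whereas the hinge loss still penalizes the small margin.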

willysys, Oct 11 '18 03:10