
loss function

Open saeedalahmari3 opened this issue 6 years ago • 7 comments

I didn't understand this loss function: return -dice_coef(y_true, y_pred). For backpropagation I think we need a differentiable loss function, for instance return 0.5*math.pow(1-dice_coef(y_true, y_pred),2). Is that true?

saeedalahmari3 avatar Mar 27 '18 12:03 saeedalahmari3

dice_coef is a function that we want to maximize. You can turn it into a loss to minimize in a number of ways (see the sketch after this list):

  • -dice_coef
  • 1/dice_coef
  • your proposition
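
A minimal sketch of those three options, assuming this project's dice_coef is in scope; the names inverse_dice_loss and squared_dice_loss are just illustrative, not part of the repo:

from keras import backend as K
# dice_coef as defined in this project is assumed to be in scope

def dice_coef_loss(y_true, y_pred):
    # the loss used in this project: maximizing dice == minimizing its negative
    return -dice_coef(y_true, y_pred)

def inverse_dice_loss(y_true, y_pred):
    # illustrative: also shrinks as dice_coef grows (dice_coef > 0 thanks to `smooth`)
    return 1. / dice_coef(y_true, y_pred)

def squared_dice_loss(y_true, y_pred):
    # illustrative: the proposal from the question, with K.square instead of math.pow
    return 0.5 * K.square(1. - dice_coef(y_true, y_pred))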

jocicmarko avatar Mar 28 '18 08:03 jocicmarko

Hi @saeedalahmari3, This project defines Dice coefficient as:

from keras import backend as K

smooth = 1.  # smoothing constant used throughout this project

def dice_coef(y_true, y_pred):
    # flatten both masks and compute the smoothed Dice coefficient
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

Compared to its traditional definition, it is already smooth and differentiable. And the negative (-dice_coef) of a differentiable function is also differentiable, even without pow.

Since y_pred and y_true are bounded between 0 and 1 (when y_pred is the output of a sigmoid/softmax), the Dice coefficient is also always positive and bounded.
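
As a purely illustrative NumPy check of that bound (not code from this repo), the smoothed formula stays within (0, 1] for probability-valued inputs:

import numpy as np

rng = np.random.RandomState(0)
smooth = 1.

y_true = (rng.rand(64, 64) > 0.9).astype(np.float32)  # sparse binary mask
y_pred = rng.rand(64, 64).astype(np.float32)          # sigmoid-like probabilities

intersection = np.sum(y_true * y_pred)
dice = (2. * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)
print(dice)             # somewhere strictly between 0 and 1
assert 0. < dice <= 1.  # 2*intersection can never exceed the sum of the two masks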

The pow(..., 2) or K.square is used, for example, in the mean_squared_error loss:


def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)

pow is differentiable, but that is not its purpose there. Most importantly, pow makes the (y_pred - y_true) difference always positive (which the Dice coefficient already is by definition). The 0.5 is only a cosmetic detail for prettier math on paper. Please also note that instead of math.pow(..., 2) it should probably be K.pow(..., 2) or K.square(...), so that the operation stays within the Keras computation graph.

Of course, you could even try to use the mean_squared_error loss for segmentation, but it could largely ignore the less dominant class; that is why the Dice coefficient is used here in the first place instead of the more frequently used binary_crossentropy.
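
A small NumPy illustration of that imbalance point (illustrative only, not from the repo): with a mask that is only 1% foreground, predicting all background already looks nearly perfect to MSE, while the Dice coefficient correctly reports almost no overlap:

import numpy as np

smooth = 1.
y_true = np.zeros((100, 100))
y_true[:10, :10] = 1.          # only 1% of the pixels are foreground
y_pred = np.zeros((100, 100))  # lazy prediction: all background

mse = np.mean((y_pred - y_true) ** 2)
intersection = np.sum(y_true * y_pred)
dice = (2. * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

print(mse)   # 0.01 -> 99% of pixels are "correct", looks fine
print(dice)  # ~0.0099 -> close to 0, exposes that the foreground was missed entirely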

For more loss functions see here.

But maybe I misunderstand. Why would you need any pow(..., 2) here?

As @jocicmarko says, it could work; your function stays differentiable. Of course, give it a try and use whatever gives the best outcome. After all, that is what we all care about :)

jmargeta avatar Mar 28 '18 08:03 jmargeta

@jmargeta, I'm sorry if this is a trivial question, but how does the addition of the smooth term make this function differentiable?

nabsabraham avatar Aug 22 '18 20:08 nabsabraham

@brownpanther, adding a small constant to the denominator prevents division by zero when K.sum(y_true_f) + K.sum(y_pred_f) equals 0. That point would otherwise have no defined derivative.

Even if the sum is a very small positive number, adding eps dampens the dramatic changes in gradient that could be caused even by single-pixel changes in the prediction/ground truth.
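
A tiny illustrative NumPy sketch of that edge case: when both the ground truth and the prediction are empty, the un-smoothed ratio is 0/0, while the smoothed version returns 1 (which is also the intuitively right score for a correctly predicted empty mask):

import numpy as np

y_true = np.zeros((100, 100))  # empty ground-truth mask
y_pred = np.zeros((100, 100))  # empty prediction

numerator = 2. * np.sum(y_true * y_pred)
denominator = np.sum(y_true) + np.sum(y_pred)
smooth = 1.

print(numerator / denominator)                        # nan: 0/0 has no defined value (or derivative)
print((numerator + smooth) / (denominator + smooth))  # 1.0: the smooth term handles it gracefully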

jmargeta avatar Aug 23 '18 13:08 jmargeta

@jmargeta thanks for the explanation! But even before the addition of the smooth term, the dice function is differentiable because the last layer's output is a sigmoid/softmax, i.e. probabilities rather than hard 0s or 1s, correct? The smooth term just helps with the gradient flow. Is this understanding correct?

nabsabraham avatar Aug 24 '18 12:08 nabsabraham

@brownpanther Yes, even without the smooth term the function itself would be differentiable almost everywhere (except for K.sum(y_true_f) + K.sum(y_pred_f) == 0). The term, however, makes the function differentiable everywhere, with no exceptions.

Using sigmoid/softmax in the last layer does not influence the differentiability of the dice function itself. Passing a continuous input into a differentiable function results in a continuous change of its output and can indeed help with the gradient flow.

In the end, perfect differentiability is often not such a big deal. Even one of the most commonly used activation functions of today, ReLU (def relu(x): return max(0, x)), is not differentiable at x=0, yet we can take its sub-derivative (1 if x > 0 else 0) and train the nets rather successfully.
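
A tiny illustrative sketch of that convention, in plain Python and framework-agnostic:

def relu(x):
    return max(0., x)

def relu_subgradient(x):
    # the usual convention: treat the kink at x == 0 as having slope 0
    return 1. if x > 0 else 0.

print(relu(-2.), relu_subgradient(-2.))  # 0.0 0.0
print(relu(3.), relu_subgradient(3.))    # 3.0 1.0
print(relu(0.), relu_subgradient(0.))    # 0.0 0.0  <- the non-differentiable point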

jmargeta avatar Sep 03 '18 14:09 jmargeta

Hello, I use the Dice loss function in U-Net, but the predicted images are all white or grey. When I use the default binary_crossentropy loss function instead, the predictions look good. Is there something wrong?

jizhang02 avatar Mar 06 '19 15:03 jizhang02