ultrasound-nerve-segmentation
loss function
I didn't understand this loss function: return -dice_coef(y_true, y_pred). For backpropagation I think we need a differentiable loss function, for instance return 0.5*math.pow(1 - dice_coef(y_true, y_pred), 2). Is that correct?
dice_coef is a quantity that we want to maximize. You can turn it into a loss to minimize in a number of ways (see the sketch after this list):
- -dice_coef
- 1/dice_coef
- your proposition
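A minimal sketch of the first two options, assuming the project's smooth dice_coef (quoted in the next comment); inv_dice_coef_loss is just a hypothetical name for illustration:
def dice_coef_loss(y_true, y_pred):
    # minimize the negative Dice coefficient, i.e. maximize the overlap
    return -dice_coef(y_true, y_pred)

def inv_dice_coef_loss(y_true, y_pred):
    # hypothetical alternative: minimize the reciprocal instead
    return 1. / dice_coef(y_true, y_pred)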
Hi @saeedalahmari3, This project defines the Dice coefficient as:
from keras import backend as K

smooth = 1.  # smoothing constant

def dice_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
Compared to its traditional definition it is already smooth and differentiable. And the negative (-dice_coef) of a differentiable function is also a differentiable function, even without pow.
Since y_pred and y_true are bounded between 0 and 1 (if y_pred is the result of a softmax), the Dice coefficient should also always be positive and bounded.
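As a quick numeric sanity check (a hypothetical snippet, not from the project, assuming the dice_coef above with smooth = 1.):
import numpy as np
from keras import backend as K

# Sigmoid/softmax-like predictions stay in [0, 1], so dice_coef stays in (0, 1].
y_true = K.constant(np.array([[0., 1., 1., 0.]]))
y_pred = K.constant(np.array([[0.1, 0.9, 0.8, 0.2]]))
print(K.eval(dice_coef(y_true, y_pred)))  # ~0.88 with smooth = 1.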
The pow(..., 2) or K.square is used, for example, in the mean_squared_error loss:
def mean_squared_error(y_true, y_pred):
    return K.mean(K.square(y_pred - y_true), axis=-1)
pow is differentiable, but that is not its purpose there. Most importantly, pow makes the (y_pred - y_true) difference always positive (which the Dice coefficient is by definition). The 0.5 there is only a cosmetic detail for prettier math on paper.
Please also note that instead of math.pow(..., 2) it probably should be K.pow(..., 2).
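For illustration, the proposed loss rewritten with backend ops might look like this (a sketch only; squared_dice_loss is a hypothetical name):
def squared_dice_loss(y_true, y_pred):
    # the proposition from the question, with K.pow instead of math.pow
    # so that it operates on tensors and stays inside the Keras graph
    return 0.5 * K.pow(1. - dice_coef(y_true, y_pred), 2)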
Of course, you could even try to use the mean_squared_error loss for segmentation, but it could largely ignore the less dominant class (that is why the Dice coefficient is there in the first place, instead of the more frequently used binary_crossentropy).
For more loss functions see here.
But maybe I misunderstand. Why would you need any pow(..., 2) here?
As @jocicmarko says, it could work; your function stays differentiable. Of course, give it a try and use whichever gives the best outcome. After all, that is what we all care about :)
@jmargeta, I'm sorry if this is a trivial question, but how does the addition of the smooth term make this function differentiable?
@brownpanther, adding a small constant to the denominator prevents the possibility of division by zero when K.sum(y_true_f) + K.sum(y_pred_f) equals 0. It would otherwise be a point with no defined derivative.
Even if the sum is a very small positive number, adding eps dampens the dramatic changes in gradient that could even be caused by single-pixel changes in the prediction/ground truth.
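To make that concrete, here is a hypothetical comparison of the ratio with and without the smooth term when both the ground truth and the prediction are empty:
import numpy as np
from keras import backend as K

smooth = 1.
y_true = K.constant(np.zeros((1, 4)))  # empty ground-truth mask
y_pred = K.constant(np.zeros((1, 4)))  # empty prediction

intersection = K.sum(y_true * y_pred)
plain = (2. * intersection) / (K.sum(y_true) + K.sum(y_pred))                       # 0 / 0 -> nan
smoothed = (2. * intersection + smooth) / (K.sum(y_true) + K.sum(y_pred) + smooth)  # 1.0
print(K.eval(plain), K.eval(smoothed))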
@jmargeta thanks for the explanation! But even before the addition of the smooth term, the dice function is differentiable because the last layer's output is a sigmoid/softmax, i.e. probabilities rather than 0 or 1, correct? The smooth term just helps with the gradient flow. Is this understanding correct?
@brownpanther Yes, even without the smooth term the function itself would be differentiable almost everywhere (except for K.sum(y_true_f) + K.sum(y_pred_f) == 0). The term, however, makes the function differentiable everywhere, with no exceptions.
Using sigmoid/softmax in the last layer does not influence the differentiability of the dice function itself. Passing a continuous input into a differentiable function results in a continuous change of its output and can indeed help with the gradient flow.
In the end, perfect differentiability is often not such a big deal.
Even one of the most commonly used activation functions of today, ReLU (def relu(x): return max(0, x)), is not differentiable at x=0, yet we can compute its sub-derivatives (1 if x > 0 else 0) and train the nets rather successfully.
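As a plain-Python sketch of that remark (hypothetical helper names):
def relu(x):
    # not differentiable at x == 0
    return max(0., x)

def relu_subgradient(x):
    # the sub-derivative used in practice: 1 for x > 0, 0 otherwise
    return 1. if x > 0 else 0.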
Hello, I use the Dice loss function in U-Net, but the predicted images are all white or grey. When I use the default binary_crossentropy loss function, the predicted images are good. Is there something wrong?