
Unexpected Losses with Sample Weights

Open seandaug opened this issue 1 year ago • 2 comments

When using sample_weight with the default reduction sum_over_batch_size, the computed losses are technically correct (they are the sum divided by the batch size), but they are not what most users expect. They are computed by summing loss * sample_weight and dividing by the number of items in the tensor that pass the mask, not by dividing by the sum of the sample weights.

For example,

import numpy as np
import keras

keras.losses.MeanAbsoluteError()(
    y_true=np.array([[1.0], [2.0]]),
    y_pred=np.array([[2.0], [3.0]]),
    sample_weight=np.array([[0.0], [1.0]])
    ).numpy()

returns 0.5, not 1.0. (The denominator of the calculation is 2.0 because there are two samples in the batch, even though one of them has a sample weight of 0.0.)
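
For illustration, the sum_over_batch_size arithmetic behind this result can be reproduced with plain NumPy (a sketch of the described behavior, not the actual Keras internals):

import numpy as np

y_true = np.array([[1.0], [2.0]])
y_pred = np.array([[2.0], [3.0]])
sample_weight = np.array([[0.0], [1.0]])

weighted = np.abs(y_true - y_pred) * sample_weight   # [[0.0], [1.0]]
# sum_over_batch_size divides by the number of items, not the weight sum
print(weighted.sum() / weighted.size)                 # 0.5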

Notably, the metric version

keras.metrics.MeanAbsoluteError()(
    y_true=np.array([[1.0], [2.0]]),
    y_pred=np.array([[2.0], [3.0]]),
    sample_weight=np.array([[0.0], [1.0]])
    ).numpy()

returns 1.0 as one would expect because it divides by the sum of the sample weights.

The metric version uses keras.src.utils.metrics_utils.Reduction() with weighted_mean by default (not sum_over_batch_size). However, keras.losses.Reduction() has no equivalent option. This means the loss computes a different value from the associated metric during training.
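
The weighted_mean arithmetic can likewise be reproduced with plain NumPy (again a sketch, not the metric's actual internals):

import numpy as np

y_true = np.array([[1.0], [2.0]])
y_pred = np.array([[2.0], [3.0]])
sample_weight = np.array([[0.0], [1.0]])

weighted = np.abs(y_true - y_pred) * sample_weight   # [[0.0], [1.0]]
# weighted_mean divides by the sum of the sample weights
print(weighted.sum() / sample_weight.sum())           # 1.0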

This is a long-standing issue, but I verified this in both Keras 2.15.0 (TensorFlow 2.15.0) and 3.3.3 (TensorFlow 2.16.1). https://colab.research.google.com/drive/1TRBeOE79kfxPwz1-C60N3IjXeLSUbgST?usp=sharing

Should someone either change the default behavior to be the weighted mean (divide by sum of the sample weights) or add another loss reduction option that enables this? I think this is a significant issue that affects neural network training.

Note that when a mask is applied, the function keras.utils.losses_utils.apply_valid_mask is used to exclude some items from the loss calculation by setting their sample weights to 0.0 and adjusting the denominator to only count the number of items in the tensor that pass through the mask. Therefore, in the special case of all sample weights being 1.0 but some getting masked out, the denominator is adjusted to get the effect of dividing by the sum of the sample weights rather than the "batch size". Thus, in this one special (but likely common) case, the output is what one would expect. It just doesn't work out that way when some of the included sample weights are different from 1.0.
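
To illustrate that special case, here is a hedged NumPy sketch that mimics the effect described above without calling apply_valid_mask directly:

import numpy as np

per_sample_loss = np.array([1.0, 1.0, 1.0])
sample_weight = np.array([1.0, 1.0, 1.0])
mask = np.array([True, True, False])       # the last item is masked out

# masked items effectively get a sample weight of 0.0 ...
effective_weight = np.where(mask, sample_weight, 0.0)
# ... and the denominator counts only the items that pass the mask
denominator = mask.sum()

print((per_sample_loss * effective_weight).sum() / denominator)   # 1.0

With all-ones weights this coincides with the weighted mean. With, say, sample_weight = [1.0, 0.5, 1.0] and the same mask, the result is 0.75 rather than the weighted mean of 1.0, which is the discrepancy described above.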

seandaug avatar May 21 '24 18:05 seandaug

Hi @seandaug ,

I have replicated the reported behaviour in the attached gist. I will dig more and come back. Thanks!

SuryanarayanaY avatar May 22 '24 14:05 SuryanarayanaY

I have come across a similar problem using Keras 3. I tested it with the JAX and Torch backends. The sample_weight does not weight the error at each position; instead, it weights the already-reduced final error. Example with MeanSquaredError:

In [11]: MeanSquaredError(reduction="sum")(np.array([1,2,3]), np.array([1,1,1]),  sample_weight=np.array([1,1,1]))
Out[11]: Array(5., dtype=float32)

In [12]: MeanSquaredError(reduction="sum")(np.array([1,2,3]), np.array([1,1,1]),  sample_weight=np.array([1,0,0]))
Out[12]: Array(1.6666667, dtype=float32)

In [13]: MeanSquaredError(reduction="sum")(np.array([1,2,3]), np.array([1,1,1]),  sample_weight=np.array([0,1,0]))
Out[13]: Array(1.6666667, dtype=float32)

In [14]: MeanSquaredError(reduction="sum")(np.array([1,2,3]), np.array([1,1,1]),  sample_weight=np.array([0,0,1]))
Out[14]: Array(1.6666667, dtype=float32)

In [17]: MeanSquaredError(reduction="sum")(np.array([1,2,3]), np.array([1,1,1]),  sample_weight=np.array([1,0,1]))
Out[17]: Array(3.3333335, dtype=float32)

In [15]: MeanSquaredError(reduction=None)(np.array([1,2,3]), np.array([1,1,1]),  sample_weight=np.array([0,0,1]))
Out[15]: Array([0.       , 0.       , 1.6666667], dtype=float32)

In [16]: MeanSquaredError(reduction=None)(np.array([1,2,3]), np.array([1,1,1]),  sample_weight=np.array([1,1,1]))
Out[16]: Array([1.6666667, 1.6666667, 1.6666667], dtype=float32)

In this case, from what I gather, the expected behavior would be for the per-sample squared errors to be [0, 1, 4], for the sample weights to multiply those values, and only then for the reduction (the mean) to be applied.
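
For reference, a NumPy sketch of the arithmetic behind the outputs above (the reduction=None case with sample_weight = [0, 0, 1]):

import numpy as np

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 1.0, 1.0])
w = np.array([0.0, 0.0, 1.0])

sq_err = (y_true - y_pred) ** 2        # [0., 1., 4.]

# expected (per the description above): weight each position, then reduce
print((sq_err * w).mean())              # ~1.333

# observed: the error is already reduced over the last axis to 5/3 = 1.6666667,
# and the sample weights then scale that reduced value
print(sq_err.mean() * w)                # [0., 0., 1.6666667], matching Out[15]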

JotaFan avatar Jun 28 '24 16:06 JotaFan


Thanks. It seems like the following now produces the same result as the metric case when using tf-nightly:

keras.losses.MeanAbsoluteError(reduction='mean_with_sample_weight')(
    y_true=np.array([[1.0], [2.0]]),
    y_pred=np.array([[2.0], [3.0]]),
    sample_weight=np.array([[0.0], [1.0]])
    ).numpy()

However, I am concerned that the default reduction for Loss still doesn't match the behavior of Metric, which will lead to confusion and unexpected behavior. I think the default reduction should now be mean_with_sample_weight instead of sum_over_batch_size, even though this breaks backward compatibility (it fixes a longstanding bug). When someone specifies sample weights, they expect them to be used by default. If the default reduction is changed, anyone who wants the old behavior can re-enable it by setting the reduction to sum_over_batch_size.
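
For Keras 3 versions that predate the mean_with_sample_weight reduction, one possible workaround is to take the unreduced losses and normalize by the weight sum manually. A minimal sketch (the helper name and the manual normalization are mine, not a Keras API):

import numpy as np
import keras

def weighted_mean_mae(y_true, y_pred, sample_weight):
    # unreduced per-sample absolute errors
    per_sample = np.asarray(
        keras.losses.MeanAbsoluteError(reduction=None)(y_true, y_pred)
    )
    # weight each sample, then divide by the sum of the weights
    return (per_sample * np.squeeze(sample_weight)).sum() / sample_weight.sum()

print(weighted_mean_mae(
    y_true=np.array([[1.0], [2.0]]),
    y_pred=np.array([[2.0], [3.0]]),
    sample_weight=np.array([[0.0], [1.0]]),
))  # 1.0, matching keras.metrics.MeanAbsoluteError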

seandaug avatar Nov 07 '24 16:11 seandaug