custom sparse categorical loss
I want to write a custom sparse categorical loss function in numpy or pure TensorFlow. It should handle integer target labels and either logits or probabilities as output. To this end, I have the following:
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability before exponentiating
    y = np.exp(x - np.max(x, axis, keepdims=True))
    return y / np.sum(y, axis, keepdims=True)

def categorical_crossentropy(target, output, from_logits=False):
    if from_logits:
        output = softmax(output)
    else:
        # normalize so each row of probabilities sums to 1
        output /= output.sum(axis=-1, keepdims=True)
    output = np.clip(output, 1e-7, 1 - 1e-7)
    return np.sum(target * -np.log(output), axis=-1, keepdims=False)
This works when the target is one-hot:
y_true = np.array([[0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
categorical_crossentropy(y_true, y_pred)
array([0.05129329, 2.30258509])
But it fails when the target is an integer label (the desired case):
y_true = np.array([1, 2])
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
categorical_crossentropy(y_true, y_pred)
ValueError: operands could not be broadcast together with shapes (2,) (2,3)
How can I achieve this, i.e. a loss function that takes an integer target and can compute the loss from logits as well as probabilities? I know there is a built-in function (sparse_categorical_crossentropy), but I would like to write it in plain numpy or pure TensorFlow as a custom loss function.
In the sparse case, we don't have to compute target * -np.log(output), since target is 1 for label i and 0 for the remaining entries. It's more efficient to simply pick the i-th output:
p = tf.gather(p, y, axis=1, batch_dims=1)
p = -tf.math.log(p)
Can you elaborate with full working code? What are p and y in your code above?
p is the predictions (output), y is the labels (target). Starting from your own implementation:
def categorical_crossentropy(target, output, from_logits=False):
    ...
    return np.sum(target * -np.log(output), axis=-1, keepdims=False)
Here $\text{target}\in[0, 1]$, so every output value matters and might affect the loss. In the sparse case, however, only one item in target is 1.0, while the remaining ones are 0. This means that, for $n$ classes with $i$ being the true label, the sum is:
$$0\times(-\log \text{output}_0) + 0\times(-\log \text{output}_1) + \dots + 1\times(-\log \text{output}_i) + \dots + 0\times(-\log \text{output}_{n-1}) = -\log \text{output}_i$$
So we don't add a bunch of zeros that would not affect the result. Instead, we just pick the i-th output for each sample in the batch:
def sparse_categorical_crossentropy(output, target):
    # select the probability of the true class for each sample in the batch
    output_i = output[range(len(target)), target]
    return -np.log(output_i)
y_true = np.asarray([1, 2])
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
sparse_categorical_crossentropy(y_pred, y_true) # array([0.05129329, 2.30258509])
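As a quick sanity check (assuming a TF 2.x environment is available), the same inputs can be fed to the built-in loss mentioned above and should give matching values:

import numpy as np
import tensorflow as tf

y_true = np.asarray([1, 2])
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
# built-in Keras loss with integer labels; should agree with the numpy version above
tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred).numpy()
# expected ≈ array([0.05129329, 2.30258509])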
In TensorFlow, we can accomplish the same with the tf.gather function:
import tensorflow as tf

def sparse_categorical_crossentropy(output, target):
    # batch_dims=1 gathers output[i, target[i]] for each sample i
    output_i = tf.gather(output, target, axis=1, batch_dims=1)
    return -tf.math.log(output_i)
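Putting this together with the normalization, clipping, and from_logits handling from your original categorical_crossentropy, a complete version might look like the sketch below (my own assembly, not the built-in Keras implementation):

import tensorflow as tf

def sparse_categorical_crossentropy(target, output, from_logits=False):
    # target: integer labels, shape (batch,); output: shape (batch, n_classes)
    output = tf.convert_to_tensor(output, dtype=tf.float32)
    if from_logits:
        # turn logits into probabilities first
        output = tf.nn.softmax(output, axis=-1)
    else:
        # normalize so each row sums to 1
        output = output / tf.reduce_sum(output, axis=-1, keepdims=True)
    output = tf.clip_by_value(output, 1e-7, 1 - 1e-7)
    # pick the probability of the true class for each sample
    output_i = tf.gather(output, target, axis=1, batch_dims=1)
    return -tf.math.log(output_i)

Calling it with y_true = np.array([1, 2]) and the y_pred above should reproduce the same two loss values, and it also accepts raw logits when from_logits=True.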
Hi @Suzan009, you can add an extra dimension to your y_true value.
y_true = np.array([1, 2])
y_true = np.expand_dims(y_true, axis=-1)
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
categorical_crossentropy(y_true, y_pred)
Output
array([19.16512122, 9.65662747])
@gadagashwini But both outputs should be the same; this gives a different result.
y_true = np.array([[0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
categorical_crossentropy(y_true, y_pred)
y_true = np.array([1, 2])
y_true = np.expand_dims(y_true, axis=-1)
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
categorical_crossentropy(y_true, y_pred)
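Looking closer, the mismatch seems to come from broadcasting: after expand_dims, y_true has shape (2, 1), so the integer labels 1 and 2 scale every -log term in their row instead of selecting a single one. A small sketch of what happens:

import numpy as np

y_true = np.array([[1], [2]])                      # shape (2, 1) after expand_dims
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
logp = -np.log(np.clip(y_pred, 1e-7, 1 - 1e-7))    # shape (2, 3)
(y_true * logp).sum(axis=-1)                       # label value multiplies the whole row sum
# array([19.16512122,  9.65662747])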
@lucasdavid's workaround is working, but I'm trying to understand it a bit. I'll close the issue afterwards.
@Suzan009, did you get a chance to look into the issue? Thank you!
This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.
Closing as stale. Please reopen if you'd like to work on this further.