
custom sparse categorical loss

Open · pure-rgb opened this issue 3 years ago • 7 comments

I want to write a custom sparse categorical loss function in numpy or pure tensorflow. It should handle integer target labels and either logits or probabilities as the model output. To this end, I have the following:


import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    y = np.exp(x - np.max(x, axis, keepdims=True))
    return y / np.sum(y, axis, keepdims=True)

def categorical_crossentropy(target, output, from_logits=False):
    if from_logits:
        output = softmax(output)
    else:
        # normalize in case the probabilities don't sum to 1
        output = output / output.sum(axis=-1, keepdims=True)
    output = np.clip(output, 1e-7, 1 - 1e-7)
    return np.sum(target * -np.log(output), axis=-1, keepdims=False)

This works when the target is one-hot:


y_true = np.array([[0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])

categorical_crossentropy(y_true, y_pred)
array([0.05129329, 2.30258509])

But it fails when the target is an integer label, which is what I want:


y_true = np.array([1, 2])
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])

categorical_crossentropy(y_true, y_pred)
ValueError: operands could not be broadcast together with shapes (2,) (2,3) 

How can I achieve this, i.e. a loss function that takes integer targets and can compute the loss from either logits or probabilities output? I know there is a built-in function (sparse_categorical_crossentropy), but I would like to write it in plain numpy or pure tensorflow as a custom loss function.
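
For reference, this is the built-in I am trying to reproduce (called on the arrays above):

import tensorflow as tf

# built-in counterpart: integer labels, probability outputs
tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred)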

pure-rgb avatar Jun 30 '22 10:06 pure-rgb

In the sparse case, we don't have to compute target * -np.log(output), since target is 1 for label i and 0 for the remaining classes. It's more efficient to simply pick the i-th output:

p = tf.gather(p, y, axis=1, batch_dims=1)
p = -tf.math.log(p)

lucasdavid avatar Jun 30 '22 14:06 lucasdavid

Can you elaborate with full working code? What are p and y in your code above?

pure-rgb avatar Jul 03 '22 18:07 pure-rgb

p is the predictions or output. y is the labels or target. Starting from your own implementation:

def categorical_crossentropy(target, output, from_logits=False):
    ...
    return np.sum(target * -np.log(output), axis=-1, keepdims=False)

In the dense case, $\text{target}\in[0, 1]$ (soft labels are possible), so every output value can affect the loss. In the sparse case, however, only one item in target is 1.0, while the remaining ones are 0. This means that, for $n$ classes with $i$ being the true label, the sum will be:

$$0\cdot(-\log \text{output}_0) + 0\cdot(-\log \text{output}_1) + \dots + 1\cdot(-\log \text{output}_i) + \dots + 0\cdot(-\log \text{output}_{n-1}) = -\log \text{output}_i$$

So instead of adding a bunch of zero terms that don't affect the result, we can just pick the i-th output for each sample in the batch:

import numpy as np

def sparse_categorical_crossentropy(output, target):
  # fancy indexing: row k picks column target[k]
  output_i = output[range(len(target)), target]
  return -np.log(output_i)

y_true = np.asarray([1, 2])
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])

sparse_categorical_crossentropy(y_pred, y_true)  # array([0.05129329, 2.30258509])

In TensorFlow, we can accomplish the same with the tf.gather function:

import tensorflow as tf

def sparse_categorical_crossentropy(output, target):
  # batch_dims=1: gather one element per row, indexed by target
  output_i = tf.gather(output, target, axis=1, batch_dims=1)
  return -tf.math.log(output_i)
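
And if you also want the logits/unnormalized-probabilities handling from your original function, the two ideas combine directly. A sketch in numpy (the normalization and clipping mirror your original code):

import numpy as np

def sparse_categorical_crossentropy(output, target, from_logits=False):
    if from_logits:
        # numerically stable softmax over the class axis
        e = np.exp(output - np.max(output, axis=-1, keepdims=True))
        output = e / e.sum(axis=-1, keepdims=True)
    else:
        # normalize in case the probabilities don't sum to 1
        output = output / output.sum(axis=-1, keepdims=True)
    output = np.clip(output, 1e-7, 1 - 1e-7)
    # pick the probability of the true class for each sample
    output_i = output[np.arange(len(target)), target]
    return -np.log(output_i)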

lucasdavid avatar Jul 04 '22 14:07 lucasdavid

Hi @Suzan009, you can add an extra dimension to your y_true value.

y_true = np.array([1, 2])
y_true = np.expand_dims(y_true, axis=-1)
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])

categorical_crossentropy(y_true, y_pred)

Output: array([19.16512122, 9.65662747])

gadagashwini avatar Jul 06 '22 02:07 gadagashwini

@gadagashwini but both outputs should be the same, yet they give different results:

y_true = np.array([[0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
categorical_crossentropy(y_true, y_pred)

y_true = np.array([1, 2])
y_true = np.expand_dims(y_true, axis=-1)
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
categorical_crossentropy(y_true, y_pred)
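
If I'm reading the broadcasting rules right, the mismatch comes from the shape: with y_true of shape (2, 1), target * -np.log(output) broadcasts the integer label across every class term instead of selecting one of them, which reproduces the numbers above:

# labels broadcast over all class terms rather than indexing into them
y_pred = np.clip(y_pred / y_pred.sum(axis=-1, keepdims=True), 1e-7, 1 - 1e-7)
row_sums = np.sum(-np.log(y_pred), axis=-1)  # sum over all classes per row
np.array([1, 2]) * row_sums                  # array([19.16512122,  9.65662747])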

@lucasdavid's workaround is working, but I'm still trying to understand it fully. Will close the issue afterwards.

pure-rgb avatar Jul 06 '22 21:07 pure-rgb

@Suzan009, did you get a chance to look into the issue? Thank you!

gadagashwini avatar Jul 15 '22 08:07 gadagashwini

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] avatar Aug 08 '22 06:08 google-ml-butler[bot]

Closing as stale. Please reopen if you'd like to work on this further.

google-ml-butler[bot] avatar Aug 15 '22 07:08 google-ml-butler[bot]
