custom sparse categorical loss
I want to write a custom sparse categorical loss function in numpy or pure TensorFlow. It should handle integer target labels and either logits or probabilities as output. To this end, I have the following:
import numpy as np

def softmax(x, axis=-1):
    # subtract the max for numerical stability before exponentiating
    y = np.exp(x - np.max(x, axis, keepdims=True))
    return y / np.sum(y, axis, keepdims=True)

def categorical_crossentropy(target, output, from_logits=False):
    if from_logits:
        output = softmax(output)
    else:
        # normalize so each row of probabilities sums to 1
        output /= output.sum(axis=-1, keepdims=True)
    output = np.clip(output, 1e-7, 1 - 1e-7)
    return np.sum(target * -np.log(output), axis=-1, keepdims=False)
This works when the target is one-hot:
y_true = np.array([[0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
categorical_crossentropy(y_true, y_pred)
array([0.05129329, 2.30258509])
But it fails when the target is an integer label (the desired case):
y_true = np.array([1, 2])
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
categorical_crossentropy(y_true, y_pred)
ValueError: operands could not be broadcast together with shapes (2,) (2,3)
How can I achieve this, i.e. a loss function that takes an integer target and can compute the loss from logits as well as probabilities? I know there is a built-in function (sparse_categorical_crossentropy), but I would like to write it in plain numpy or pure TensorFlow as a custom loss function.
In the sparse case, we don't have to compute target * -np.log(output), since target is 1 for label i and 0 for the remaining entries. It's more efficient to simply pick the i-th output:
p = tf.gather(p, y, axis=1, batch_dims=1)
p = -tf.math.log(p)
Can you elaborate with full working code? What are p and y in your code above?
p is the predictions (output), y is the labels (target). Starting from your own implementation:
def categorical_crossentropy(target, output, from_logits=False):
    ...
    return np.sum(target * -np.log(output), axis=-1, keepdims=False)
Here $\text{target}\in[0, 1]$, so every output value matters and might affect the loss. In the sparse case, however, only one item in target is 1.0, while the remaining ones are 0. This means that, for $n$ classes with $i$ being the true label, the sum is:
$$0\times(-\log \text{output}_0) + 0\times(-\log \text{output}_1) + \dots + 1\times(-\log \text{output}_i) + \dots + 0\times(-\log \text{output}_{n-1}) = -\log \text{output}_i$$
So we don't add a bunch of zeros that would not affect the result. Instead, we just pick the i-th output for each sample in the batch:
def sparse_categorical_crossentropy(output, target):
    # select the probability of the true class for each sample in the batch
    output_i = output[range(len(target)), target]
    return -np.log(output_i)
y_true = np.asarray([1, 2])
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
sparse_categorical_crossentropy(y_pred, y_true) # array([0.05129329, 2.30258509])
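As a quick sanity check (assuming a TF 2.x environment is available), the same inputs can be fed to the built-in loss mentioned above and should give matching values:

import numpy as np
import tensorflow as tf

y_true = np.asarray([1, 2])
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
# built-in Keras loss with integer labels; should agree with the numpy version above
tf.keras.losses.sparse_categorical_crossentropy(y_true, y_pred).numpy()
# expected ≈ array([0.05129329, 2.30258509])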
In TensorFlow, we can accomplish the same with the tf.gather function:
import tensorflow as tf

def sparse_categorical_crossentropy(output, target):
    # batch_dims=1 gathers output[i, target[i]] for each sample i
    output_i = tf.gather(output, target, axis=1, batch_dims=1)
    return -tf.math.log(output_i)
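Putting this together with the normalization, clipping, and from_logits handling from your original categorical_crossentropy, a complete version might look like the sketch below (my own assembly, not the built-in Keras implementation):

import tensorflow as tf

def sparse_categorical_crossentropy(target, output, from_logits=False):
    # target: integer labels, shape (batch,); output: shape (batch, n_classes)
    output = tf.convert_to_tensor(output, dtype=tf.float32)
    if from_logits:
        # turn logits into probabilities first
        output = tf.nn.softmax(output, axis=-1)
    else:
        # normalize so each row sums to 1
        output = output / tf.reduce_sum(output, axis=-1, keepdims=True)
    output = tf.clip_by_value(output, 1e-7, 1 - 1e-7)
    # pick the probability of the true class for each sample
    output_i = tf.gather(output, target, axis=1, batch_dims=1)
    return -tf.math.log(output_i)

Calling it with y_true = np.array([1, 2]) and the y_pred above should reproduce the same two loss values, and it also accepts raw logits when from_logits=True.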
Hi @Suzan009, you can add an extra dimension to your y_true value.
y_true = np.array([1, 2])
y_true = np.expand_dims(y_true, axis=-1)
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
categorical_crossentropy(y_true, y_pred)
Output
array([19.16512122, 9.65662747])
@gadagashwini But both outputs should be the same; this gives a different result.
y_true = np.array([[0, 1, 0], [0, 0, 1]])
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
categorical_crossentropy(y_true, y_pred)
y_true = np.array([1, 2])
y_true = np.expand_dims(y_true, axis=-1)
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
categorical_crossentropy(y_true, y_pred)
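Looking closer, the mismatch seems to come from broadcasting: after expand_dims, y_true has shape (2, 1), so the integer labels 1 and 2 scale every -log term in their row instead of selecting a single one. A small sketch of what happens:

import numpy as np

y_true = np.array([[1], [2]])                      # shape (2, 1) after expand_dims
y_pred = np.array([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
logp = -np.log(np.clip(y_pred, 1e-7, 1 - 1e-7))    # shape (2, 3)
(y_true * logp).sum(axis=-1)                       # label value multiplies the whole row sum
# array([19.16512122,  9.65662747])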
@lucasdavid's workaround is working, but I'm trying to understand it a bit. I'll close the issue afterwards.
@Suzan009, did you get a chance to look into the issue? Thank you!
This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.
Closing as stale. Please reopen if you'd like to work on this further.