
Categorical Cross Entropy normalization issue?

Open PierrickPochelu opened this issue 3 years ago • 10 comments

In categorical_crossentropy, I suspect this normalization line is unnecessary and leads to two unexpected behaviors: https://github.com/keras-team/keras/blob/b80dd12da9c0bc3f569eca3455e77762cf2ee8ef/keras/backend.py#L5540

The block above handles the logits case and returns early, so L5540 is only reached when the input is NOT logits (i.e., the probabilities already sum to 1). AFAIK the values are already normalized at that point, so the division on L5540 is not useful.
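A minimal NumPy sketch of the two paths (a simplified illustration, not the actual Keras backend code; the clipping epsilon `eps` is an assumption standing in for Keras' `epsilon()`):

```python
import numpy as np

def categorical_crossentropy(target, output, from_logits=False, axis=-1, eps=1e-7):
    """Simplified sketch: logits path returns a softmax-normalized distribution,
    non-logits path applies the re-normalization under discussion (L5540)."""
    if from_logits:
        # softmax already yields rows that sum to 1
        output = np.exp(output - output.max(axis=axis, keepdims=True))
        output = output / output.sum(axis=axis, keepdims=True)
    else:
        # the line under discussion: re-normalize probabilities that
        # should already sum to 1 along `axis`
        output = output / output.sum(axis=axis, keepdims=True)
    output = np.clip(output, eps, 1.0 - eps)
    return -(target * np.log(output)).sum(axis=axis)
```

For a well-formed probability input such as `[[0.7, 0.2, 0.1]]`, the division is a no-op and the result is simply `-log(0.7)`; for an all-zero input it divides by 0 and yields `nan`.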

Suppose instead that the normalization on L5540 is useful and we keep it unchanged. It then produces errors in two edge cases:

  • Unexpected behavior 1: when the predictions are all zero, it produces a division by 0. out=categorical_crossentropy(output=np.array([[0.,0.,0.]]), target=np.array([[1.,0.,0.]]), from_logits=False, axis=-1) returns tf.Tensor([nan], shape=(1,), dtype=float64).

  • Unexpected behavior 2 is #17029, the edge case with a single class. Removing L5540 allows the model to be trained to always return 1; the loss is then not identically 0 but converges quickly to 0. L5540 seems to be the root cause of what is described in that issue.

  • As discussed in #17029, emitting a warning to the user may also be useful.
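The single-class edge case can be illustrated with a small NumPy sketch (a hypothetical mirror of the normalization step, not the Keras source): whatever probability the model predicts for the only class, re-normalizing over a one-element axis forces it to 1, so the loss is identically 0 and no gradient signal remains.

```python
import numpy as np

pred = np.array([[0.3]])    # model predicts 0.3 for the single class
target = np.array([[1.0]])

# with the L5540-style re-normalization: [[0.3]] / 0.3 -> [[1.0]]
normalized = pred / pred.sum(axis=-1, keepdims=True)
loss_with_norm = -(target * np.log(normalized)).sum(axis=-1)   # identically 0

# without the re-normalization: a real, trainable loss of -log(0.3)
loss_without_norm = -(target * np.log(pred)).sum(axis=-1)
```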

Reproducibility: https://colab.research.google.com/drive/1GFXQxfw_4gwMU6vIo8fNDPGCwnbOSp2o?usp=sharing

I may contribute and push a PR.

PierrickPochelu avatar Sep 30 '22 14:09 PierrickPochelu

tf.math.divide_no_nan would fix the first edge case. However, assuming correct usage of the API, I'm not sure it would ever happen in the first place: the softmax function cannot produce the output [0., 0., 0.] because $e^x > 0, \forall x\in \mathbb{R}$.
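For reference, `tf.math.divide_no_nan` returns 0 wherever the denominator is 0 instead of producing `nan`/`inf`. A NumPy emulation of that behavior (an illustration of the semantics, not the TF implementation):

```python
import numpy as np

def divide_no_nan(x, y):
    """Emulate tf.math.divide_no_nan: element-wise x / y, but 0 where y == 0."""
    with np.errstate(divide='ignore', invalid='ignore'):
        return np.where(y == 0, 0.0, x / y)
```

With this, an all-zero prediction row would be mapped to all-zero "probabilities" rather than `nan`, which avoids the division error, though the downstream loss is still questionable for such inputs.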

lucasdavid avatar Oct 03 '22 13:10 lucasdavid

@PierrickPochelu Could you refer to the comment above and let us know if it helps? Thank you!

sushreebarsa avatar Oct 04 '22 07:10 sushreebarsa

Yes, it does.

PierrickPochelu avatar Oct 04 '22 07:10 PierrickPochelu

@PierrickPochelu Thank you for the confirmation!

sushreebarsa avatar Oct 04 '22 07:10 sushreebarsa

@PierrickPochelu Could you please confirm if this issue is resolved? Thank you!

sushreebarsa avatar Oct 06 '22 12:10 sushreebarsa

reduce_sum should be called only when it is mandatory.

By removing this line, I reduce the computing time from 16 s to 12 s without changing the output: https://colab.research.google.com/drive/1GFXQxfw_4gwMU6vIo8fNDPGCwnbOSp2o?usp=sharing

For the benchmark, I use an example with ImageNet-like dimensions: 1 million samples (1e5 samples processed 10 times due to memory limits) and 1 thousand classes.
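The claim that dropping the re-normalization leaves the output unchanged for softmax inputs can be checked with a small NumPy sketch (dimensions scaled down from the benchmark above; the scaled-down sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_classes = 1000, 1000   # scaled-down stand-in for the ImageNet-sized benchmark

# softmax output: every row already sums to 1
logits = rng.normal(size=(batch, n_classes))
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

# the extra reduce_sum + divide is a no-op on already-normalized rows
renormalized = probs / probs.sum(axis=-1, keepdims=True)
print(np.allclose(probs, renormalized))
```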

PierrickPochelu avatar Oct 07 '22 12:10 PierrickPochelu

PR: https://github.com/keras-team/keras/pull/17140

PierrickPochelu avatar Oct 13 '22 18:10 PierrickPochelu

> reduce_sum should be called only when it is mandatory.
>
> By removing this line, I reduce the computing time from 16 s to 12 s without changing the output: https://colab.research.google.com/drive/1GFXQxfw_4gwMU6vIo8fNDPGCwnbOSp2o?usp=sharing
>
> For the benchmark, I use an example with ImageNet-like dimensions: 1 million samples (1e5 samples processed 10 times due to memory limits) and 1 thousand classes.

Can you please give public access to the Colab?

divyashreepathihalli avatar Oct 13 '22 22:10 divyashreepathihalli

Done

PierrickPochelu avatar Oct 14 '22 07:10 PierrickPochelu

@PierrickPochelu - The PR has been reviewed - https://github.com/keras-team/keras/pull/17140

divyashreepathihalli avatar Oct 18 '22 21:10 divyashreepathihalli