TensorFlow-Examples [Potential NAN bug] Loss may become NAN during training

Open Justobe opened this issue 4 years ago • 1 comments

Hello~

Thank you very much for sharing the code!

I tried the code with my own data set (same shape as MNIST). After some iterations, the training loss becomes NaN. After carefully checking the code, I found that the following line may trigger NaN in the loss:

In TensorFlow-Examples/examples/2_BasicModels/logistic_regression.py:

cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))

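If pred (the output of softmax) contains an exact 0, tf.log(pred) evaluates to -inf at that position, because log(0) is undefined. Where the matching entry of y is 0, the product 0 * -inf is NaN, and the NaN propagates through reduce_sum and reduce_mean into the loss. Where the matching entry of y is 1, the loss becomes inf instead.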
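To make the failure concrete, here is a minimal sketch that reproduces it with NumPy as a stand-in for the TensorFlow ops (an assumption for illustration; the original script uses tf.log and tf.reduce_sum). The values of y and pred are made up: a one-hot label and a fully saturated softmax output that happens to predict the correct class.

```python
import numpy as np

# Hypothetical values: one-hot label and a saturated softmax output.
# The prediction is "perfect", yet the cost still comes out as NaN.
y = np.array([[1.0, 0.0, 0.0]])
pred = np.array([[1.0, 0.0, 0.0]])

# Silence the divide-by-zero / invalid-value warnings NumPy would emit here.
with np.errstate(divide="ignore", invalid="ignore"):
    log_pred = np.log(pred)                  # [0., -inf, -inf]
    terms = y * log_pred                     # 1*0 = 0, but 0 * -inf = nan
    cost = np.mean(-np.sum(terms, axis=1))   # nan propagates through the sum

print(log_pred)
print(terms)
print(cost)
```

Note that the NaN here comes from the classes where y is 0, so even a correct, confident prediction is enough to trigger it.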

It could be fixed with either of the following changes:

cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred + 1e-10), reduction_indices=1))

cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred,1e-10,1.0)), reduction_indices=1))
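As a rough check of the two proposals, the sketch below mirrors them in NumPy (again a stand-in, not the original TensorFlow code). The function names cost_eps, cost_clip, and cost_from_logits are hypothetical, and the third variant is not part of the proposal above: it is a commonly used alternative that computes the log-softmax directly from the logits, which is the idea behind TensorFlow's tf.nn.softmax_cross_entropy_with_logits.

```python
import numpy as np

def cost_eps(y, pred, eps=1e-10):
    # First proposal: shift pred away from 0 before taking the log.
    return np.mean(-np.sum(y * np.log(pred + eps), axis=1))

def cost_clip(y, pred, eps=1e-10):
    # Second proposal: clip pred into [eps, 1.0] before taking the log.
    return np.mean(-np.sum(y * np.log(np.clip(pred, eps, 1.0)), axis=1))

def cost_from_logits(y, logits):
    # Alternative (assumption, not from the issue): log-softmax via the
    # log-sum-exp trick, so no probability is ever passed to log().
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.sum(np.exp(shifted), axis=1, keepdims=True))
    return np.mean(-np.sum(y * log_softmax, axis=1))

# Hypothetical inputs: a saturated prediction and extreme logits.
y = np.array([[1.0, 0.0, 0.0]])
pred = np.array([[1.0, 0.0, 0.0]])
logits = np.array([[1000.0, -1000.0, 0.0]])

print(cost_eps(y, pred))
print(cost_clip(y, pred))
print(cost_from_logits(y, logits))
```

Both proposed fixes keep the loss finite, but they cap the per-example loss at roughly -log(1e-10), about 23.03, and the clipped variant has zero gradient wherever pred falls below the threshold. Working from logits avoids both effects, at the cost of restructuring the model so that the softmax is folded into the loss.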

Hope to hear from you ~

Thanks in advance! : )

Jul 27 '20 14:07 Justobe

@aymericdamien

Jan 13 '21 03:01 Justobe
