The gradient of tf.math.digamma is NaN on GPU and 0.0 on CPU when the input is inf

Open mazeltovlee opened this issue 2 years ago • 3 comments

Issue Type

Bug

Have you reproduced the bug with TF nightly?

Yes

Source

source

Tensorflow Version

2.11.0

Custom Code

Yes

OS Platform and Distribution

No response

Mobile device

No response

Python version

3.8

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

No response

GPU model and memory

No response

Current Behaviour?

The gradient of tf.math.digamma is NaN when the input is inf. This only happens on GPU; when I run the program on CPU, the gradient is 0.

Standalone code to reproduce the issue

import numpy as np
import tensorflow as tf

# Scalar variable holding +inf.
x = tf.Variable(np.inf, dtype="float32")

with tf.GradientTape() as g0:
    # Redundant for a trainable tf.Variable, which the tape watches automatically.
    g0.watch(x)
    res = tf.math.digamma(x)
tf_grad = g0.gradient(res, x)
# On GPU this prints tf.Tensor(nan, shape=(), dtype=float32); on CPU the gradient is 0.0.
print("The gradient of digamma is: ", tf_grad)

Relevant log output

The gradient of digamma is:  tf.Tensor(nan, shape=(), dtype=float32)

mazeltovlee avatar Feb 10 '23 16:02 mazeltovlee

@mazeltovlee I was able to replicate the issue on Google Colab with TF v2.11 on both CPU (output is 0.0) and GPU (output is nan). Please find the gists for the same here-CPU and here-GPU. It seems we need to dig deeper into this issue; we will update here soon. Thank you!

synandi avatar Feb 15 '23 08:02 synandi

Hi @mazeltovlee, apologies for the delay. On a GPU, floating-point operations are performed on specialized hardware that is optimized for parallel computation. This hardware may use different algorithms or have different rounding behavior than a CPU, which can lead to different results for the same calculation.

In your case, x is equal to inf, where the output of tf.math.digamma(x) is mathematically undefined, and so is its gradient. Because of rounding errors and implementation differences, the gradient may evaluate to NaN on a GPU and to 0 on a CPU. Thank you!
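As a rough illustration of how implementation differences can turn the same input into 0 or NaN, here is a minimal sketch in plain Python. It does not reproduce TensorFlow's CPU or GPU kernels; the two helper functions are hypothetical and only use the first two terms of the large-x expansion of the trigamma function (the derivative of digamma), 1/x + 1/(2x^2). Taken as a limit, that expression tends to 0 as x grows, which matches the CPU result, while an algebraically equivalent rearrangement hits inf/inf at x = inf.

```python
import math


def trigamma_termwise(x):
    # Hypothetical helper: first two terms of the large-x expansion of
    # trigamma(x), evaluated one term at a time. Each term is 1/inf = 0.0
    # when x is inf.
    return 1.0 / x + 1.0 / (2.0 * x * x)


def trigamma_rearranged(x):
    # Hypothetical helper: the same two terms over a common denominator.
    # At x = inf this evaluates inf / inf, which is NaN in IEEE 754 arithmetic.
    return (2.0 * x + 1.0) / (2.0 * x * x)


# The two forms agree for a finite argument.
print(math.isclose(trigamma_termwise(1000.0), trigamma_rearranged(1000.0)))  # True

# They disagree at infinity.
print(trigamma_termwise(math.inf))    # 0.0
print(trigamma_rearranged(math.inf))  # nan
```

Whether either backend actually evaluates the gradient this way is an assumption; the sketch only shows that equivalent formulas need not agree at inf.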

synandi avatar Feb 17 '23 13:02 synandi

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

google-ml-butler[bot] avatar Feb 24 '23 14:02 google-ml-butler[bot]

Closing as stale. Please reopen if you'd like to work on this further.

synandi avatar Mar 24 '23 04:03 synandi
