The results of tf.image.convert_image_dtype running on CPU and GPU are very different.

Open triumph-wangyuyang opened this issue 3 years ago • 6 comments

Issue Type

Bug

Source

source

Tensorflow Version

TF 2.11

Custom Code

Yes

OS Platform and Distribution

No response

Mobile device

No response

Python version

No response

Bazel version

No response

GCC/Compiler version

No response

CUDA/cuDNN version

CUDA: 11.2 cuDNN 8.1

GPU model and memory

No response

Current Behaviour?

The results of tf.image.convert_image_dtype running on CPU and GPU are very different.

Standalone code to reproduce the issue

CPU code:

    import tensorflow as tf
    with tf.device('/CPU'):
        arg_0 = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]]
        out = tf.image.convert_image_dtype(arg_0, dtype=tf.uint32, saturate=-1)
    print(out)

GPU code:

    import tensorflow as tf
    with tf.device('/GPU:0'):
        arg_0 = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]]
        out = tf.image.convert_image_dtype(arg_0, dtype=tf.uint32, saturate=-1)
    print(out)

Relevant log output

CPU result: tf.Tensor(
[[[0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]]], shape=(2, 2, 3), dtype=uint32)


GPU result: tf.Tensor(
[[[2147483647 2147483647 2147483647]
  [2147483647 2147483647 2147483647]]

 [[2147483647 2147483647 2147483647]
  [2147483647 2147483647 2147483647]]], shape=(2, 2, 3), dtype=uint32)

triumph-wangyuyang avatar Dec 01 '22 12:12 triumph-wangyuyang

Hi @triumph-wangyuyang ,

The API documentation mentions this condition:

Images that are represented using floating point values are expected to have values in the range [0,1).

Hence the inconsistency in the results. I tried values within [0,1) and the results are the same on both CPU and GPU. Please refer to the attached gist.

Please check, and close the issue if your query is resolved.

Thank you!

SuryanarayanaY avatar Dec 02 '22 06:12 SuryanarayanaY

I am testing TensorFlow operators and deliberately passing illegal parameters. In this test I used values >= 1 rather than values in [0,1), and the results differ between CPU and GPU. Could the operator check the values of the image argument on entry and raise an exception if any of them fall outside [0,1), so that the program does not continue?
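
A minimal sketch of the kind of entry check being asked for, done on the caller's side in plain Python before the data is handed to the operator. The helper names `iter_values` and `check_unit_range` are hypothetical and are not part of TensorFlow; the [0,1) range is the one quoted from the API documentation above.

```python
def iter_values(data):
    """Yield every scalar from an arbitrarily nested list or tuple."""
    if isinstance(data, (list, tuple)):
        for item in data:
            yield from iter_values(item)
    else:
        yield data


def check_unit_range(image):
    """Return image unchanged, or raise ValueError if a value is outside [0, 1).

    Hypothetical caller-side guard; NaN also fails the comparison and is rejected.
    """
    for value in iter_values(image):
        if not (0.0 <= value < 1.0):
            raise ValueError(f"float image value {value} is outside [0, 1)")
    return image


# The input from the report: every value is >= 1, so the guard rejects it.
arg_0 = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]]

try:
    check_unit_range(arg_0)
except ValueError as err:
    print("rejected:", err)
```

A check like this costs one pass over the data, which is the trade-off to weigh against letting out-of-range values through.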

triumph-wangyuyang avatar Dec 02 '22 06:12 triumph-wangyuyang

@triumph-wangyuyang ,

I agree. It is better to raise an exception or warning for invalid inputs than to continue and produce inconsistent results. Let's see if I can do something about this.

Thank you!

SuryanarayanaY avatar Dec 02 '22 07:12 SuryanarayanaY

Hi @triumph-wangyuyang ,

The above-mentioned PR should address the issue.

SuryanarayanaY avatar Feb 21 '23 10:02 SuryanarayanaY

We make no guarantees that CPU and GPU results are identical, especially for garbage data. The input doesn't crash, so it's not a security issue. Error checking is expensive.

The GPU result is flushing all results to the max value (essentially saturating the input). We could potentially do the same on CPU. I wouldn't say it's a requirement though.

cantonios avatar Feb 21 '23 17:02 cantonios

Just noticed that saturate was set to True (indirectly via the -1 value), so this should actually have defined behavior and there is an issue with saturation. Will dig into it.
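
As an illustration of why -1 enables saturation and what the result would be in exact arithmetic, here is a small plain-Python model. It assumes the flag is used in an ordinary truthiness test; `saturating_to_uint32` and its scale factor are hypothetical stand-ins for this sketch, not TensorFlow's implementation.

```python
UINT32_MAX = 2**32 - 1  # 4294967295

saturate = -1
print(bool(saturate))  # True: any non-zero int is truthy in Python


def saturating_to_uint32(value: float) -> int:
    """Idealised float -> uint32 conversion that clamps instead of overflowing.

    Uses Python floats (float64), where UINT32_MAX is exactly representable.
    """
    scaled = value * (UINT32_MAX + 0.5)  # assumed scale factor for this sketch
    clamped = min(max(scaled, 0.0), float(UINT32_MAX))
    return int(clamped)


# Every value >= 1 from the report clamps to the uint32 maximum.
print([saturating_to_uint32(v) for v in (1.0, 2.0, 12.0)])
```

Under this model all twelve inputs from the report map to 4294967295, so neither all zeros nor 2147483647 matches what saturation would be expected to produce.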

cantonios avatar Feb 24 '23 17:02 cantonios

The issue here is that uint32.max is not actually representable in float32: when converting, 4294967295 rounds up to 4294967296.0 (2^32, displayed in shortened form as 4294967300.0). This eventually leads to a cast overflow and undefined behavior, which is why we see different values between CPU and GPU.
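
The rounding can be reproduced without TensorFlow. This sketch uses only the standard `struct` module to round a Python float to float32; the helper name `to_float32` is made up for the illustration.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


UINT32_MAX = 2**32 - 1  # 4294967295

rounded = to_float32(float(UINT32_MAX))
print(int(rounded))           # 4294967296, i.e. 2**32
print(rounded > UINT32_MAX)   # True: the float32 value no longer fits in uint32
```

Because the rounded value is one past the largest uint32, clamping against it in float32 does not prevent the overflow in the final cast.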

cantonios avatar Mar 13 '23 22:03 cantonios

Hi @triumph-wangyuyang ,

Please refer to the explanation in the comment above. The cast overflow causes undefined behaviour, hence the different results. This has been fixed in tf-nightly (2.14.0-dev20230503). Please refer to the attached gist, which shows that CPU and GPU now produce the same results.

Thanks!

SuryanarayanaY avatar May 04 '23 05:05 SuryanarayanaY

I also observed that the following API aliases can cause the same issue in older versions of TensorFlow. Users should be cautious when using them on the CPU up to TensorFlow 2.12.0 (v2.12.0-rc1-12-g0db597d0d75).

  • (tf.image.convert_image_dtype), tf.compat.v1.image.convert_image_dtype

Code to reproduce the issue in tf.compat.v1.image.convert_image_dtype in older versions:

    import tensorflow as tf
    print(tf.version.GIT_VERSION, tf.version.VERSION, flush=True)
    print(tf.config.list_physical_devices(), flush=True)

    arg_0 = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]]
    x1 = tf.compat.v1.image.convert_image_dtype(arg_0, dtype=tf.uint32, saturate=-1).numpy()
    print(x1)

On CPU, it outputs the following results:

v2.12.0-rc1-12-g0db597d0d75 2.12.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
[[[0 0 0]
  [0 0 0]]

 [[0 0 0]
  [0 0 0]]]

While on GPU, the results are as follows, which are inconsistent with the CPU:

v2.12.0-rc1-12-g0db597d0d75 2.12.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967295 4294967295 4294967295]
  [4294967295 4294967295 4294967295]]

 [[4294967295 4294967295 4294967295]
  [4294967295 4294967295 4294967295]]]

It seems to be fixed in tensorflow 2.13.0 (v2.13.0-rc2-7-g1cb1a030a62) and later versions.

I also found that the outputs differ across TensorFlow versions, which is worth keeping in mind when comparing results between versions.

Outputs showing the inconsistent behavior across different versions:

v2.9.0-rc2-42-g8a20d54a3c1 2.9.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[2147483647 2147483647 2147483647]
  [2147483647 2147483647 2147483647]]

 [[2147483647 2147483647 2147483647]
  [2147483647 2147483647 2147483647]]]

v2.9.2-107-ga5ed5f39b67 2.9.3
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[2147483647 2147483647 2147483647]
  [2147483647 2147483647 2147483647]]

 [[2147483647 2147483647 2147483647]
  [2147483647 2147483647 2147483647]]]

v2.10.0-rc3-6-g359c3cdfc5f 2.10.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967295 4294967295 4294967295]
  [4294967295 4294967295 4294967295]]

 [[4294967295 4294967295 4294967295]
  [4294967295 4294967295 4294967295]]]

v2.10.0-76-gfdfc646704c 2.10.1
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967295 4294967295 4294967295]
  [4294967295 4294967295 4294967295]]

 [[4294967295 4294967295 4294967295]
  [4294967295 4294967295 4294967295]]]

v2.11.0-rc2-17-gd5b57ca93e5 2.11.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967295 4294967295 4294967295]
  [4294967295 4294967295 4294967295]]

 [[4294967295 4294967295 4294967295]
  [4294967295 4294967295 4294967295]]]

v2.11.0-94-ga3e2c692c18 2.11.1
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967295 4294967295 4294967295]
  [4294967295 4294967295 4294967295]]

 [[4294967295 4294967295 4294967295]
  [4294967295 4294967295 4294967295]]]

v2.12.0-rc1-12-g0db597d0d75 2.12.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967295 4294967295 4294967295]
  [4294967295 4294967295 4294967295]]

 [[4294967295 4294967295 4294967295]
  [4294967295 4294967295 4294967295]]]

v2.13.0-rc2-7-g1cb1a030a62 2.13.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967040 4294967040 4294967040]
  [4294967040 4294967040 4294967040]]

 [[4294967040 4294967040 4294967040]
  [4294967040 4294967040 4294967040]]]

v2.14.0-rc0-34-gdd01672d9a9 2.14.0-rc1
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967040 4294967040 4294967040]
  [4294967040 4294967040 4294967040]]

 [[4294967040 4294967040 4294967040]
  [4294967040 4294967040 4294967040]]]

v1.12.1-99436-g5e7d6faebab 2.15.0-dev20230904
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967040 4294967040 4294967040]
  [4294967040 4294967040 4294967040]]

 [[4294967040 4294967040 4294967040]
  [4294967040 4294967040 4294967040]]]

oawxkw avatar Sep 12 '23 09:09 oawxkw