The results of tf.image.convert_image_dtype running on CPU and GPU are very different.
Issue Type
Bug
Source
source
Tensorflow Version
TF 2.11
Custom Code
Yes
OS Platform and Distribution
No response
Mobile device
No response
Python version
No response
Bazel version
No response
GCC/Compiler version
No response
CUDA/cuDNN version
CUDA: 11.2 cuDNN 8.1
GPU model and memory
No response
Current Behaviour?
The results of tf.image.convert_image_dtype running on CPU and GPU are very different.
Standalone code to reproduce the issue
CPU code:
import tensorflow as tf
with tf.device('/CPU'):
    arg_0 = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]]
    out = tf.image.convert_image_dtype(arg_0, dtype=tf.uint32, saturate=-1)
    print(out)
GPU code:
import tensorflow as tf
with tf.device('/GPU:0'):
    arg_0 = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]]
    out = tf.image.convert_image_dtype(arg_0, dtype=tf.uint32, saturate=-1)
    print(out)
Relevant log output
CPU result: tf.Tensor(
[[[0 0 0]
[0 0 0]]
[[0 0 0]
[0 0 0]]], shape=(2, 2, 3), dtype=uint32)
GPU result: tf.Tensor(
[[[2147483647 2147483647 2147483647]
[2147483647 2147483647 2147483647]]
[[2147483647 2147483647 2147483647]
[2147483647 2147483647 2147483647]]], shape=(2, 2, 3), dtype=uint32)
Hi @triumph-wangyuyang ,
The API documentation states this condition:
Images that are represented using floating point values are expected to have values in the range [0,1).
Your inputs are outside that range, hence the inconsistency in the results. I tried values within [0,1) and the result is the same on both CPU and GPU. Please refer to the attached gist.
Please check and close the issue if your query is resolved.
Thank you!
I am testing TensorFlow operators and deliberately passing illegal arguments. In this test I used values >= 1 instead of values in [0,1), and the results on CPU and GPU differ. Could the operator validate the image values at entry and raise an exception if they are not in [0,1), so that the program does not continue?
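A minimal sketch of what such an entry check could look like on the user side. The wrapper name `checked_convert_image_dtype` is hypothetical and not part of TensorFlow; the sketch assumes eager execution and relies on `tf.debugging.assert_greater_equal` and `tf.debugging.assert_less` for the range check:

```python
import tensorflow as tf


def checked_convert_image_dtype(image, dtype, saturate=False):
    """Hypothetical wrapper: reject float images outside [0, 1) before converting."""
    image = tf.convert_to_tensor(image)
    if image.dtype.is_floating:
        # Scalars in the image's own dtype, so the comparison dtypes match.
        zero = tf.zeros([], dtype=image.dtype)
        one = tf.ones([], dtype=image.dtype)
        tf.debugging.assert_greater_equal(
            image, zero, message="float image values must be >= 0")
        tf.debugging.assert_less(
            image, one, message="float image values must be < 1")
    return tf.image.convert_image_dtype(image, dtype, saturate=saturate)


# Out-of-range input is rejected instead of being converted.
try:
    checked_convert_image_dtype([[[1.0, 2.0, 3.0]]], tf.uint8)
except tf.errors.InvalidArgumentError as err:
    print("rejected:", type(err).__name__)
```

The two assertions add an extra pass over the data, so a check like this trades some speed for an early, explicit failure.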
@triumph-wangyuyang ,
I agree with that. It is better to raise an exception or warning for invalid inputs than to continue and generate inconsistent results. Let's see if I can do something about this.
Thank you!
Hi @triumph-wangyuyang ,
The above-mentioned PR should address the issue.
We make no guarantees that CPU and GPU results are identical, especially for garbage data. The input doesn't crash, so it's not a security issue. Error checking is expensive.
The GPU result is flushing all results to the max value (essentially saturating the input). We could potentially do the same on CPU. I wouldn't say it's a requirement though.
Just noticed that saturate was set to True (indirectly via the -1 value), so this should actually have defined behavior and there is an issue with saturation. Will dig into it.
The issue here is that uint32.max is not actually representable in float32: converting 4294967295 rounds it up to 4294967296.0 (2^32, displayed as 4294967300.0), which is out of range for uint32. This eventually leads to a cast overflow and undefined behavior, which is why we see different values between CPU and GPU.
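The rounding can be checked without TensorFlow. The following is a small standard-library sketch; the helper names `to_float32` and `prev_float32` are made up for this illustration. It rounds uint32.max to float32 and then finds the largest float32 value that still fits in uint32:

```python
import struct

UINT32_MAX = 2**32 - 1  # 4294967295


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def prev_float32(x: float) -> float:
    """Return the largest float32 strictly below a positive float32 x."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits - 1))[0]


rounded = to_float32(float(UINT32_MAX))
largest_safe = prev_float32(rounded)

print(rounded == 2.0**32)   # True: uint32.max rounds up to 2**32
print(int(rounded) > UINT32_MAX)  # True: the rounded value no longer fits in uint32
print(int(largest_safe))    # 4294967040, i.e. 2**32 - 256
```

Near 2^32, adjacent float32 values are 256 apart, so the largest float32 that can be cast to uint32 without overflow is 4294967040.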
Hi @triumph-wangyuyang ,
Please refer to the explanation in the comment above. The cast overflow causes undefined behaviour, hence the different results. This has been fixed in tf-nightly (2.14.0-dev20230503). Please refer to the attached gist, which shows that CPU and GPU now produce the same results.
Thanks!
I also observed that the following API aliases can cause the same issue in older versions of TensorFlow. Users should be cautious when using them on the CPU up to TensorFlow 2.12.0 (v2.12.0-rc1-12-g0db597d0d75).
- (tf.image.convert_image_dtype), tf.compat.v1.image.convert_image_dtype
Code to reproduce the issue in tf.compat.v1.image.convert_image_dtype in older versions:
import tensorflow as tf
print(tf.version.GIT_VERSION, tf.version.VERSION, flush=True)
print(tf.config.list_physical_devices(), flush=True)
arg_0 = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]]]
x1 = tf.compat.v1.image.convert_image_dtype(arg_0, dtype=tf.uint32, saturate=-1).numpy()
print(x1)
On CPU, it outputs the following results:
v2.12.0-rc1-12-g0db597d0d75 2.12.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU')]
[[[0 0 0]
[0 0 0]]
[[0 0 0]
[0 0 0]]]
While on GPU, the results are as follows, which are inconsistent with the CPU:
v2.12.0-rc1-12-g0db597d0d75 2.12.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967295 4294967295 4294967295]
[4294967295 4294967295 4294967295]]
[[4294967295 4294967295 4294967295]
[4294967295 4294967295 4294967295]]]
It seems to be fixed in TensorFlow 2.13.0 (v2.13.0-rc2-7-g1cb1a030a62) and later versions.
I also found that the outputs are not consistent across versions, which is worth keeping in mind when comparing results from different TensorFlow releases.
Outputs showing the inconsistent behavior across versions:
v2.9.0-rc2-42-g8a20d54a3c1 2.9.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[2147483647 2147483647 2147483647]
[2147483647 2147483647 2147483647]]
[[2147483647 2147483647 2147483647]
[2147483647 2147483647 2147483647]]]
v2.9.2-107-ga5ed5f39b67 2.9.3
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[2147483647 2147483647 2147483647]
[2147483647 2147483647 2147483647]]
[[2147483647 2147483647 2147483647]
[2147483647 2147483647 2147483647]]]
v2.10.0-rc3-6-g359c3cdfc5f 2.10.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967295 4294967295 4294967295]
[4294967295 4294967295 4294967295]]
[[4294967295 4294967295 4294967295]
[4294967295 4294967295 4294967295]]]
v2.10.0-76-gfdfc646704c 2.10.1
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967295 4294967295 4294967295]
[4294967295 4294967295 4294967295]]
[[4294967295 4294967295 4294967295]
[4294967295 4294967295 4294967295]]]
v2.11.0-rc2-17-gd5b57ca93e5 2.11.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967295 4294967295 4294967295]
[4294967295 4294967295 4294967295]]
[[4294967295 4294967295 4294967295]
[4294967295 4294967295 4294967295]]]
v2.11.0-94-ga3e2c692c18 2.11.1
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967295 4294967295 4294967295]
[4294967295 4294967295 4294967295]]
[[4294967295 4294967295 4294967295]
[4294967295 4294967295 4294967295]]]
v2.12.0-rc1-12-g0db597d0d75 2.12.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967295 4294967295 4294967295]
[4294967295 4294967295 4294967295]]
[[4294967295 4294967295 4294967295]
[4294967295 4294967295 4294967295]]]
v2.13.0-rc2-7-g1cb1a030a62 2.13.0
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967040 4294967040 4294967040]
[4294967040 4294967040 4294967040]]
[[4294967040 4294967040 4294967040]
[4294967040 4294967040 4294967040]]]
v2.14.0-rc0-34-gdd01672d9a9 2.14.0-rc1
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967040 4294967040 4294967040]
[4294967040 4294967040 4294967040]]
[[4294967040 4294967040 4294967040]
[4294967040 4294967040 4294967040]]]
v1.12.1-99436-g5e7d6faebab 2.15.0-dev20230904
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
[[[4294967040 4294967040 4294967040]
[4294967040 4294967040 4294967040]]
[[4294967040 4294967040 4294967040]
[4294967040 4294967040 4294967040]]]