dpctl
`floor_divide` returns different results for floating-point arrays on GPU and CPU devices
In the example below, the behavior differs between CPU and GPU devices:
```python
import numpy, dpctl, dpctl.tensor as dpt
dpctl.__version__
# Out: '0.17.0dev0+300.g7757857466'
a = dpt.arange(1, 10, dtype='f', device='gpu')
b = dpt.arange(1, 10, dtype='f', device='gpu')
dpt.floor_divide(a, b)
# Out: usm_ndarray([1., 1., 1., 1., 1., 1., 0., 1., 1.], dtype=float32)
a = dpt.arange(1, 10, dtype='f', device='cpu')
b = dpt.arange(1, 10, dtype='f', device='cpu')
dpt.floor_divide(a, b)
# Out: usm_ndarray([1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)
na = numpy.arange(1, 10, dtype='f')
nb = numpy.arange(1, 10, dtype='f')
numpy.floor_divide(na, nb)
# Out: array([1., 1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)
```
So the 7th element of the result array is `0` on the GPU device, but `1` on the CPU device and in NumPy.
Looking at the output of the `divide` function:
```python
a = dpt.arange(1, 10, dtype='f')
b = dpt.arange(1, 10, dtype='f')
dpt.divide(a, b)
# Out:
# usm_ndarray([1. , 1. , 1. , 1. , 1. ,
# 1. , 0.99999994, 1. , 1. ], dtype=float32)
```
the 7th element on the GPU device is `0.99999994`, which is less than `1.0`.
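A small stdlib-only Python check, assuming the GPU quotient shown above is the float32 value immediately below `1.0` (that is, `1 - 2**-24`). It rounds a Python float to float32 via `struct` (the helper name `to_float32` is ours, not part of dpctl) and shows that this value prints as `0.99999994` and floors to `0`:

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Largest float32 strictly below 1.0 (exactly representable in float32).
below_one = to_float32(1.0 - 2.0**-24)

print(f"{below_one:.8g}")     # → 0.99999994 (8 significant digits)
print(below_one < 1.0)        # → True
print(math.floor(below_one))  # → 0
```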
The relevant kernel code is:

```cpp
auto div = in1 / in2;
return (div == resT(0)) ? div : resT(sycl::floor(div));
```
So dpctl uses the `sycl::floor()` function, which is intended to return:

> The value x rounded to an integral value using the round to negative infinity rounding mode

I guess this is why `0.99999994` rounds to `0` here.
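The quoted kernel logic can be transcribed into plain Python to reproduce the GPU output without a device. This is a sketch, not dpctl code: `floor_divide_from_quotient` is a hypothetical name, and the quotients are hard-coded to the values reported above rather than computed on a GPU:

```python
import math

def floor_divide_from_quotient(div: float) -> float:
    """Hypothetical Python transcription of the kernel's return statement,
    applied to an already computed quotient `div`."""
    # Return div unchanged when it is zero (so -0.0 keeps its sign here);
    # otherwise round toward negative infinity, as sycl::floor does.
    # Unlike sycl::floor, math.floor raises on inf and nan.
    return div if div == 0.0 else float(math.floor(div))

# Quotients reported on the GPU for arange(1, 10) / arange(1, 10);
# the 7th entry is the float32 value just below 1.0.
gpu_quotients = [1.0] * 9
gpu_quotients[6] = 1.0 - 2.0**-24

print([floor_divide_from_quotient(q) for q in gpu_quotients])
# → [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0]
```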
The Python array API specification, however, states that `floor_divide`:

> Rounds the result of dividing each element x1_i of the input array x1 by the respective element x2_i of the input array x2 to the greatest (i.e., closest to +infinity) integer-value number that is not greater than the division result.
So I wonder whether this is expected dpctl behavior or an issue.
The rounding mode is not actually at fault here. Per the array API, `floor_divide`:

> Rounds the result of dividing each element x1_i of the input array x1 by the respective element x2_i of the input array x2 to the greatest (i.e., closest to +infinity) integer-value number that is not greater than the division result

In this case, `1.0 > 0.99999994`, so `0.0` is the correct result, and the behavior conforms to the array API. The surprising result comes from the division itself being inexact (a correctly rounded IEEE 754 division of `7.0` by `7.0` gives exactly `1.0`), possibly due to lower-precision division on GPU devices.
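To make the distinction concrete, a minimal Python sketch (assuming the observed GPU quotient for `7.0 / 7.0` is `1 - 2**-24`) compares flooring the exact rational quotient with flooring the inexact computed one. Python floats are binary64, but `x / x` is exactly `1.0` under correctly rounded binary32 division as well:

```python
import math
from fractions import Fraction

x1, x2 = 7.0, 7.0
gpu_quotient = 1.0 - 2.0**-24  # assumed float32 result reported on the GPU

# Exact rational quotient and a correctly rounded float division both give 1.
exact = Fraction(x1) / Fraction(x2)
print(exact, math.floor(exact))  # → 1 1
print(x1 / x2)                   # → 1.0

# Array API rule applied to the computed quotient: the greatest integer
# not greater than 0.99999994... is 0.
print(math.floor(gpu_quotient))  # → 0
```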
@ndgrigorian, now I see, thank you for the clarification. Would it be worth adding special handling in the code, then? Something like:
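```cpp
// Proposed special case: if in2 divides in1 exactly (fmod is exact), the true
// quotient is an integer, so round the possibly inexact quotient to nearest.
if (sycl::fmod(in1, in2) == 0) {
    return resT(std::rint(in1 / in2));
}
```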
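A hypothetical Python sketch of the proposed special case, meant to illustrate the idea rather than to stand in for a tested kernel change. It assumes `x2 != 0` (Python's `math.fmod` raises `ValueError` for a zero divisor) and takes the possibly inexact device quotient `div` as an extra argument; the function name is made up:

```python
import math

def floor_divide_with_exact_check(x1: float, x2: float, div: float) -> float:
    """Hypothetical sketch: `div` is the (possibly inexact) quotient x1 / x2."""
    # math.fmod is exact, so a zero remainder means x2 divides x1 exactly
    # and the true quotient is an integer: round div to the nearest one.
    if math.fmod(x1, x2) == 0.0:
        return float(round(div))
    # Otherwise fall back to the existing floor-based logic.
    return div if div == 0.0 else float(math.floor(div))

print(floor_divide_with_exact_check(7.0, 7.0, 1.0 - 2.0**-24))  # → 1.0
print(floor_divide_with_exact_check(7.0, 2.0, 3.5))             # → 3.0
```

Two caveats of this sketch: `round` (like `rint` in the default rounding mode) rounds halves to even, and the exact-multiple branch returns `+0.0` for a `-0.0` quotient, so signed zeros would need separate handling in a real kernel.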