numba-dpex
numba-dpex copied to clipboard
wrong output dtype for math functions
math.ceil and math.floor are supported within a numba kernel and when using it within a CPython interpreter it returns an int.
But within a dpex.kernel it returns a float64, which can be the source of several issues:
- the output of those
mathfunctions cannot be used to index adpex.local.array - it will fail to compile when the gpu do not have
float64support
Here is a toy example to reproduce the issue:
import dpctl
import numpy as np
from numba import float32, uint32
import numba_dppy as dpex
import math
def get_kernel(p, q, r):
shm_shape = (p + 1, q + 1)
@dpex.kernel
def test_kernel(a, b):
threadx = dpex.get_global_id(0)
thready = dpex.get_global_id(1)
some_shared_mem = dpex.local.array(shape=shm_shape, dtype=float32)
x = math.ceil(1 + threadx // r)
y = math.ceil(1 + thready // r)
some_shared_mem[x, y] = float32(3.)
b[threadx, thready] = a[threadx, thready] + float32(3.) + float32(math.inf)
return test_kernel
# Array dimensions
X = 16 * 16
Y = 16
global_size = X, X
griddim = X, X
blockdim = Y, Y
a = np.arange(X * X, dtype=np.float32).reshape(X, X)
b = np.empty_like(a)
# Use the environment variable SYCL_DEVICE_FILTER to change the default device.
# See https://github.com/intel/llvm/blob/sycl/sycl/doc/EnvironmentVariables.md#sycl_device_filter.
device = dpctl.select_default_device()
print("Using device ...")
device.print_device_info()
with dpctl.device_context(device):
get_kernel(4, 4, 4)[griddim, blockdim](a, b)
print("Done...")
It fails both on Intel devcloud or a custom environment with more up to date versions
Minimal reproducer using latest syntax:
def test_minimal():
N = 10
device = dpctl.select_default_device()
@dpx.kernel
def func(a):
i = dpx.get_global_id(0)
i = math.ceil(i + 0.5) - 1
a[i] = a[i] + 1
a = dpnp.ones(N, dtype=dpnp.float32, device=device)
func[dpx.Range(N)](a)
print(a.asnumpy())
@fcharras The issue here is that the math.ceil and math.floor functions are replaced by the SYCL equivalents that only support floating point values. We are looking at a solution where the code generator will add a cast to make the function behave the same way as Python math function.
Just to add the fact that all c/c++ implementations out there do float --> float, double --> double for ceil() and floor(), including SYCL and openCL.
(Sorry for the lack of feedback this week, I took some off time.) Practically speaking, I wouldn't say this issue is too bothersome, it's more a matter of clarity. Python users will refer to python documentation of the math module or will open an interpreter and check the output type and expect it to behave the same in a kernel.
Maybe it's fine to keep SYCL-like behavior, if the documentation gives the differences with the python implementation. Or maybe instead of reusing the math namespace, it could be exposed directly in a numba_dpex.math module ? Users will be less likely (and less right to do so) to assume that dpex.math.floor calls within a kernel should behave like math.floor behaves in cpython.
Hi @fcharras could you please test this branch https://github.com/chudur-budur/numba-dpex/tree/github-759 and verify if they are returning ints? I think this issue is fixed in #960.
For me, #960 indeed fixes the issue.
Just want to add that I realized this mistake I made in the OP: when saying
it returns a
float64
I didn't realize that the output type depends on the input type and thought it would always invoke float64 compute and thus crash gpus that do not have float64 aspect. In fact it seems that math.ceil(x) does anyway work fine on those gpus if x is float32 or int32, so casting both beforehand and afterhand (to int) would have solved this issue.