numba-dpex
Implement a kernel target for devices that do not support float64 dtype
Currently, numba-dpex uses the same target context to generate kernels for all types of SPIR-V-capable devices. One limitation of this design is that there is no easy way to restrict the supported signatures of math functions (and probably built-in operators) on devices that do not support certain floating-point widths. This can lead to kernel compilation errors such as the one described in #960:
math.ceil and math.floor do not work with devices that do not support float64 compute (such as iGPUs), and this PR does not solve that point.
Even when casting everything to float32 beforehand, e.g.:
import dpctl.tensor as dpt
import dpnp
import numba_dpex as dpx
import math

N = 10
cst = dpnp.float32(0.5)

@dpx.kernel
def func(a):
    i = dpx.get_global_id(0)
    # Both operands are explicitly float32 before math.ceil is called.
    i = math.ceil(dpnp.float32(i) + cst)
    a[i] = a[i] + i

a = dpt.ones(N, dtype=dpnp.float32)
func[dpx.Range(N)](a)
print(dpt.asnumpy(a))
gives a SyclProgramCompilationError, which is the error we're used to seeing for float64 kernels on float32-only hardware.
The problem in the above examples seems to be that the built-in add operator is upcasting to float64. A solution may be to split DpexKernelTargetContext into two separate contexts: DpexFP32KernelTargetContext and DpexFP64KernelTargetContext.
These two contexts can then each have their own registry for math functions and built-in operators, registering the architecture-aware flavours of each function. At compile time we can pick the right context based on the device being targeted.
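The idea could be sketched roughly as follows. This is a pure-Python illustration, not numba-dpex code: FunctionRegistry, KernelTargetContext, make_fp32_context, make_fp64_context and select_target_context are hypothetical stand-ins, dtypes are represented as plain strings, and the device_supports_fp64 flag is assumed to come from some device query (for example, a dpctl device aspect check).

```python
from dataclasses import dataclass, field


# Hypothetical stand-in, NOT a numba-dpex class: maps a function name to
# the set of argument dtypes it has been registered for.
@dataclass
class FunctionRegistry:
    signatures: dict = field(default_factory=dict)

    def register(self, name, *dtypes):
        self.signatures.setdefault(name, set()).update(dtypes)

    def supports(self, name, dtype):
        return dtype in self.signatures.get(name, set())


# Hypothetical stand-in for a target context that owns its own registry.
@dataclass
class KernelTargetContext:
    name: str
    registry: FunctionRegistry


def make_fp32_context():
    # Only float32 flavours are registered for float32-only devices.
    reg = FunctionRegistry()
    for fn in ("ceil", "floor", "add"):
        reg.register(fn, "float32")
    return KernelTargetContext("DpexFP32KernelTargetContext", reg)


def make_fp64_context():
    # Both float32 and float64 flavours are registered.
    reg = FunctionRegistry()
    for fn in ("ceil", "floor", "add"):
        reg.register(fn, "float32", "float64")
    return KernelTargetContext("DpexFP64KernelTargetContext", reg)


def select_target_context(device_supports_fp64):
    # Pick the context once, at compile time, from the device capability.
    if device_supports_fp64:
        return make_fp64_context()
    return make_fp32_context()


if __name__ == "__main__":
    ctx = select_target_context(device_supports_fp64=False)
    print(ctx.name)
    print(ctx.registry.supports("ceil", "float64"))
```

With such a split, a float64 signature for ceil would simply not resolve on a float32-only device, so the failure could surface during typing instead of as a late SyclProgramCompilationError.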