numba-dpex Call to numba_dpex.kernel corrups data in arguments

Python 3.9.10 (main, Mar  2 2022, 12:02:00)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import dpctl.tensor as dpt, numba_dpex

In [2]: x = dpt.ones(10, dtype='i8')

In [3]: y = dpt.empty_like(x)

In [4]: x.sycl_device == y.sycl_device
Out[4]: True

In [5]: dpt.asnumpy(x), dpt.asnumpy(y)
Out[5]: (array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))

In [6]: @numba_dpex.kernel
   ...: def foo(X, Y):
   ...:     i = numba_dpex.get_global_id(0)
   ...:     Y[i] = 2*X[i]
   ...:

In [7]: foo[10, numba_dpex.DEFAULT_LOCAL_SIZE](x, y)

In [8]: dpt.asnumpy(x), dpt.asnumpy(y)
Out[8]:
(array([     94391540026928,                   0, 7954887801964155392,
             94391483338752,      94391483338736,                   0,
                          0,                   0,                   0,
                          0]),
 array([94391548829776,              0,              0,              0,
                     0,              0,              0,              0,
                     0,              0]))

My setup:

(dppy_stack) opavlyk@opavlyk-mobl:~$ conda list dp
# packages in environment at /home/opavlyk/miniconda3/envs/dppy_stack:
#
# Name                    Version                   Build  Channel
dpbench                   0.0.1                     dev_0    <develop>
dpcpp-cpp-rt              2022.2.0             intel_8734    intel
dpcpp_cpp_rt              2022.2.0             intel_8734    intel
dpctl                     0.14.0dev0      py39h8c27c75_195    dppy/label/dev
dpnp                      0.10.2           py39h2bc3f7f_7    dppy/label/dev
mkl-dpcpp                 2022.2.0             intel_8748    intel
numba-dpex                0.18.1          py39hfc4b9b4_45    dppy/label/dev
threadpoolctl             2.2.0              pyh0d69192_0    intel

Oct 21 '22 23:10 oleksandr-pavlyk

I have tested with:

(dpex_809) mingjie2@ansatnuc04:/localdisk/work/mingjie2/sandbox/numba_dpex_809$ conda list dp
# packages in environment at /localdisk/work/mingjie2/miniconda3/envs/dpex_809:
#
# Name                    Version                   Build  Channel
dpcpp-cpp-rt              2022.2.0             intel_8734    intel
dpcpp_cpp_rt              2022.2.0             intel_8734    intel
dpctl                     0.14.0dev0      py39h8c27c75_207    dppy/label/dev
dpnp                      0.10.2           py39h2bc3f7f_8    dppy/label/dev
mkl-dpcpp                 2022.2.0             intel_8748    intel
numba-dpex                0.18.1          py39hfc4b9b4_45    dppy/label/dev

the output seems correct:

(dpex_809) mingjie2@ansatnuc04:/localdisk/work/mingjie2/sandbox/numba_dpex_809$ python test_Sasha.py
in:
[1 1 1 1 1 1 1 1 1 1]
[0 0 0 0 0 0 0 0 0 0]
out:
[1 1 1 1 1 1 1 1 1 1]
[2 2 2 2 2 2 2 2 2 2]

The only difference I can see here is dpctl: py39h8c27c75_207 vs py39h8c27c75_195. Is the possible something fixed in the newer build?

Oct 25 '22 17:10 mingjie-intel

The thing that I failed to state is my GPU environment:

(dppy_stack2) opavlyk@opavlyk-mobl:~/tmp$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.14.7.0.30_160000]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz 3.0 [2022.14.7.0.30_160000]
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x9a49] 3.0 [22.39.24347]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x9a49] 1.3 [1.3.24347]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]

I am on Ubuntu 20.04 running kernel "5.10.102.1-microsoft-standard-WSL2" in WSL.

Oct 25 '22 20:10 oleksandr-pavlyk

The thing that I failed to state is my GPU environment:

(dppy_stack2) opavlyk@opavlyk-mobl:~/tmp$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.14.7.0.30_160000]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz 3.0 [2022.14.7.0.30_160000]
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x9a49] 3.0 [22.39.24347]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x9a49] 1.3 [1.3.24347]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]

I am on Ubuntu 20.04 running kernel "5.10.102.1-microsoft-standard-WSL2" in WSL.

I suspect an overflow. Does Iris Xe support fp64?

Oct 26 '22 03:10 diptorupd

No, Iris Xe-LP does not support fp64:

In [1]: import dpctl

In [2]: dev = dpctl.SyclDevice()

In [3]: dev, dev.has_aspect_fp64
Out[3]:
(<dpctl.SyclDevice [backend_type.level_zero, device_type.gpu,  Intel(R) Graphics [0x9a49]] at 0x7eff719a21b0>,
 False)

Oct 26 '22 12:10 oleksandr-pavlyk

But, then your dtype is i8. So, the issue may not be related to fp64. Let me investigate. There are two possibilities: the unpack routine where we cast PyObjects to ctypes is messing up the precision for i8 or worse we are somehow ignoring the dtype and passing in fp64 anyway. Let me check and get back.

Oct 26 '22 14:10 diptorupd

I was able to replicate this issue locally.

Using Sasha's code: In

(array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))

Out

(array([     94921035162528,                   0, 7954887801964155392,
            94921012152992,      94921012152976,                   0,
                         0,                   0,                   0,
                         0]),
array([94921064734992,              0, 94921064738480, 94921064739312,
       94921035167728, 94921035162256, 94921035162272, 94921035162272,
       94921035168864, 94921035168872]))

My GPU environment is similar:

(numba-dpex-env) ngrigori@ngrigori-mobl:~/numba-dpex$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.14.7.0.30_160000]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz 3.0 [2022.14.7.0.30_160000]
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x9a49] 3.0 [22.28.23726.1]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x9a49] 1.3 [1.3.23726]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]

(numba-dpex-env) ngrigori@ngrigori-mobl:~/numba-dpex$ uname -r | tr '[:upper:]' '[:lower:]'
5.10.16.3-microsoft-standard-wsl2

with Ubuntu 20.04

Oct 28 '22 19:10 ndgrigorian

So this must be a bug specific to WSL. Just tried this on Linux box:

Python 3.9.12 (main, Aug 29 2022, 00:54:58)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import dpctl, numba_dpex, dpctl.tensor as dpt

In [2]: x = dpt.ones(10, dtype='i8')
   ...:

In [3]: y = dpt.empty_like(x)

In [4]: x.sycl_device == y.sycl_device
Out[4]: True

In [5]: dpt.asnumpy(x), dpt.asnumpy(y)
Out[5]: (array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))

In [6]: @numba_dpex.kernel
   ...: def foo(X, Y):
   ...:     i = numba_dpex.get_global_id(0)
   ...:     Y[i] = 2 * X[i]
   ...:

In [7]: foo[10, numba_dpex.DEFAULT_LOCAL_SIZE](x, y)

In [8]: dpt.asnumpy(x), dpt.asnumpy(y)
Out[8]: (array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2]))

In [9]: x.sycl_device
Out[9]: <dpctl.SyclDevice [backend_type.level_zero, device_type.gpu,  Intel(R) Iris(R) Xe Graphics [0x9a49]] at 0x7fc5c4cfd5f0>

In [10]: x.sycl_device.driver_version
Out[10]: '1.3.24347'

In [11]: !uname -r
5.15.0-47-generic

In [12]: !lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.1 LTS
Release:        22.04
Codename:       jammy

Oct 28 '22 21:10 oleksandr-pavlyk

Also, this appears to work correctly in WSL with opencl:gpu backend:

Python 3.9.12 (main, Aug 29 2022, 00:54:58)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import dpctl.tensor as dpt, numba_dpex

In [2]: x = dpt.ones(10, dtype='i8', device="opencl:gpu")

In [3]: y = dpt.empty_like(x)

In [4]: x.sycl_device == y.sycl_device
Out[4]: True

In [5]: dpt.asnumpy(x), dpt.asnumpy(y)
Out[5]: (array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))

In [6]: @numba_dpex.kernel
   ...: def foo(X, Y):
   ...:     i = numba_dpex.get_global_id(0)
   ...:     Y[i] = 2*X[i]
   ...:

In [7]: foo[10, numba_dpex.DEFAULT_LOCAL_SIZE](x, y)

In [8]: dpt.asnumpy(x), dpt.asnumpy(y)
Out[8]: (array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2]))

Nov 04 '22 17:11 oleksandr-pavlyk

@diptorupd I think it is clear that this is not an issue with numba_dpex. The issue can be closed.

Nov 04 '22 17:11 oleksandr-pavlyk

Reopening, I have built a project in C++ which submits the interoperability kernel extracted from this example using both level-zero and opencl backends.

The example works fine in WSL with both backends. Hence, the observed corruption is numba-dpex specific.

The repo can be found at https://github.com/oleksandr-pavlyk/native-reproducer-simulating-numba-dpex-issue-809/

Nov 14 '22 16:11 oleksandr-pavlyk

I updated the reproducer repo with scripts/run.py which submits using dpctl.program and it runs fine on WSL.

Nov 15 '22 12:11 oleksandr-pavlyk

After update of WSL, and Linux kernel and Intel GPU driver that came along side release of oneAPI 2023 the behavior changed:

In [1]: import dpctl, dpctl.tensor as dpt, numba_dpex
/home/opavlyk/miniconda3/envs/dppy_stack2/lib/python3.9/site-packages/llvmlite/llvmpy/__init__.py:3: UserWarning: The module `llvmlite.llvmpy` is deprecated and will be removed in the future.
  warnings.warn(
/home/opavlyk/miniconda3/envs/dppy_stack2/lib/python3.9/site-packages/llvmlite/llvmpy/core.py:8: UserWarning: The module `llvmlite.llvmpy.core` is deprecated and will be removed in the future. Equivalent functionality is provided by `llvmlite.ir`.
  warnings.warn(

In [2]: x = dpt.ones(10, dtype='i8', device="cpu")

In [3]: y = dpt.empty_like(x)

In [4]: @numba_dpex.kernel
   ...: def foo(X, Y):
   ...:     i = numba_dpex.get_global_id(0)
   ...:     Y[i] = 2*X[i]
   ...:

In [5]: x, y
Out[5]:
(usm_ndarray([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
 usm_ndarray([         0,          0,          0,          0,          0,
                       0,          0,          0, 4295036673,      69376]))

In [6]: foo[10, numba_dpex.DEFAULT_LOCAL_SIZE](x, y)

In [7]: x, y
Out[7]:
(usm_ndarray([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
 usm_ndarray([2, 2, 2, 2, 2, 2, 2, 2, 2, 2]))

In [8]: x = dpt.ones(10, dtype='i8', device="gpu")

In [9]: y = dpt.empty_like(x)

In [10]: x, y
Out[10]:
(usm_ndarray([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
 usm_ndarray([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))

In [11]: foo[10, numba_dpex.DEFAULT_LOCAL_SIZE](x, y)

In [12]: x, y
Out[12]:
(usm_ndarray([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
 usm_ndarray([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))

(dppy_stack2) opavlyk@opavlyk-mobl:~$ OCL_ICD_FILENAMES=${CONDA_PREFIX}/lib/libintelocl.so ipython -c "import dpctl, numba_dpex; print((dpctl.__version__, numba_dpex.__version__))"
/home/opavlyk/miniconda3/envs/dppy_stack2/lib/python3.9/site-packages/llvmlite/llvmpy/__init__.py:3: UserWarning: The module `llvmlite.llvmpy` is deprecated and will be removed in the future.
  warnings.warn(
/home/opavlyk/miniconda3/envs/dppy_stack2/lib/python3.9/site-packages/llvmlite/llvmpy/core.py:8: UserWarning: The module `llvmlite.llvmpy.core` is deprecated and will be removed in the future. Equivalent functionality is provided by `llvmlite.ir`.
  warnings.warn(
('0.14.1dev1+11.g47e4ae45f', '0.19.0+21.ga541db59')

but the result is not correct

Jan 10 '23 20:01 oleksandr-pavlyk

numba-dpex numba-dpex copied to clipboard

Call to numba_dpex.kernel corrups data in arguments

numba-dpex
numba-dpex copied to clipboard