numba-dpex
numba-dpex copied to clipboard
Call to numba_dpex.kernel corrups data in arguments
Python 3.9.10 (main, Mar 2 2022, 12:02:00)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.5.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import dpctl.tensor as dpt, numba_dpex
In [2]: x = dpt.ones(10, dtype='i8')
In [3]: y = dpt.empty_like(x)
In [4]: x.sycl_device == y.sycl_device
Out[4]: True
In [5]: dpt.asnumpy(x), dpt.asnumpy(y)
Out[5]: (array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))
In [6]: @numba_dpex.kernel
...: def foo(X, Y):
...: i = numba_dpex.get_global_id(0)
...: Y[i] = 2*X[i]
...:
In [7]: foo[10, numba_dpex.DEFAULT_LOCAL_SIZE](x, y)
In [8]: dpt.asnumpy(x), dpt.asnumpy(y)
Out[8]:
(array([ 94391540026928, 0, 7954887801964155392,
94391483338752, 94391483338736, 0,
0, 0, 0,
0]),
array([94391548829776, 0, 0, 0,
0, 0, 0, 0,
0, 0]))
My setup:
(dppy_stack) opavlyk@opavlyk-mobl:~$ conda list dp
# packages in environment at /home/opavlyk/miniconda3/envs/dppy_stack:
#
# Name Version Build Channel
dpbench 0.0.1 dev_0 <develop>
dpcpp-cpp-rt 2022.2.0 intel_8734 intel
dpcpp_cpp_rt 2022.2.0 intel_8734 intel
dpctl 0.14.0dev0 py39h8c27c75_195 dppy/label/dev
dpnp 0.10.2 py39h2bc3f7f_7 dppy/label/dev
mkl-dpcpp 2022.2.0 intel_8748 intel
numba-dpex 0.18.1 py39hfc4b9b4_45 dppy/label/dev
threadpoolctl 2.2.0 pyh0d69192_0 intel
I have tested with:
(dpex_809) mingjie2@ansatnuc04:/localdisk/work/mingjie2/sandbox/numba_dpex_809$ conda list dp
# packages in environment at /localdisk/work/mingjie2/miniconda3/envs/dpex_809:
#
# Name Version Build Channel
dpcpp-cpp-rt 2022.2.0 intel_8734 intel
dpcpp_cpp_rt 2022.2.0 intel_8734 intel
dpctl 0.14.0dev0 py39h8c27c75_207 dppy/label/dev
dpnp 0.10.2 py39h2bc3f7f_8 dppy/label/dev
mkl-dpcpp 2022.2.0 intel_8748 intel
numba-dpex 0.18.1 py39hfc4b9b4_45 dppy/label/dev
the output seems correct:
(dpex_809) mingjie2@ansatnuc04:/localdisk/work/mingjie2/sandbox/numba_dpex_809$ python test_Sasha.py
in:
[1 1 1 1 1 1 1 1 1 1]
[0 0 0 0 0 0 0 0 0 0]
out:
[1 1 1 1 1 1 1 1 1 1]
[2 2 2 2 2 2 2 2 2 2]
The only difference I can see here is dpctl: py39h8c27c75_207 vs py39h8c27c75_195. Is the possible something fixed in the newer build?
The thing that I failed to state is my GPU environment:
(dppy_stack2) opavlyk@opavlyk-mobl:~/tmp$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.14.7.0.30_160000]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz 3.0 [2022.14.7.0.30_160000]
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x9a49] 3.0 [22.39.24347]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x9a49] 1.3 [1.3.24347]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]
I am on Ubuntu 20.04 running kernel "5.10.102.1-microsoft-standard-WSL2" in WSL.
The thing that I failed to state is my GPU environment:
(dppy_stack2) opavlyk@opavlyk-mobl:~/tmp$ sycl-ls [opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.14.7.0.30_160000] [opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz 3.0 [2022.14.7.0.30_160000] [opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x9a49] 3.0 [22.39.24347] [ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x9a49] 1.3 [1.3.24347] [host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]I am on Ubuntu 20.04 running kernel "5.10.102.1-microsoft-standard-WSL2" in WSL.
I suspect an overflow. Does Iris Xe support fp64?
No, Iris Xe-LP does not support fp64:
In [1]: import dpctl
In [2]: dev = dpctl.SyclDevice()
In [3]: dev, dev.has_aspect_fp64
Out[3]:
(<dpctl.SyclDevice [backend_type.level_zero, device_type.gpu, Intel(R) Graphics [0x9a49]] at 0x7eff719a21b0>,
False)
But, then your dtype is i8. So, the issue may not be related to fp64. Let me investigate. There are two possibilities: the unpack routine where we cast PyObjects to ctypes is messing up the precision for i8 or worse we are somehow ignoring the dtype and passing in fp64 anyway. Let me check and get back.
I was able to replicate this issue locally.
Using Sasha's code: In
(array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))
Out
(array([ 94921035162528, 0, 7954887801964155392,
94921012152992, 94921012152976, 0,
0, 0, 0,
0]),
array([94921064734992, 0, 94921064738480, 94921064739312,
94921035167728, 94921035162256, 94921035162272, 94921035162272,
94921035168864, 94921035168872]))
My GPU environment is similar:
(numba-dpex-env) ngrigori@ngrigori-mobl:~/numba-dpex$ sycl-ls
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2022.14.7.0.30_160000]
[opencl:cpu:1] Intel(R) OpenCL, 11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz 3.0 [2022.14.7.0.30_160000]
[opencl:gpu:2] Intel(R) OpenCL HD Graphics, Intel(R) Graphics [0x9a49] 3.0 [22.28.23726.1]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Graphics [0x9a49] 1.3 [1.3.23726]
[host:host:0] SYCL host platform, SYCL host device 1.2 [1.2]
(numba-dpex-env) ngrigori@ngrigori-mobl:~/numba-dpex$ uname -r | tr '[:upper:]' '[:lower:]'
5.10.16.3-microsoft-standard-wsl2
with Ubuntu 20.04
So this must be a bug specific to WSL. Just tried this on Linux box:
Python 3.9.12 (main, Aug 29 2022, 00:54:58)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.5.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import dpctl, numba_dpex, dpctl.tensor as dpt
In [2]: x = dpt.ones(10, dtype='i8')
...:
In [3]: y = dpt.empty_like(x)
In [4]: x.sycl_device == y.sycl_device
Out[4]: True
In [5]: dpt.asnumpy(x), dpt.asnumpy(y)
Out[5]: (array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))
In [6]: @numba_dpex.kernel
...: def foo(X, Y):
...: i = numba_dpex.get_global_id(0)
...: Y[i] = 2 * X[i]
...:
In [7]: foo[10, numba_dpex.DEFAULT_LOCAL_SIZE](x, y)
In [8]: dpt.asnumpy(x), dpt.asnumpy(y)
Out[8]: (array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2]))
In [9]: x.sycl_device
Out[9]: <dpctl.SyclDevice [backend_type.level_zero, device_type.gpu, Intel(R) Iris(R) Xe Graphics [0x9a49]] at 0x7fc5c4cfd5f0>
In [10]: x.sycl_device.driver_version
Out[10]: '1.3.24347'
In [11]: !uname -r
5.15.0-47-generic
In [12]: !lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.1 LTS
Release: 22.04
Codename: jammy
Also, this appears to work correctly in WSL with opencl:gpu backend:
Python 3.9.12 (main, Aug 29 2022, 00:54:58)
Type 'copyright', 'credits' or 'license' for more information
IPython 8.5.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]: import dpctl.tensor as dpt, numba_dpex
In [2]: x = dpt.ones(10, dtype='i8', device="opencl:gpu")
In [3]: y = dpt.empty_like(x)
In [4]: x.sycl_device == y.sycl_device
Out[4]: True
In [5]: dpt.asnumpy(x), dpt.asnumpy(y)
Out[5]: (array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))
In [6]: @numba_dpex.kernel
...: def foo(X, Y):
...: i = numba_dpex.get_global_id(0)
...: Y[i] = 2*X[i]
...:
In [7]: foo[10, numba_dpex.DEFAULT_LOCAL_SIZE](x, y)
In [8]: dpt.asnumpy(x), dpt.asnumpy(y)
Out[8]: (array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]), array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2]))
@diptorupd I think it is clear that this is not an issue with numba_dpex. The issue can be closed.
Reopening, I have built a project in C++ which submits the interoperability kernel extracted from this example using both level-zero and opencl backends.
The example works fine in WSL with both backends. Hence, the observed corruption is numba-dpex specific.
The repo can be found at https://github.com/oleksandr-pavlyk/native-reproducer-simulating-numba-dpex-issue-809/
I updated the reproducer repo with scripts/run.py which submits using dpctl.program and it runs fine on WSL.
After update of WSL, and Linux kernel and Intel GPU driver that came along side release of oneAPI 2023 the behavior changed:
In [1]: import dpctl, dpctl.tensor as dpt, numba_dpex
/home/opavlyk/miniconda3/envs/dppy_stack2/lib/python3.9/site-packages/llvmlite/llvmpy/__init__.py:3: UserWarning: The module `llvmlite.llvmpy` is deprecated and will be removed in the future.
warnings.warn(
/home/opavlyk/miniconda3/envs/dppy_stack2/lib/python3.9/site-packages/llvmlite/llvmpy/core.py:8: UserWarning: The module `llvmlite.llvmpy.core` is deprecated and will be removed in the future. Equivalent functionality is provided by `llvmlite.ir`.
warnings.warn(
In [2]: x = dpt.ones(10, dtype='i8', device="cpu")
In [3]: y = dpt.empty_like(x)
In [4]: @numba_dpex.kernel
...: def foo(X, Y):
...: i = numba_dpex.get_global_id(0)
...: Y[i] = 2*X[i]
...:
In [5]: x, y
Out[5]:
(usm_ndarray([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
usm_ndarray([ 0, 0, 0, 0, 0,
0, 0, 0, 4295036673, 69376]))
In [6]: foo[10, numba_dpex.DEFAULT_LOCAL_SIZE](x, y)
In [7]: x, y
Out[7]:
(usm_ndarray([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
usm_ndarray([2, 2, 2, 2, 2, 2, 2, 2, 2, 2]))
In [8]: x = dpt.ones(10, dtype='i8', device="gpu")
In [9]: y = dpt.empty_like(x)
In [10]: x, y
Out[10]:
(usm_ndarray([1, 1, 1, 1, 1, 1, 1, 1, 1, 1]),
usm_ndarray([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))
In [11]: foo[10, numba_dpex.DEFAULT_LOCAL_SIZE](x, y)
In [12]: x, y
Out[12]:
(usm_ndarray([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
usm_ndarray([0, 0, 0, 0, 0, 0, 0, 0, 0, 0]))
(dppy_stack2) opavlyk@opavlyk-mobl:~$ OCL_ICD_FILENAMES=${CONDA_PREFIX}/lib/libintelocl.so ipython -c "import dpctl, numba_dpex; print((dpctl.__version__, numba_dpex.__version__))"
/home/opavlyk/miniconda3/envs/dppy_stack2/lib/python3.9/site-packages/llvmlite/llvmpy/__init__.py:3: UserWarning: The module `llvmlite.llvmpy` is deprecated and will be removed in the future.
warnings.warn(
/home/opavlyk/miniconda3/envs/dppy_stack2/lib/python3.9/site-packages/llvmlite/llvmpy/core.py:8: UserWarning: The module `llvmlite.llvmpy.core` is deprecated and will be removed in the future. Equivalent functionality is provided by `llvmlite.ir`.
warnings.warn(
('0.14.1dev1+11.g47e4ae45f', '0.19.0+21.ga541db59')
but the result is not correct