dpctl
__dlpack_device__() returned numbers
Hi, I have a question about DLPack results.
I created a dpnp array, then called __dlpack_device__() and got DLDeviceType = 14 (kDLOneAPI) and device_id = 3. Could you help me understand what this 3 means? When I run sycl-ls I get the output:
[opencl:cpu:0] Intel(R) OpenCL, Intel(R) Xeon(R) Gold [...]
[opencl:acc:1] Intel(R) FPGA Emulation Platform for OpenCL(TM) [...]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max [...]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Data Center GPU Max [...]
When I checked the values for the DLTensor, it shows level_zero:gpu:0.
Here is a code example:
import dpnp as dnp
if __name__ == "__main__":
    first_number, second_number = dnp.arange(100, dtype=dnp.float32).__dlpack_device__()
    print(first_number)   # result 14
    print(second_number)  # result 3
@wozna The tuple returned by usm_ndarray.__dlpack_device__ has the form (accelerator/framework identifier, device_id).
The framework identifier is 14 (enumerator kDLOneAPI), as you have realized, and the device_id is the stable numeric ordinal of the root (unpartitioned) device, consistent with the SYCL runtime. It corresponds to the position of the device in the device vector returned by the static method sycl::device::get_devices(), which is exposed to Python as dpctl.get_devices(). A filter selector string consisting of just this identifier reconstructs the unpartitioned SYCL device:
In [1]: import dpctl.tensor as dpt
In [2]: x = dpt.arange(10, dtype="f4")
In [3]: x.device
Out[3]: Device(level_zero:gpu:0)
In [4]: x.__dlpack_device__()
Out[4]: (14, 2)
In [5]: import dpctl
In [6]: x.sycl_device == dpctl.SyclDevice("2")
Out[6]: True
In [7]: x.sycl_device == dpctl.get_devices()[2]
Out[7]: True
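Applied to the numbers in the original question, the same lookup can be illustrated without a SYCL runtime. This is only a sketch: it assumes the order printed by sycl-ls matches the order of sycl::device::get_devices() on that machine (which agrees with the reported device_id of 3), and the names SYCL_LS, KDL_ONEAPI and describe are hypothetical helpers, not part of dpctl or dpnp.

```python
# Device listing copied from the sycl-ls output quoted in the question.
# Assumption: this order matches sycl::device::get_devices() on that machine.
SYCL_LS = [
    "opencl:cpu:0",
    "opencl:acc:1",
    "opencl:gpu:2",
    "ext_oneapi_level_zero:gpu:0",
]

KDL_ONEAPI = 14  # DLDeviceType enumerator kDLOneAPI


def describe(dlpack_device, devices=SYCL_LS):
    """Map a (device_type, device_id) tuple to an entry of the device list."""
    device_type, device_id = dlpack_device
    if device_type != KDL_ONEAPI:
        raise ValueError(f"not a kDLOneAPI device: {device_type}")
    # device_id is a plain position in the list of root devices.
    return devices[device_id]


print(describe((14, 3)))  # ext_oneapi_level_zero:gpu:0
```

So device_id = 3 is the fourth entry of the listing, the Level-Zero GPU, which matches the level_zero:gpu:0 reported for the tensor.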
@wozna Let me know if you have further questions. Feel free to resolve if not.
@oleksandr-pavlyk Thank you for the answer. So if I have a DLTensor, the only way to find out which device the tensor is allocated on (CPU or XPU) is to call into SYCL and compare device_id against sycl::device::get_devices()?
@wozna Yes, that is correct. Handling a kDLOneAPI device requires a call to the SYCL runtime.
@oleksandr-pavlyk Now it is clear to me, thank you.
I have one more question about XPU tiles in the DLPack case. The device_id in the DLTensor tells us only which device the memory is allocated on, not which tile. So if we have the data pointer from the DLTensor, how do we know which tile it is on? Do we need to know this in order to implement a zero-copy from_dlpack or to_dlpack?
Great question @wozna. It is possible to share tile allocations made using the default-platform context.
Steps for exporting DLPack for tile-allocated memory:
- Check that the USM allocation is known to the platform's default context.
- Find the ancestral root device for the allocation device.
- Set device_id to be the position of this ancestral root device in the sycl::device::get_devices() list.
Steps for importing DLPack:
- Get the root device corresponding to the device_id found in DLPack.
- Get the device's platform, and get the default platform context.
- Use sycl::get_pointer_device(ptr, default_ctx) to get the tile device the allocation was made on.
This logic is implemented in dpctl's support for DLPack.
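The export and import steps above can be modeled in a few lines of plain Python. This is a toy sketch, not dpctl's implementation: FakeDevice, root_of, export_device_id and import_device are hypothetical names, the pointer_device argument is a stand-in for sycl::get_pointer_device(ptr, default_ctx), and the check that the allocation is known to the default context is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FakeDevice:
    """Hypothetical stand-in for a SYCL device; not a dpctl class."""
    name: str
    parent_device: Optional["FakeDevice"] = None  # None marks a root device


def root_of(dev):
    """Walk up the partition hierarchy to the unpartitioned root device."""
    while dev.parent_device is not None:
        dev = dev.parent_device
    return dev


def export_device_id(alloc_dev, root_devices):
    """Export side: device_id is the root device's position in the list."""
    return list(root_devices).index(root_of(alloc_dev))


def import_device(device_id, ptr, root_devices, pointer_device):
    """Import side: recover the (possibly tile) device that owns ptr.

    pointer_device is a caller-supplied stand-in for
    sycl::get_pointer_device(ptr, default_ctx).
    """
    root = root_devices[device_id]
    dev = pointer_device(ptr)
    if root_of(dev) != root:
        raise ValueError("pointer does not belong to the device named by device_id")
    return dev


# Toy topology: one CPU and one GPU that is partitioned into two tiles.
gpu = FakeDevice("level_zero:gpu:0")
tile0 = FakeDevice("tile 0", parent_device=gpu)
tile1 = FakeDevice("tile 1", parent_device=gpu)
roots = [FakeDevice("opencl:cpu:0"), gpu]

allocations = {0x1000: tile1}  # toy table: pointer -> owning device
device_id = export_device_id(tile1, roots)
recovered = import_device(device_id, 0x1000, roots, allocations.__getitem__)
print(device_id, recovered.name)  # 1 tile 1
```

The point of the model is that device_id only ever names the root device, and the tile is recovered from the pointer itself on the importing side. With dpctl installed, root_of should also work on a dpctl.SyclDevice, assuming its parent_device property returns None for root devices.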
@wozna Is this ticket ready to be resolved?
@oleksandr-pavlyk Yes it can be resolved, thank you for your answers.