dpctl

__dlpack_device__() returned numbers

wozna opened this issue 1 year ago • 7 comments

Hi, I have a question about DLPack results. I created a dpnp array, called __dlpack_device__() on it, and got DLDeviceType=14 (kDLOneAPI) and device_id=3. Could you help me understand what this 3 means? When I run sycl-ls, I get this output:

[opencl:cpu:0] Intel(R) OpenCL, Intel(R) Xeon(R) Gold [...]
[opencl:acc:1] Intel(R) FPGA Emulation Platform for OpenCL(TM) [...]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max [...]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Data Center GPU Max [...]

When I checked the values in the DLTensor, the device shows as level_zero:gpu:0.

Here is a code example:

import dpnp as dnp

if __name__ == "__main__":
    first_number, second_number = dnp.arange(100, dtype=dnp.float32).__dlpack_device__()
    print(first_number) # result 14 
    print(second_number) # result 3 

wozna, Sep 11 '23

@wozna The tuple returned by usm_ndarray.__dlpack_device__ is the pair (accelerator/framework identifier, device_id).

The framework identifier is 14 (enumerator kDLOneAPI), as you have realized, and device_id is a stable numeric ordinal identifying the root (unpartitioned) device, consistent with the SYCL runtime. It is the position of the device in the vector returned by the static method sycl::device::get_devices(), which is exposed to Python as dpctl.get_devices(). A filter selector string consisting of just this ordinal reconstructs the unpartitioned SYCL device:

In [1]: import dpctl.tensor as dpt

In [2]: x = dpt.arange(10, dtype="f4")

In [3]: x.device
Out[3]: Device(level_zero:gpu:0)

In [4]: x.__dlpack_device__()
Out[4]: (14, 2)

In [5]: import dpctl

In [6]: x.sycl_device == dpctl.SyclDevice("2")
Out[6]: True

In [7]: x.sycl_device == dpctl.get_devices()[2]
Out[7]: True
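To connect this with the sycl-ls output in the question: the bracketed number in each sycl-ls line is an index within that backend, whereas device_id is the position in the combined device list. The sketch below assumes that sycl-ls lists devices in the same order as sycl::device::get_devices(), which is an assumption here (device-selection environment variables, for example, can change the list). Under that assumption the Level Zero GPU, listed fourth, has ordinal 3, which matches the device_id reported in the question. The helpers parse_sycl_ls and global_ordinal are hypothetical and written only for illustration.

```python
import re

# The sycl-ls lines from the question, device names elided as in the original.
SYCL_LS_OUTPUT = """\
[opencl:cpu:0] Intel(R) OpenCL, Intel(R) Xeon(R) Gold [...]
[opencl:acc:1] Intel(R) FPGA Emulation Platform for OpenCL(TM) [...]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Data Center GPU Max [...]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Data Center GPU Max [...]
"""

# Matches the "[backend:type:index]" prefix of a sycl-ls line.
ENTRY = re.compile(r"^\[(?P<backend>\w+):(?P<kind>\w+):(?P<index>\d+)\]")


def parse_sycl_ls(text):
    """Return (backend, device type, per-backend index) tuples in listing order."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line)
        if m:
            entries.append((m["backend"], m["kind"], int(m["index"])))
    return entries


def global_ordinal(entries, backend, kind, index):
    """Position of a device in the combined list.

    Assumes the listing order matches sycl::device::get_devices().
    """
    return entries.index((backend, kind, index))


entries = parse_sycl_ls(SYCL_LS_OUTPUT)
print(global_ordinal(entries, "ext_oneapi_level_zero", "gpu", 0))  # 3
```

The per-backend index (0) and the global ordinal (3) are different numbers for the same device, which is why the DLTensor reports level_zero:gpu:0 while __dlpack_device__() reports 3.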

oleksandr-pavlyk, Sep 12 '23

@wozna Let me know if you have further questions. Feel free to resolve if not.

oleksandr-pavlyk, Sep 15 '23

@oleksandr-pavlyk Thank you for the answer. So if I have a DLTensor, the only way to find out which device the tensor is allocated on (CPU or XPU) is to call into SYCL and look up device_id in sycl::device::get_devices()?

wozna, Sep 18 '23

@wozna Yes, that is correct. Handling a kDLOneAPI device requires a call to the SYCL runtime.
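A consumer of __dlpack_device__() therefore has to branch on the device type: for kDLOneAPI (14) the device_id only has meaning as an index into the SYCL runtime's list of root devices. The minimal sketch below takes that list as a parameter so it runs without dpctl; with dpctl installed one would pass dpctl.get_devices(). The enumerator values kDLCPU = 1, kDLCUDA = 2 and kDLOneAPI = 14 are from dlpack.h; the function resolve_dlpack_device and the placeholder device strings are hypothetical.

```python
from enum import IntEnum


class DLDeviceType(IntEnum):
    # A subset of the DLDeviceType enumerators defined in dlpack.h.
    kDLCPU = 1
    kDLCUDA = 2
    kDLOneAPI = 14


def resolve_dlpack_device(dlpack_device, sycl_devices):
    """Map a __dlpack_device__() tuple to a device description.

    sycl_devices stands in for dpctl.get_devices(); any sequence works.
    """
    device_type, device_id = dlpack_device
    if device_type == DLDeviceType.kDLOneAPI:
        # The ordinal is only meaningful relative to the SYCL runtime's
        # list of root devices, so it has to be looked up there.
        if not 0 <= device_id < len(sycl_devices):
            raise ValueError(f"no SYCL root device with ordinal {device_id}")
        return sycl_devices[device_id]
    if device_type == DLDeviceType.kDLCPU:
        return "host CPU"
    raise NotImplementedError(f"unhandled DLDeviceType {device_type}")


# Placeholder strings standing in for the four devices listed in the question.
devices = ["opencl:cpu:0", "opencl:acc:1", "opencl:gpu:2", "level_zero:gpu:0"]
print(resolve_dlpack_device((14, 3), devices))  # level_zero:gpu:0
```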

oleksandr-pavlyk, Sep 18 '23

@oleksandr-pavlyk Now it is clear to me, thank you.

wozna, Sep 19 '23

I have one more question, about XPU tiles in the context of DLPack. The device_id in a DLTensor tells us only which device the memory is allocated on, not which tile. So, given the data pointer in a DLTensor, how do we know which tile it is on? Do we need to know that to implement zero-copy from_dlpack or to_dlpack?

wozna, Oct 03 '23

Great question @wozna. Allocations made on a tile can be shared, provided they were made using the platform's default context.

Steps for exporting DLPack for memory allocated on a tile:

  • Check that the USM allocation is known to the platform's default context.
  • Find the ancestral root device of the device the allocation was made on.
  • Set device_id to the position of this ancestral root device in the sycl::device::get_devices() list.
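The export steps above can be modeled in a few lines of plain Python. This is a sketch under stated assumptions, not dpctl's implementation: the Device and Context classes are hypothetical stand-ins for sycl::device and the platform's default sycl::context, the allocations dictionary plays the role of the context's knowledge of USM pointers, and root_devices stands in for sycl::device::get_devices().

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

KDL_ONEAPI = 14  # kDLOneAPI enumerator from dlpack.h


@dataclass(eq=False)
class Device:
    # Hypothetical stand-in for sycl::device; parent is None for a root device.
    name: str
    parent: Optional["Device"] = None


@dataclass(eq=False)
class Context:
    # Hypothetical stand-in for the platform's default sycl::context,
    # recording which (sub-)device each USM pointer was allocated on.
    allocations: Dict[int, Device] = field(default_factory=dict)


def export_dlpack_device(ptr, default_ctx, root_devices):
    """Return the (device_type, device_id) pair to store in a DLTensor."""
    # Step 1: the allocation must be known to the platform's default context.
    if ptr not in default_ctx.allocations:
        raise ValueError("allocation is not bound to the default context")
    # Step 2: walk up the partitioning hierarchy to the ancestral root device.
    root = default_ctx.allocations[ptr]
    while root.parent is not None:
        root = root.parent
    # Step 3: device_id is the root device's position in the device list.
    for device_id, dev in enumerate(root_devices):
        if dev is root:
            return KDL_ONEAPI, device_id
    raise ValueError("root device is not in the device list")


cpu = Device("opencl:cpu")
gpu = Device("level_zero:gpu")
tile1 = Device("level_zero:gpu.1", parent=gpu)
ctx = Context({0x1000: tile1})
print(export_dlpack_device(0x1000, ctx, [cpu, gpu]))  # (14, 1)
```

Note that the tile itself never appears in the exported pair; only its root device's ordinal does.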

Steps for importing DLPack:

  • Get the root device corresponding to the device_id found in the DLPack tensor.
  • Get the device's platform, and get that platform's default context.
  • Use sycl::get_pointer_device(ptr, default_ctx) to get the tile device the allocation was made on.
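The import direction can be sketched the same way. Again the classes are hypothetical stand-ins: Context.pointer_device plays the role of sycl::get_pointer_device(ptr, ctx), Platform.default_context plays the role of the platform's default context, and root_devices stands in for sycl::device::get_devices(). This only illustrates the order of the steps; it is not how dpctl implements them.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class Context:
    # Hypothetical stand-in for sycl::context: pointer -> allocation device.
    allocations: Dict[int, "Device"] = field(default_factory=dict)

    def pointer_device(self, ptr):
        # Plays the role of sycl::get_pointer_device(ptr, ctx).
        try:
            return self.allocations[ptr]
        except KeyError:
            raise ValueError("pointer is not a USM allocation in this context") from None


@dataclass(eq=False)
class Platform:
    # Hypothetical stand-in for sycl::platform and its default context.
    default_context: Context = field(default_factory=Context)


@dataclass(eq=False)
class Device:
    # Hypothetical stand-in for sycl::device.
    name: str
    platform: Platform
    parent: Optional["Device"] = None


def import_dlpack_device(ptr, device_id, root_devices):
    """Recover the (sub-)device a DLPack data pointer was allocated on."""
    # Step 1: device_id indexes the list of root devices.
    root = root_devices[device_id]
    # Step 2: use the default context of the root device's platform.
    ctx = root.platform.default_context
    # Step 3: ask that context which device owns the allocation.
    return ctx.pointer_device(ptr)


level_zero = Platform()
gpu = Device("level_zero:gpu", level_zero)
tile0 = Device("level_zero:gpu.0", level_zero, parent=gpu)
tile1 = Device("level_zero:gpu.1", level_zero, parent=gpu)
level_zero.default_context.allocations[0x2000] = tile1
level_zero.default_context.allocations[0x4000] = tile0
print(import_dlpack_device(0x2000, 0, [gpu]).name)  # level_zero:gpu.1
```

The tile is recoverable only because both sides agree to use the platform's default context, which is why the exporter checks that condition first.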

This logic is implemented in dpctl's support for DLPack.

oleksandr-pavlyk, Oct 03 '23

@wozna Is this ticket ready to be resolved?

oleksandr-pavlyk, May 20 '24

@oleksandr-pavlyk Yes it can be resolved, thank you for your answers.

wozna, May 20 '24