[Question] dpnp functionality with switching device and context
I am working with a cupy project for Aurora and was wondering how to port code that switches devices, such as the following. Appreciate any pointers:
with cupy.cuda.Device(device_id), _streams[device_id]:
Array content can be migrated to a different device using either the dpnp.ndarray.to_device() method or the dpnp.asarray() function.
The arr.to_device(device=target_device) method will be zero-copy if arr.sycl_queue and the dpctl.SyclQueue instance associated with the new target device have the same underlying sycl::device and sycl::context instances (a short check of this condition is sketched after the example below).
Here is an example of migration without a copy using the .to_device() method:
import dpnp, dpctl
x = dpnp.linspace(0, 1, num=10**8)
q_prof = dpctl.SyclQueue(x.sycl_context, x.sycl_device, property="enable_profiling")
timer = dpctl.SyclTimer()
# no data migration takes place here (zero-copy),
# but x and x1 arrays do not satisfy compute-follows-data requirements
x1 = x.to_device(q_prof)
with timer(q_prof):
    y1 = dpnp.sin(2 * x1) * dpnp.exp(-dpnp.square(x1))
# also a zero copy operation
y = y1.to_device(x.device)
host_dt, device_dt = timer.dt
print(f"Execution on device {x.sycl_device.name} took {device_dt} seconds")
print(f"Execution on host took {host_dt} seconds")
Data migration when the current and the target SYCL contexts are different is performed via the host. That means the data is copied from the current device to the host, and then from the host to the target device:
import dpnp
x_cpu = dpnp.concat((dpnp.ones(10, device="cpu"), dpnp.zeros(1000, device="cpu")))
# data migration is performed via host
x_gpu = x_cpu.to_device("gpu")
An alternative way to migrate data is to use the asarray() function and specify the device-placement keyword argument:
import dpnp
x_cpu = dpnp.concat((dpnp.ones(10, device="cpu"), dpnp.zeros(1000, device="cpu")))
# data migration is performed via host
x_gpu = dpnp.asarray(x_cpu, device="gpu")
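Note that the placement can be specified in several equivalent ways; the sketch below assumes, mirroring dpctl.tensor, that device accepts a filter selector string, a dpctl.SyclDevice, or the .device attribute of an existing array, and that a specific queue can be targeted via the sycl_queue keyword (the "opencl:gpu:0" selector is only an example and depends on the devices present on the system):
import dpnp, dpctl
x_cpu = dpnp.ones(10, device="cpu")
# Filter selector string (backend:device_type:relative_id)
a = dpnp.asarray(x_cpu, device="opencl:gpu:0")
# Explicit dpctl.SyclDevice instance
gpu_dev = dpctl.SyclDevice("gpu")
b = dpnp.asarray(x_cpu, device=gpu_dev)
# Device object of an existing array keeps the result on that array's device
c = dpnp.asarray(x_cpu, device=b.device)
# A specific dpctl.SyclQueue can be targeted via the sycl_queue keyword
q = dpctl.SyclQueue("gpu")
d = dpnp.asarray(x_cpu, sycl_queue=q)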
An advantage of using the asarray() function is that migration from dpnp.ndarray instances allocated on different devices, as well as migration from numpy.ndarray, may be accomplished in a single call:
import dpnp, numpy
x_cpu = dpnp.ones((10, 10), device="cpu")
x_gpu = dpnp.zeros((10, 10), device="opencl:gpu")
x_np = numpy.random.randn(10, 10)
# Array w has shape (3, 10, 10)
w = dpnp.asarray([x_cpu, x_gpu, x_np], device="level_zero:gpu")
Migration may also occur during calls to other array creation functions, e.g., dpnp.full() when the fill_value parameter is an instance of dpnp.ndarray. In such a case, the default values of the device placement keywords are interpreted so as to avoid data migration, i.e., the new array is created on the same device where the fill_value array was allocated.
import dpnp
# Zero-dimensional array allocated on CPU device
pi_on_device = dpnp.asarray(dpnp.pi, dtype=dpnp.float32, device="cpu")
# x will also be allocated on CPU device
x = dpnp.full(shape=(100, 100), fill_value=pi_on_device)
# Create array on GPU. Migration of `pi_on_device` to GPU via host
# takes place under the hood
y_gpu = dpnp.full(shape=(100, 100), fill_value=pi_on_device, device="gpu")
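Coming back to the original cupy snippet: dpnp has no global "current device" context manager; placement follows the compute-follows-data model, so the device (or queue) is chosen when an array is allocated, and operations then run where their inputs live. Below is a rough, hypothetical sketch of how the with cupy.cuda.Device(device_id), _streams[device_id]: pattern might be mapped, where per-device dpctl.SyclQueue objects play a role similar to streams bound to devices (the _queues dictionary and the loop are illustrative, not part of dpnp's API):
import dpnp, dpctl
# One queue per GPU, keyed the same way the cupy code keys its streams
_queues = {
    device_id: dpctl.SyclQueue(dev)
    for device_id, dev in enumerate(dpctl.get_devices(device_type="gpu"))
}
for device_id, q in _queues.items():
    # Allocate directly on the chosen device's queue ...
    a = dpnp.ones((1000, 1000), sycl_queue=q)
    # ... and the computation runs on that same queue (compute follows data)
    b = dpnp.sum(a)
    print(f"device {device_id}: {b.sycl_device.name}")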
Hope this is helpful. Please let me know if any additional info is required.