pyculib
Passing device array to routines
Hello!
I would like to know whether it is possible to pass arrays that are already on the device to the routines. For example, something like this (which I tried, but it fails):
A_d = cuda.to_device(A)
B_d = cuda.to_device(B)
C_d = cublas.gemm("N", "N", 1.0, A_d, B_d)
C_h = C_d.copy_to_host()
This could be useful to avoid memory transfers when the data needs to be used again later.
Best regards, Marc
Yes, all of these methods should accept Numba device allocations. What error did you see?
I get the following error,
File "test.py", line 25, in
I'm using the development version of numba:
In [3]: numba.__version__
Out[3]: '0.35.0rc1+2.g51474c0-py3.5-linux-x86_64.egg'
The test is the following,
import numpy as np
import scipy.linalg.blas as blas
import numba.cuda as cuda
import pyculib.blas as cublas

A = np.random.randn(3, 3)
B = np.random.randn(3, 3)

C = blas.sgemm(1.0, A, B)
print(C)

A_d = cuda.to_device(A)
B_d = cuda.to_device(B)

C_d = cublas.gemm("N", "N", 1.0, A_d, B_d)

C_h = np.zeros((3, 3), dtype=np.float64)
C_d.copy_to_host(C_h)
print(C_h)
I think this is because cublas.gemm inherited semantics from a more general BLAS interface which operated on np.ndarray instances and required Fortran-ordered arrays. This behaviour is enforced by probing the np.ndarray.flags attribute and calling np.asfortranarray in the case where the array doesn't conform. I guess there are two options to fix this:
- Make the BLAS call sites only enforce ordering on NumPy ndarray instances.
- Add a .flags attribute to the DeviceNDArray class.
Anyone have a preference/other suggestion?
I assume DeviceNDArrays can only be C ordered at the moment? Otherwise, option 2 seems best.
However, option 1 might be a good interim solution, since Numba's release cycle might be too long to wait to fix this.
I think you can create a Fortran-ordered one:
In [25]: z=np.zeros((4,3)).T
In [26]: z.flags
Out[26]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
In [27]: dA=devicearray.DeviceNDArray(z.shape,z.strides,z.dtype)
In [28]: dA.is_c_contiguous()
Out[28]: False
In [29]: dA.is_f_contiguous()
Out[29]: True
The information comes from a numba.dummyarray reference held in the class, so .flags can be added and can support keys to the extent that dummyarray does. This raises a question: if the device array is C_CONTIGUOUS, and recognisably so via .flags, should a copy be made to force Fortran order, given that the user is quite likely to be reusing device arrays for performance?
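To illustrate why shape and strides are enough to answer the layout question without touching device memory, here is a rough pure-Python sketch. The function name contiguity_flags and the returned dict are hypothetical, chosen for illustration only; this is not how numba.dummyarray is implemented, just the general idea of deriving the two flags from metadata.

```python
def contiguity_flags(shape, strides, itemsize):
    """Derive C/F contiguity from shape and byte strides (illustrative only)."""
    if 0 in shape:
        # Empty arrays are conventionally treated as contiguous in both orders.
        return {"C_CONTIGUOUS": True, "F_CONTIGUOUS": True}

    def is_contiguous(dims):
        # Walk the axes from fastest-varying to slowest-varying and check
        # that each stride equals the number of bytes spanned so far.
        expected = itemsize
        for extent, stride in dims:
            if extent == 1:
                continue  # the stride of a length-1 axis does not matter
            if stride != expected:
                return False
            expected *= extent
        return True

    pairs = list(zip(shape, strides))
    return {
        "C_CONTIGUOUS": is_contiguous(reversed(pairs)),  # last axis fastest
        "F_CONTIGUOUS": is_contiguous(pairs),            # first axis fastest
    }
```

For the transposed array in the session above (shape (3, 4), float64, byte strides (8, 24)) this reports F_CONTIGUOUS as True and C_CONTIGUOUS as False, matching the is_f_contiguous() and is_c_contiguous() results.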
Logic like:
if isinstance(arrayref, DeviceNDArray):
    if not F_CONTIG:
        raise ValueError("Invalid layout for BLAS call")  # User has to manage device memory ordering
else:  # it's a np.ndarray instance
    if not F_CONTIG:
        return np.asfortranarray(arrayref)
return arrayref
is one option.
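To make that logic concrete, here is a sketch that runs without a GPU. FakeDeviceArray is a hypothetical stand-in that only mimics the is_f_contiguous() method shown in the session above, and prepare_blas_operand is a made-up name, not a pyculib function. A real version would presumably test against DeviceNDArray instead.

```python
import numpy as np


class FakeDeviceArray:
    """Hypothetical stand-in for a device array; records layout only."""

    def __init__(self, host_array):
        self._f_contig = bool(host_array.flags["F_CONTIGUOUS"])

    def is_f_contiguous(self):
        return self._f_contig


def prepare_blas_operand(arrayref, device_types=(FakeDeviceArray,)):
    """Return an operand that is safe to hand to a Fortran-ordered BLAS call."""
    if isinstance(arrayref, device_types):
        # Never copy silently on the device: the user manages device layout.
        if not arrayref.is_f_contiguous():
            raise ValueError("Invalid layout for BLAS call")
        return arrayref
    # Host np.ndarray: converting to Fortran order here is cheap enough.
    if not arrayref.flags["F_CONTIGUOUS"]:
        return np.asfortranarray(arrayref)
    return arrayref
```

With this split, host arrays keep the existing convert-if-needed behaviour, while a C-ordered device array fails loudly instead of triggering a hidden device-side copy.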
Sorry for the noise, closed wrong issue.