
Passing device array to routines

Open mbarbry opened this issue 7 years ago • 7 comments

Hello!

I would like to know if it is possible to pass arrays that are on the device to the routines. For example something like this (that I tried but is failing),

A_d = cuda.to_device(A)
B_d = cuda.to_device(B)

C_d = cublas.gemm("N", "N", 1.0, A_d, B_d)

C_h = C_d.copy_to_host()

This could be useful to avoid memory transfers if the data needs to be used again later.

Best regards, Marc

mbarbry • Aug 28 '17 18:08

Yes, all of these methods should accept Numba device allocations. What error did you see?

seibert • Aug 28 '17 19:08

I get the following error,

File "test.py", line 25, in C_d = cublas.gemm("N", "N", 1.0, A_d, B_d) File "/usr/local/lib/python3.5/dist-packages/pyculib/blas/init.py", line 143, in gemm colmajor(A, dtype, 'A'), colmajor(B, dtype, 'B'), File "/usr/local/lib/python3.5/dist-packages/pyculib/nputil.py", line 38, in colmajor if not x.flags['F_CONTIGUOUS']: AttributeError: 'DeviceNDArray' object has no attribute 'flags'

I'm using the development version of Numba:

In [3]: numba.__version__
Out[3]: '0.35.0rc1+2.g51474c0-py3.5-linux-x86_64.egg'

The test is the following,

import numpy as np
import scipy.linalg.blas as blas
import numba.cuda as cuda
import pyculib.blas as cublas

A = np.random.randn(3, 3)
B = np.random.randn(3, 3)

C = blas.sgemm(1.0, A, B)
print(C)

A_d = cuda.to_device(A)
B_d = cuda.to_device(B)

C_d = cublas.gemm("N", "N", 1.0, A_d, B_d)

C_h = np.zeros((3, 3), dtype=np.float64)
C_d.copy_to_host(C_h)
print(C_h)

mbarbry • Aug 28 '17 22:08

I think this is because cublas.gemm inherited semantics from a more general BLAS interface that operated on np.ndarray instances and required Fortran-ordered arrays. This behaviour is enforced by probing the np.ndarray.flags attribute and calling np.asfortranarray when the array doesn't conform. I guess there are two options for a fix:

  1. Make the BLAS call sites only enforce ordering on Numpy ndarray instances.
  2. Add a .flags attribute to the DeviceNDArray class.

Anyone have a preference/other suggestion?
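
For illustration, a minimal sketch of what option 1 could look like in pyculib's nputil.colmajor. The signature and current behaviour are inferred from the traceback above, so treat the dtype/name parameters and the exact body as assumptions, not the real implementation:

import numpy as np

def colmajor(x, dtype, name):
    # Only enforce Fortran ordering for host arrays; device arrays are
    # passed through untouched, so the caller must already have them in
    # column-major layout on the device.
    if isinstance(x, np.ndarray):
        if not x.flags['F_CONTIGUOUS']:
            return np.asfortranarray(x, dtype=dtype)
    return x

The trade-off is that a C-ordered device array would be handed to cuBLAS as-is rather than copied, so the caller has to get the device layout right themselves.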

stuartarchibald • Aug 29 '17 14:08

I assume DeviceNDArrays can only be C ordered at the moment? Otherwise, option 2 seems best.

seibert • Aug 29 '17 14:08

However, option 1 might be a good interim solution, since Numba's release cycle might be too long to wait to fix this.

seibert • Aug 29 '17 15:08

I think you can create a Fortran-ordered one:

In [25]: z=np.zeros((4,3)).T

In [26]: z.flags
Out[26]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [27]: dA=devicearray.DeviceNDArray(z.shape,z.strides,z.dtype)

In [28]: dA.is_c_contiguous()
Out[28]: False

In [29]: dA.is_f_contiguous()
Out[29]: True

The information comes from a numba.dummyarray reference held in the class, so a .flags attribute can be added and can support keys to the same level that dummyarray does. This raises a question: if the device array is C_CONTIGUOUS, and recognisably so via .flags, should a copy be made to force Fortran order, given that the user is quite likely reusing device arrays for performance?

Logic like:

if isinstance(arrayref, DeviceNDArray):
    if not arrayref.is_f_contiguous():
        # User has to manage device memory ordering
        raise ValueError("Invalid layout for BLAS call")
    return arrayref
else:  # it's a np.ndarray instance
    if not arrayref.flags['F_CONTIGUOUS']:
        return np.asfortranarray(arrayref)
    return arrayref

is one option.
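
For option 2, a user-side sketch of what a .flags property could look like, written as a monkey-patch so it can be tried without modifying Numba. The module path matches the devicearray usage above, but the patch itself is illustrative, not an agreed design:

from numba.cuda.cudadrv import devicearray

def _flags(self):
    # Only the two keys pyculib's colmajor actually probes, backed by the
    # contiguity queries demonstrated in the session above.
    return {'C_CONTIGUOUS': self.is_c_contiguous(),
            'F_CONTIGUOUS': self.is_f_contiguous()}

devicearray.DeviceNDArray.flags = property(_flags)

Note this only helps when the device array is already Fortran ordered; if F_CONTIGUOUS comes back False, colmajor would fall through to np.asfortranarray, which cannot convert a device array, so the layout question above still has to be answered.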

stuartarchibald • Aug 29 '17 16:08

Sorry for the noise, closed wrong issue.

stuartarchibald • Feb 18 '19 13:02