pyculib

Open prafullasaxena opened this issue 5 years ago • 1 comments

@jimburnsphd is there any chance you could expand on what you did, so as to describe the problem specifically? A reproducer for what you saw would also be really helpful. The Python 3.6 "downgrade" happens because this package has not yet been rebuilt for Python 3.7; as far as I can tell, the constraint solver is behaving correctly. Conda provides a contained ecosystem in which CUDA should work fine, provided you have suitable drivers and hardware installed on the host system. 'DLL hell' is avoided through the use of isolated environments. Here is an example (the machine it runs on has the Nvidia drivers and an Nvidia GPU installed):

$ conda create -n pyculib_example -q -y numba pyculib
Collecting package metadata: ...working... done
Solving environment: ...working... done

## Package Plan ##

  environment location: <snip>/envs/pyculib_example

  added / updated specs:
    - numba
    - pyculib


The following NEW packages will be INSTALLED:

  blas               pkgs/main/linux-64::blas-1.0-mkl
  ca-certificates    pkgs/main/linux-64::ca-certificates-2019.1.23-0
  certifi            pkgs/main/linux-64::certifi-2018.11.29-py36_0
  cffi               pkgs/main/linux-64::cffi-1.11.5-py36he75722e_1
  cudatoolkit        pkgs/main/linux-64::cudatoolkit-10.0.130-0
  intel-openmp       pkgs/main/linux-64::intel-openmp-2019.1-144
  libedit            pkgs/main/linux-64::libedit-3.1.20181209-hc058e9b_0
  libffi             pkgs/main/linux-64::libffi-3.2.1-hd88cf55_4
  libgcc-ng          pkgs/main/linux-64::libgcc-ng-8.2.0-hdf63c60_1
  libgfortran        pkgs/free/linux-64::libgfortran-3.0.0-1
  libgfortran-ng     pkgs/main/linux-64::libgfortran-ng-7.3.0-hdf63c60_0
  libstdcxx-ng       pkgs/main/linux-64::libstdcxx-ng-8.2.0-hdf63c60_1
  llvmlite           pkgs/main/linux-64::llvmlite-0.27.0-py36hd408876_0
  mkl                pkgs/main/linux-64::mkl-2019.1-144
  ncurses            pkgs/main/linux-64::ncurses-6.1-he6710b0_1
  numba              pkgs/main/linux-64::numba-0.42.0-py36h962f231_0
  numpy              pkgs/main/linux-64::numpy-1.13.3-py36ha266831_3
  openssl            pkgs/main/linux-64::openssl-1.1.1a-h7b6447c_0
  pip                pkgs/main/linux-64::pip-19.0.1-py36_0
  pycparser          pkgs/main/linux-64::pycparser-2.19-py36_0
  pyculib            pkgs/free/linux-64::pyculib-1.0.2-np113py36_2
  pyculib_sorting    pkgs/free/linux-64::pyculib_sorting-1.0.0-8
  python             pkgs/main/linux-64::python-3.6.8-h0371630_0
  readline           pkgs/main/linux-64::readline-7.0-h7b6447c_5
  scipy              pkgs/main/linux-64::scipy-1.2.0-py36h7c811a0_0
  setuptools         pkgs/main/linux-64::setuptools-40.7.3-py36_0
  sqlite             pkgs/main/linux-64::sqlite-3.26.0-h7b6447c_0
  tk                 pkgs/main/linux-64::tk-8.6.8-hbc83047_0
  wheel              pkgs/main/linux-64::wheel-0.32.3-py36_0
  xz                 pkgs/main/linux-64::xz-5.2.4-h14c3975_4
  zlib               pkgs/main/linux-64::zlib-1.2.11-h7b6447c_3


Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done

$ source activate pyculib_example

$ cat example.py
import numpy as np
import scipy.sparse.linalg
import pyculib

handle = pyculib.sparse.Sparse()
dtype = np.float32
m = n = 3
trans = 'N'


# Initialize the CSR matrix on the host and GPU.
row = np.array([0, 0, 0, 1, 1, 2])
col = np.array([0, 1, 2, 1, 2, 2])
data = np.array([0.431663, 0.955176, 0.925239, 0.0283651, 0.569277, 0.48015], dtype=dtype)

csrMatrixCpu = scipy.sparse.csr_matrix((data, (row, col)), shape=(m, n))
csrMatrixGpu = pyculib.sparse.csr_matrix((data, (row, col)), shape=(m, n))

print(csrMatrixCpu)
print(csrMatrixCpu.todense())

# Perform the analysis step on the GPU.
nnz = csrMatrixGpu.nnz
csrVal = csrMatrixGpu.data
csrRowPtr = csrMatrixGpu.indptr
csrColInd = csrMatrixGpu.indices

descr = handle.matdescr(0, 'N', 'U', 'G')
info = handle.csrsv_analysis(trans, m, nnz, descr, csrVal, csrRowPtr, csrColInd)


# Initialize the right-hand side of the system.
alpha = 1.0
rightHandSide = np.array([0.48200423, 0.39379725, 0.75963706], dtype=dtype)
gpuResult = np.zeros(m, dtype=dtype)


# Solve the system on the GPU and on the CPU.
handle.csrsv_solve(trans, m, alpha, descr, csrVal, csrRowPtr, csrColInd, info, rightHandSide, gpuResult)
cpuResult = scipy.sparse.linalg.dsolve.spsolve(csrMatrixCpu, rightHandSide, use_umfpack=False)

cpuDense = np.linalg.solve(csrMatrixCpu.todense(), rightHandSide)

print('gpu result = ' + str(gpuResult))
print('cpu result = ' + str(cpuResult))
print('cpu result = ' + str(cpuDense))

$ python example.py 
  (0, 0)        0.431663
  (0, 1)        0.955176
  (0, 2)        0.925239
  (1, 1)        0.0283651
  (1, 2)        0.569277
  (2, 2)        0.48015
[[ 0.43166301  0.955176    0.92523903]
 [ 0.          0.0283651   0.56927699]
 [ 0.          0.          0.48015001]]
gpu result = [ 37.26496506 -17.86865234   1.58208275]
cpu result = [ 37.26496506 -17.86865234   1.58208275]
cpu result = [ 37.26496124 -17.86865044   1.58208275]

$ numba -s|grep -i cuda
__CUDA Information__
Found 1 CUDA devices
CUDA driver version                           : 10000
CUDA libraries:
cudatoolkit               10.0.130                      0

Originally posted by @stuartarchibald in https://github.com/numba/pyculib/issues/19#issuecomment-463113758

Jun 13 '19 23:06 prafullasaxena

gpu result = [ 37.26496506 -17.86865234   1.58208275]
cpu result = [ 37.26496506 -17.86865234   1.58208275]
cpu result = [ 37.26496124 -17.86865044   1.58208275]

How is this answer obtained on both the GPU and the CPU? Is this not a matrix-vector multiplication? When I change the matrix and vector data to row = [0,0,1,1,2,2,3,3], col = [2,3,0,3,1,2,0,1], data = [3,1,1,1,2,1,4,1] and rightHandSide = [1,2,1,2], the CPU and GPU answers differ, and I think both are wrong. In my view the answer should be [5,3,5,6], as for a matrix-vector multiplication.
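
A note on the question above: the example script appears to solve a linear system rather than multiply. `csrsv_analysis` and `csrsv_solve` correspond to cuSPARSE's sparse triangular solve, and `spsolve` and `np.linalg.solve` likewise look for the x that satisfies A x = b, which would explain why all three printed results agree. Below is a minimal CPU-only sketch, assuming only NumPy and SciPy (no GPU or pyculib required), that contrasts the two operations on the original data and on the data from the question. The variable names and the `is_upper_triangular` helper are illustrative and not part of pyculib.

```python
import numpy as np
import scipy.sparse
import scipy.sparse.linalg


def is_upper_triangular(matrix):
    """Hypothetical helper: True if no entry is stored below the diagonal."""
    return scipy.sparse.triu(matrix).nnz == matrix.nnz


# Original 3x3 system from example.py (upper triangular).
row3 = np.array([0, 0, 0, 1, 1, 2])
col3 = np.array([0, 1, 2, 1, 2, 2])
data3 = np.array([0.431663, 0.955176, 0.925239, 0.0283651, 0.569277, 0.48015])
rhs3 = np.array([0.48200423, 0.39379725, 0.75963706])
A3 = scipy.sparse.csr_matrix((data3, (row3, col3)), shape=(3, 3))

solution3 = scipy.sparse.linalg.spsolve(A3, rhs3)  # x such that A3 x = rhs3
product3 = A3 @ rhs3                               # plain matrix-vector product

print('3x3 upper triangular:', is_upper_triangular(A3))
print('3x3 solve   A x = b :', solution3)
print('3x3 product A @ b   :', product3)

# 4x4 data from the question.
row4 = np.array([0, 0, 1, 1, 2, 2, 3, 3])
col4 = np.array([2, 3, 0, 3, 1, 2, 0, 1])
data4 = np.array([3, 1, 1, 1, 2, 1, 4, 1], dtype=np.float64)
rhs4 = np.array([1, 2, 1, 2], dtype=np.float64)
A4 = scipy.sparse.csr_matrix((data4, (row4, col4)), shape=(4, 4))

product4 = A4 @ rhs4                               # the expected matvec result
solution4 = scipy.sparse.linalg.spsolve(A4, rhs4)  # general (non-triangular) solve

print('4x4 upper triangular:', is_upper_triangular(A4))
print('4x4 product A @ b   :', product4)
print('4x4 solve   A x = b :', solution4)
```

With the 4x4 data the product comes out as [5, 3, 5, 6], matching the expectation in the question. That matrix is not upper triangular, though, so a triangular solver configured with an upper fill mode (the `'U'` passed to `matdescr`) is probably not the right tool for it, which may be why the CPU and GPU answers diverge there.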
