pyculib
@jimburnsphd is there any chance you could please expand on what you did, so as to describe the problem specifically? If you have a reproducer for what you saw, that would also be really helpful. The Python 3.6 "downgrade" happens because this package has not been rebuilt for Python 3.7 yet; as far as I can tell, the constraint solver is behaving correctly. Conda provides a contained ecosystem in which CUDA should work fine, provided you have suitable drivers and hardware installed on the host system. 'DLL hell' is avoided through the use of isolated environments. Here is an example (the machine it is running on has the Nvidia drivers and an Nvidia GPU installed):
$ conda create -n pyculib_example -q -y numba pyculib
Collecting package metadata: ...working... done
Solving environment: ...working... done
## Package Plan ##
environment location: <snip>/envs/pyculib_example
added / updated specs:
- numba
- pyculib
The following NEW packages will be INSTALLED:
blas pkgs/main/linux-64::blas-1.0-mkl
ca-certificates pkgs/main/linux-64::ca-certificates-2019.1.23-0
certifi pkgs/main/linux-64::certifi-2018.11.29-py36_0
cffi pkgs/main/linux-64::cffi-1.11.5-py36he75722e_1
cudatoolkit pkgs/main/linux-64::cudatoolkit-10.0.130-0
intel-openmp pkgs/main/linux-64::intel-openmp-2019.1-144
libedit pkgs/main/linux-64::libedit-3.1.20181209-hc058e9b_0
libffi pkgs/main/linux-64::libffi-3.2.1-hd88cf55_4
libgcc-ng pkgs/main/linux-64::libgcc-ng-8.2.0-hdf63c60_1
libgfortran pkgs/free/linux-64::libgfortran-3.0.0-1
libgfortran-ng pkgs/main/linux-64::libgfortran-ng-7.3.0-hdf63c60_0
libstdcxx-ng pkgs/main/linux-64::libstdcxx-ng-8.2.0-hdf63c60_1
llvmlite pkgs/main/linux-64::llvmlite-0.27.0-py36hd408876_0
mkl pkgs/main/linux-64::mkl-2019.1-144
ncurses pkgs/main/linux-64::ncurses-6.1-he6710b0_1
numba pkgs/main/linux-64::numba-0.42.0-py36h962f231_0
numpy pkgs/main/linux-64::numpy-1.13.3-py36ha266831_3
openssl pkgs/main/linux-64::openssl-1.1.1a-h7b6447c_0
pip pkgs/main/linux-64::pip-19.0.1-py36_0
pycparser pkgs/main/linux-64::pycparser-2.19-py36_0
pyculib pkgs/free/linux-64::pyculib-1.0.2-np113py36_2
pyculib_sorting pkgs/free/linux-64::pyculib_sorting-1.0.0-8
python pkgs/main/linux-64::python-3.6.8-h0371630_0
readline pkgs/main/linux-64::readline-7.0-h7b6447c_5
scipy pkgs/main/linux-64::scipy-1.2.0-py36h7c811a0_0
setuptools pkgs/main/linux-64::setuptools-40.7.3-py36_0
sqlite pkgs/main/linux-64::sqlite-3.26.0-h7b6447c_0
tk pkgs/main/linux-64::tk-8.6.8-hbc83047_0
wheel pkgs/main/linux-64::wheel-0.32.3-py36_0
xz pkgs/main/linux-64::xz-5.2.4-h14c3975_4
zlib pkgs/main/linux-64::zlib-1.2.11-h7b6447c_3
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
$ source activate pyculib_example
$ cat example.py
import numpy as np
import scipy.sparse.linalg
import pyculib
handle = pyculib.sparse.Sparse()
dtype = np.float32
m = n = 3
trans = 'N'
# Initialize the CSR matrix on the host and GPU.
row = np.array([0, 0, 0, 1, 1, 2])
col = np.array([0, 1, 2, 1, 2, 2])
data = np.array([0.431663, 0.955176, 0.925239, 0.0283651, 0.569277, 0.48015], dtype=dtype)
csrMatrixCpu = scipy.sparse.csr_matrix((data, (row, col)), shape=(m, n))
csrMatrixGpu = pyculib.sparse.csr_matrix((data, (row, col)), shape=(m, n))
print(csrMatrixCpu)
print(csrMatrixCpu.todense())
# Perform the analysis step on the GPU.
nnz = csrMatrixGpu.nnz
csrVal = csrMatrixGpu.data
csrRowPtr = csrMatrixGpu.indptr
csrColInd = csrMatrixGpu.indices
descr = handle.matdescr(0, 'N', 'U', 'G')
info = handle.csrsv_analysis(trans, m, nnz, descr, csrVal, csrRowPtr, csrColInd)
# Initialize the right-hand side of the system.
alpha = 1.0
rightHandSide = np.array([0.48200423, 0.39379725, 0.75963706], dtype=dtype)
gpuResult = np.zeros(m, dtype=dtype)
# Solve the system on the GPU and on the CPU.
handle.csrsv_solve(trans, m, alpha, descr, csrVal, csrRowPtr, csrColInd, info, rightHandSide, gpuResult)
cpuResult = scipy.sparse.linalg.spsolve(csrMatrixCpu, rightHandSide, use_umfpack=False)
cpuDense = np.linalg.solve(csrMatrixCpu.todense(), rightHandSide)
print('gpu result = ' + str(gpuResult))
print('cpu result = ' + str(cpuResult))
print('cpu result = ' + str(cpuDense))
(pyculib_example) $ python example.py
(0, 0) 0.431663
(0, 1) 0.955176
(0, 2) 0.925239
(1, 1) 0.0283651
(1, 2) 0.569277
(2, 2) 0.48015
[[ 0.43166301 0.955176 0.92523903]
[ 0. 0.0283651 0.56927699]
[ 0. 0. 0.48015001]]
gpu result = [ 37.26496506 -17.86865234 1.58208275]
cpu result = [ 37.26496506 -17.86865234 1.58208275]
cpu result = [ 37.26496124 -17.86865044 1.58208275]
$ numba -s|grep -i cuda
__CUDA Information__
Found 1 CUDA devices
CUDA driver version : 10000
CUDA libraries:
cudatoolkit 10.0.130 0
Originally posted by @stuartarchibald in https://github.com/numba/pyculib/issues/19#issuecomment-463113758
gpu result = [ 37.26496506 -17.86865234 1.58208275]
cpu result = [ 37.26496506 -17.86865234 1.58208275]
cpu result = [ 37.26496124 -17.86865044 1.58208275]
How is this answer obtained on both the GPU and the CPU? Is this not matrix-vector multiplication? When I change the matrix and vector data to row = [0,0,1,1,2,2,3,3], col = [2,3,0,3,1,2,0,1], data = [3,1,1,1,2,1,4,1] and rightHandSide = [1,2,1,2], the CPU and GPU answers differ from each other, and I think both are wrong. In my view the answer should be [5,3,5,6], as for a matrix-vector multiplication.
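On the question above: the example does not multiply the matrix by the vector. Both spsolve and np.linalg.solve look for the x that satisfies A x = rightHandSide, and, as far as I understand cuSPARSE, csrsv_analysis / csrsv_solve perform a sparse triangular solve (the descriptor in the example requests the upper-triangular fill mode). The matrix in the question is not triangular, so a triangular solver and a general solver would not be expected to agree on it. Below is a minimal CPU-only sketch using NumPy alone, with a dense copy of the matrix assumed for simplicity; the names A, b, product and solution are illustrative and do not come from the original example.

```python
import numpy as np

# Matrix and vector from the question, given as (row, col, value) triplets.
row = np.array([0, 0, 1, 1, 2, 2, 3, 3])
col = np.array([2, 3, 0, 3, 1, 2, 0, 1])
data = np.array([3, 1, 1, 1, 2, 1, 4, 1], dtype=np.float64)
b = np.array([1, 2, 1, 2], dtype=np.float64)

# Dense 4x4 copy of the sparse matrix (fine at this size).
A = np.zeros((4, 4))
A[row, col] = data

# Matrix-vector multiplication: y = A @ b.
product = A @ b

# Linear solve: find x such that A @ x == b. This is what the example computes.
solution = np.linalg.solve(A, b)

# An upper-triangular solver only makes sense if A equals its upper triangle.
is_upper_triangular = np.array_equal(A, np.triu(A))

print('A @ b       =', product)
print('solve(A, b) =', solution)
print('A @ solution reproduces b:', np.allclose(A @ solution, b))
print('A is upper triangular:', is_upper_triangular)
```

The product comes out as [5, 3, 5, 6], which matches the expectation in the question, while the solve returns a different vector because it answers a different problem. Note also that m = n = 3 in the original script would need to become 4 for this matrix.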