scikit-cuda icon indicating copy to clipboard operation
scikit-cuda copied to clipboard

support of heevdx in MAGMA?

Open lanzithinking opened this issue 4 years ago • 15 comments

@wingkitlee: Do you have plan to support partial eigensolvers for symmetric/hermitian matrix, routines under 'heevdx' of MAGMA? I see there is only one function 'magma_ssyevdx_m' supported; however, several parameters, 'vl, vu, il, iu, m', are missing when you call MAGMA library in the master branch. Do you plan to include more functions, e.g. 'magma_dsyevdx' etc., in this category? Thanks!

lanzithinking avatar Jan 15 '20 10:01 lanzithinking

@lanzithinking there is no specific 'plan', so feel free to contribute test cases or even codes! If it's not urgent, I can take a look at it in my free time after you give me some test cases (e.g., 3x3 matrix and expected result). It would probably take weeks to code.. Otherwise, you can checkout other working magma functions (e.g., geev) to see how the wrapper works and code it up. Let me know if you have any questions. I am glad to help in either way.

wingkitlee0 avatar Jan 15 '20 16:01 wingkitlee0

Dear Kit Lee,

Thanks for your reply! I worked a little bit as you suggested and have attached my codes to the email:

  1. magma.py, MAGMA wrapper modified from scikit-cuda: in the same file https://github.com/lebedov/scikit-cuda/blob/master/skcuda/magma.py https://github.com/lebedov/scikit-cuda/blob/master/skcuda/magma.py in the master branch, — there are two ‘x’ missing on line 3721-3732 in the definition of ‘_libmagma.magma_ssyevd_m’ (should be '_libmagma.magma_ssyevdx_m’) coz ‘x’ is for partial Eigen-values according to MAGMA document: https://icl.cs.utk.edu/projectsfiles/magma/doxygen/group__magma__heevdx.html#ga799dfa8ad62888d9a608a3a5dfd69237 https://icl.cs.utk.edu/projectsfiles/magma/doxygen/group__magma__heevdx.html#ga799dfa8ad62888d9a608a3a5dfd69237 — on line 3751, parameters 'vl, vu, il, iu, m,’ are missing from the calling of ‘_libmagma.magma_ssyevdx_m’.
  2. eigs.py, partial Eigen-solver, mimicking ’Scipy.sparse.linalg.eigs’, modified from your 'gpu-linalg.py’ https://github.com/wingkitlee0/cuda-python-magma-examples/blob/master/gpu-linalg.py https://github.com/wingkitlee0/cuda-python-magma-examples/blob/master/gpu-linalg.py
  3. test5.py, testing file

I am working on NVIDIA TESLA V100, Ubuntu 18.04.

Put those three files in the same folder and run test5. I got the following error:

Traceback (most recent call last): File "test5.py", line 30, in W_gpu, V_gpu = eigs(M_gpu, k) File "/home/slan7/cuda-python-magma-examples/skcuda_/linalg/eigs.py", line 61, in eigs w_gpu.ctypes.data, work.ctypes.data, lwork, iwork.ctypes.data, liwork) File "/home/slan7/cuda-python-magma-examples/skcuda_/magma.py", line 3752, in magma_ssyevdx_m int(w), int(work), lwork, int(iwork), liwork, ctypes.byref(info)) ctypes.ArgumentError: argument 2: <class 'TypeError'>: wrong type

It would be nice if you could help me to see what went wrong. Thank you so much!

Best regards,

Shiwei Lan

On Jan 15, 2020, at 9:56 AM, Kit Lee [email protected] wrote:

@lanzithinking https://github.com/lanzithinking there is no specific 'plan', so feel free to contribute test cases or even codes! If it's not urgent, I can take a look at it in my free time after you give me some test cases (e.g., 3x3 matrix and expected result). It would probably take weeks to code.. Otherwise, you can checkout other working magma functions (e.g., geev) to see how the wrapper works and code it up. Let me know if you have any questions. I am glad to help in either way.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lebedov/scikit-cuda/issues/289?email_source=notifications&email_token=AAXGXDAMZGQ5VFJZDAXUCGTQ5454RA5CNFSM4KHBFZP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJBAUFY#issuecomment-574753303, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXGXDFFYAK4JOLXWOUMNXTQ5454RANCNFSM4KHBFZPQ.

lanzithinking avatar Jan 17 '20 06:01 lanzithinking

Hi @lanzithinking, thanks for the attempt! A few things to note:

  1. It seems that code attachments were not included in your reply. You may try to paste it here using "insert code" above the comment box, but I think I get what you were doing.
  2. The 'x' variants are for the different implementations. So they are actually different functions and you should keep those non-x functions.
  3. From the Magma doc, the third argument (i.e., "2") of ssyevdx_m is a "range" type. In line 117 of the original magma.py: _libmagma.magma_range_const.argtypes = [ctypes.c_char] which means the input (in python) should be a single character.

Hope this help! PS. I just found I didn't install scikit-cuda on my current machine.. so for now I may not able to test anything..

wingkitlee0 avatar Jan 17 '20 07:01 wingkitlee0

Hi @wingkitlee0 , thanks for the quick response. I insert 'eigs' function I modified from your gpu-linalg.py here to see if you can spot any error (I set rnge to be 'I', which is a single character):

` import numpy as np import magma

typedict = {'s': np.float32, 'd': np.float64, 'c': np.complex64, 'z': np.complex128} typedict_= {v: k for k, v in typedict.items()}

def eigs(a_gpu, k=None, which='LM', imag=False, return_eigenvectors=True):

if len(a_gpu.shape) != 2:
    raise ValueError("M needs to be a rank 2 square array for eig.")

magma.magma_init()
ngpu=1

dtype = a_gpu.dtype.type
t = typedict_[dtype]
N = a_gpu.shape[1]

if k is None: k=int(np.ceil(N/2))

if 'L' in which:
    a_gpu = -a_gpu

if return_eigenvectors:
    jobz = 'V'
else:
    jobz = 'N'

rnge = 'I'; uplo = 'U'
vl = 0; vu = 1e10
il = 1; iu = k; m = 0

w_gpu = np.empty((N,), dtype, order='F') # eigenvalues

if t == 's':
    nb = magma.magma_get_ssytrd_nb(N)
elif t == 'd':
    nb = magma.magma_get_dsytrd_nb(N)
else:
    raise ValueError('unsupported type')

lwork = N*(1 + 2*nb)
if jobz:
    lwork = max(lwork, 1+6*N+2*N**2)
work = np.empty(lwork, dtype)
liwork = 3+5*N if jobz else 1
iwork = np.empty(liwork, dtype)

if t == 's':
    status = magma.magma_ssyevdx_m(ngpu, jobz, rnge, uplo, N, a_gpu.ctypes.data, N,
                                   vl, vu, il, iu, m, 
                                   w_gpu.ctypes.data, work.ctypes.data, lwork, iwork.ctypes.data, liwork)
elif t == 'd':
    status = magma.magma_dsyevdx_m(ngpu, jobz, rnge, uplo, N, a_gpu.ctypes.data, N,
                                   vl, vu, il, iu, m, 
                                   w_gpu.ctypes.data, work.ctypes.data, lwork, iwork.ctypes.data, liwork)
else:
    raise ValueError('unsupported type')

if 'L' in which:
    w_gpu = -w_gpu

magma.magma_finalize()

if jobz:
    return w_gpu, a_gpu
else:
    return w_gpu

`

lanzithinking avatar Jan 17 '20 08:01 lanzithinking

Did you call the following in your ssyevdx_m function? rnge = _range_conversion[rnge]

wingkitlee0 avatar Jan 17 '20 18:01 wingkitlee0

Hi @lanzithinking , I think I figured out about the TypeError complained by ctypes. In magma.py, all of range, jobz, uplo need to have ctypes argument of c_int_type. You may try the following code.

Can you come up with some test cases (numerical values)? I can add the codes and make a pull request next week.

# SSYEVDX_M, DSYEVDX_M, CHEEVDX_M, ZHEEVDX_M
_libmagma.magma_ssyevdx_m.restype = int
_libmagma.magma_ssyevdx_m.argtypes = [c_int_type,        # ngpu
                                     c_int_type,     # jobz
                                     c_int_type,     # rnge
                                     c_int_type,     # uplo
                                     c_int_type,        # n
                                     ctypes.c_void_p,   # A
                                     c_int_type,        # lda
                                     ctypes.c_float,    # vl
                                     ctypes.c_float,    # vu
                                     c_int_type,        # il
                                     c_int_type,        # iu
                                     ctypes.c_void_p,   # m
                                     ctypes.c_void_p,   # w
                                     ctypes.c_void_p,   # work
                                     c_int_type,        # lwork
                                     ctypes.c_void_p,   # iwork
                                     c_int_type,        # liwork
                                     ctypes.c_void_p]   # info
def magma_ssyevdx_m(ngpu, jobz, rnge, uplo, n, A, lda,
                    vl, vu, il, iu, m,
                    w, work, lwork, iwork, liwork):
    """
    Compute eigenvalues of real symmetric matrix.
    Multi-GPU, data on host
    """

    # _XXX_conversion[] returns integer according to magma_types.h
    jobz = _vec_conversion[jobz] 
    rnge = _range_conversion[rnge]
    uplo = _uplo_conversion[uplo]
    
    info = c_int_type()
    status = _libmagma.magma_ssyevdx_m(ngpu, jobz, rnge, uplo, n, int(A), lda,
                                        vl, vu, il, iu, int(m), 
                                        int(w), int(work), lwork, int(iwork), liwork, 
                                        ctypes.byref(info))
    magmaCheckStatus(status)

wingkitlee0 avatar Jan 18 '20 06:01 wingkitlee0

Hi @wingkitlee0, thank you so much! I was actually writing the comment while getting your message! Yes, there are two lines missing from the magma.py in the master branch: ` jobz = _vec_conversion[jobz]

rnge = _range_conversion[rnge] ` Similarly this is the case for many other functions involving 'jobz' and 'rnge'. Now it is working without bug. I am working to verify with Scipy. Thanks again for your quick action and great help!

lanzithinking avatar Jan 18 '20 07:01 lanzithinking

The testing result is very disappointing... 4 functions: Neig - NumPy.linalg.eig; Seigs - Scipy.sparse.linalg.eigs; Keig - skcuda.linalg.eig; Xeig - coded function using magma_ssyevdx_m. N: size of PD matrix; k: the first k eigen-pairs for 'eigs'

The output eigenvalues and eigenvectors are cross-checked among 4 functions. However, the speed is shown below:

N=10, k=4: Neig: 0.00030 Seigs: 0.00075 Keig: 0.00093 Xeigs: 0.00047

N=100, k=50: Neig: 0.01453 Seigs: 0.01896 Keig: 0.00653 Xeigs: 0.01343

N=1000, k=100: Neig: 1.28382 Seigs: 0.53301 Keig: 0.34703 Xeigs: Segmentation fault (core dumped)

I found: (1) Xeigs can output all N eigen-pairs -- this is not partial eigensolver! (2) Xeigs generally fails (Segmentation fault (core dumped)) for matrices of size above 150 -- why?

lanzithinking avatar Jan 18 '20 08:01 lanzithinking

First, don't get discouraged...

  1. The result you are showing is for speed. I would first make sure they all give the same and correct answer. You may use np.allclose to do the comparison

  2. I believe there are also magma_ssyevdx_gpu for single gpu and the non-x versions. Maybe you need to test all of them to see which one is the best I realize that ssyev does not have the non-d version.. On the other hand, the _gpu version requires some pycuda stuff and it's generally not well-tested here..

  3. Segmentation fault with large N sounds like the gpu memory is not enough. Close other programs? It seems that in Line 281 of ssyedx_m.cpp falls back to Lapack on CPU if the matrix size is small. This means that we haven't been able to make this function work on GPU yet!

  4. May I ask what does partial eigensolver do? just the first few eigenvalues according to its magnitude? if it does things iteratively, I think it's still possible to compute all of them but slower...

wingkitlee0 avatar Jan 18 '20 17:01 wingkitlee0

Thanks for your patience.

4 functions output eigenvalues/eigenvectors equal upto 5~6 digits -- this is understandable given their different realizations.

I tried 'magma_ssyevdx' and got the similar results. I did not try 'magma_ssyevdx_gpu' since _gpu version takes in PyCUDA gpuarray data, rather than ndarray data. This potentially could be faster since it avoids repeated host-GPU data transfer?

Thanks for digging into the cpp file and that helps me explain the seg-fault.

Partial eigensovler is important for dimension reduction. Big matrix can be approximated by low-rank matrix using the first k (to be specified) principal (based on the absolute value/magnitude of the eigenvalues) eigen-pairs. This could be realized by Krylov-subspace method or randomized linear algebra algorithms. I thought 'magma_ssyevdx' and its variants could do this job on GPU. However they seem not not based on partial eigensolver. It makes a significant difference in computation time to obtain the first few versus all eigen-pairs, otherwise I could have use skcuda.linalg.eig (which uses cuBALS/CULA) and output the first k pairs without bothering all the stuff. I am not sure whether MAGMA has implemented the partial eigensolver but I did not find it.

Anyway, thanks for all your kind support and I will move on for alternatives for my project.

lanzithinking avatar Jan 19 '20 05:01 lanzithinking

No problem. Just want you to know that the testing routine from Magma itself (version 2.5.2, built from source) does work. I ran ./testing_ssyevd_gpu --lapack --version 2 --irange 10 in the testing directory of Magma which calls the ssyevdx_gpu function. It does run faster than cpu as listed below. It should just be an interface issue.

% MAGMA 2.5.2  compiled for CUDA capability >= 3.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 10000, driver 10020. OpenMP threads 12. 
% device 0: GeForce GTX 1050 Ti with Max-Q Design, 1290.5 MHz clock, 4040.1 MiB memory, capability 6.1
% Sat Jan 18 19:15:28 2020
% Usage: ./testing_ssyevd_gpu [options] [-h|--help]

% jobz = No vectors, uplo = Lower, version = 2 (ssyevdx_gpu)
%   N   CPU Time (sec)   GPU Time (sec)   |S-S_magma|   |A-USU^H|   |I-U^H U|
%============================================================================
irange (1, 10)
 1088      0.1465           0.1789         0.00e+00        ---         ---      ok
irange (1, 10)
 2112      0.5888           0.6362         0.00e+00        ---         ---      ok
irange (1, 10)
 3136      1.6521           1.1389         1.12e-10        ---         ---      ok
irange (1, 10)
 4160      3.9832           1.8834         7.10e-11        ---         ---      ok
irange (1, 10)
 5184      7.7379           2.9335         9.81e-10        ---         ---      ok
irange (1, 10)
 6208     13.1770           4.5590         1.39e-11        ---         ---      ok
irange (1, 10)
 7232     20.3966           6.8185         2.71e-10        ---         ---      ok
irange (1, 10)
 8256     29.4747           9.5467         3.87e-10        ---         ---      ok
irange (1, 10)
 9280     41.4837          12.6002         6.80e-11        ---         ---      ok
irange (1, 10)
10304     55.2372          16.6859         3.68e-11        ---         ---      ok

wingkitlee0 avatar Jan 19 '20 05:01 wingkitlee0

Thank you very much for the information! I also tested ‘ssyevd_gpu’ as you did and got about 3~4 speed up (interesting~) I will see if I can make magma_ssyevd_gpu work and let you know. Thanks!

On Jan 18, 2020, at 10:31 PM, Kit Lee [email protected] wrote:

No problem. Just want you to know that the testing routine from Magma itself (version 2.5.2, built from source) does work. I ran ./testing_ssyevd_gpu --lapack --version 2 --irange 10 in the testing directory of Magma which calls the ssyevdx_gpu function. It does run faster than cpu as listed below. It should just be an interface issue.

% MAGMA 2.5.2 compiled for CUDA capability >= 3.0, 32-bit magma_int_t, 64-bit pointer. % CUDA runtime 10000, driver 10020. OpenMP threads 12. % device 0: GeForce GTX 1050 Ti with Max-Q Design, 1290.5 MHz clock, 4040.1 MiB memory, capability 6.1 % Sat Jan 18 19:15:28 2020 % Usage: ./testing_ssyevd_gpu [options] [-h|--help]

% jobz = No vectors, uplo = Lower, version = 2 (ssyevdx_gpu) % N CPU Time (sec) GPU Time (sec) |S-S_magma| |A-USU^H| |I-U^H U| %============================================================================ irange (1, 10) 1088 0.1465 0.1789 0.00e+00 --- --- ok irange (1, 10) 2112 0.5888 0.6362 0.00e+00 --- --- ok irange (1, 10) 3136 1.6521 1.1389 1.12e-10 --- --- ok irange (1, 10) 4160 3.9832 1.8834 7.10e-11 --- --- ok irange (1, 10) 5184 7.7379 2.9335 9.81e-10 --- --- ok irange (1, 10) 6208 13.1770 4.5590 1.39e-11 --- --- ok irange (1, 10) 7232 20.3966 6.8185 2.71e-10 --- --- ok irange (1, 10) 8256 29.4747 9.5467 3.87e-10 --- --- ok irange (1, 10) 9280 41.4837 12.6002 6.80e-11 --- --- ok irange (1, 10) 10304 55.2372 16.6859 3.68e-11 --- --- ok — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lebedov/scikit-cuda/issues/289?email_source=notifications&email_token=AAXGXDAHABXKOIE3GWIY573Q6PQTRA5CNFSM4KHBFZP2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJKJO6I#issuecomment-575969145, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXGXDDVTFER4NO2EIH745TQ6PQTRANCNFSM4KHBFZPQ.

lanzithinking avatar Jan 19 '20 06:01 lanzithinking

@lanzithinking I think there was one incorrect argument in our interface above (and caused the seg fault): m is an (pure) output and needs to use ctypes.byref() like the info variables. I have fixed it in my issue branch

You can also check out my demo

Below is a sample output for a 4000x4000 matrix (python demo_ssyedx_m.py 4000):

k = 2000, iu = 2000
First 10 eigenvalues
GPU: [-36.561012 -36.410717 -36.17795  -36.082924 -35.971615 -35.826862
 -35.766315 -35.69314  -35.603767 -35.56919 ]
CPU: [-36.560974 -36.41073  -36.178    -36.082973 -35.97165  -35.826904
 -35.76632  -35.69318  -35.60378  -35.569195]
Time
GPU: 2.268606185913086
CPU: 9.111494064331055

I will probably step down from here and slowly add back some codes and tests. Thanks for brining this up, I learned a few things too.

wingkitlee0 avatar Jan 19 '20 18:01 wingkitlee0

@wingkitlee0 Thanks! Great! After fixing that bug. Both ssyevdx and dsyevdx (and also ssyevdx_m and dsyevdx_m) work! New testing results:

CPU eigensolver:

Neig - NumPy.linalg.eig (general, full); Neigh - Numpy.linalg.eigh (symmetric, full) Seigs - Scipy.sparse.linalg.eigs (general, partial); Seigsh - Scipy.sparse.linalg.eigsh (symmetric, partial)

GPU eigensolver:

Keig - scuda.linalg.eig (general, full, cuBLAS/CULA); Xeigs - dsyevdx (symmetric, full, MAGMA)

N=4000, k=100: Neig (CPU) time: 36.76619; Neigh (CPU) time: 4.49761 Seigs (CPU) time: 3.23999; Seigsh (CPU) time: 2.69280 (inconsistent result) Keig (GPU) time: 27.20364 Xeigs (GPU) time: 5.06100

N=4000, k=1000: Neig (CPU) time: 37.00435; Neigh (CPU) time: 4.29299 Seigs (CPU) time: 222.82142; Seigsh (CPU) time: 60.62667 (inconsistent result) Keig (GPU) time: 27.96665 Xeigs (GPU) time: 4.98696

N=4000, k=2000: Neigh (CPU) time: 4.02288 Seigsh (CPU) time: 208.90013 (inconsistent result) Keig (GPU) time: 27.79001 Xeigs (GPU) time: 5.35786

N=10000, k=2000: Neigh (CPU) time: 47.77205 Xeigs (GPU) time: 12.94744

interesting~

magma.py seems not well maintained. Is it used in any function in skcuda?

lanzithinking avatar Jan 20 '20 08:01 lanzithinking

The magma bindings are not really used in skcuda's high-level math functions at the present. This is primarily due to very limited dev time on my part (my current work doesn't require skcuda) and the fact that there still doesn't seem to be a sufficiently good way to automate maintenance of the increasing number of low-level library wrappers in skcuda.

lebedov avatar Jan 20 '20 14:01 lebedov