Ed D'Azevedo

Results: 6 issues by Ed D'Azevedo

Implement a recursive formulation of Cholesky factorization for an n-by-n symmetric positive definite matrix A. Let the following be a block partitioning of matrix A. Here submatrix L22 is n/2...
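The general idea of a recursive Cholesky factorization can be sketched as below. This is only an illustration in plain Python, not the code from the issue: the function name `cholesky_recursive`, the list-of-lists storage, and the choice of splitting at `n // 2` are all assumptions made for this example. It relies on the standard block identities L11 = chol(A11), L21 = A21 * L11^{-T}, and L22 = chol(A22 - L21 * L21^T).

```python
import math


def cholesky_recursive(A):
    """Return lower-triangular L with L * L^T == A.

    A is a symmetric positive definite matrix stored as a list of rows.
    Hypothetical helper for illustration only.
    """
    n = len(A)
    if n == 1:
        return [[math.sqrt(A[0][0])]]

    k = n // 2  # split point; an assumption of this sketch
    m = n - k
    A11 = [row[:k] for row in A[:k]]
    A21 = [row[:k] for row in A[k:]]
    A22 = [row[k:] for row in A[k:]]

    # Factor the leading block recursively.
    L11 = cholesky_recursive(A11)

    # Solve L21 * L11^T = A21, one row of L21 at a time.
    L21 = []
    for row in A21:
        x = [0.0] * k
        for j in range(k):
            s = row[j] - sum(x[p] * L11[j][p] for p in range(j))
            x[j] = s / L11[j][j]
        L21.append(x)

    # Schur complement S = A22 - L21 * L21^T, then recurse on it.
    S = [[A22[i][j] - sum(L21[i][p] * L21[j][p] for p in range(k))
          for j in range(m)] for i in range(m)]
    L22 = cholesky_recursive(S)

    # Assemble L = [[L11, 0], [L21, L22]].
    L = [[0.0] * n for _ in range(n)]
    for i in range(k):
        L[i][:k] = L11[i]
    for i in range(m):
        L[k + i][:k] = L21[i]
        L[k + i][k:] = L22[i]
    return L
```

In a GPU library the triangular solve and the Schur-complement update would typically map to trsm- and syrk/gemm-style calls; the explicit loops here only stand in for those steps.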

noOptimizations

Optimize getrf_npvt (LU factorization without pivoting) using a block algorithm similar to the one used in Cholesky factorization. The diagonal block is factored using a specialized...
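A blocked LU factorization without pivoting generally proceeds in four steps per block column, which can be sketched as follows. This is a plain-Python illustration, not the getrf_npvt implementation: the name `lu_nopivot_blocked`, the default block size `nb=2`, and the simple unblocked loop used for the diagonal block (a stand-in for whatever specialized routine the issue describes) are assumptions of this example.

```python
def lu_nopivot_blocked(A, nb=2):
    """Return a packed LU factorization of A computed without pivoting.

    The result holds U on and above the diagonal and the strictly lower
    part of the unit lower-triangular L below it. Hypothetical helper.
    """
    n = len(A)
    A = [[float(v) for v in row] for row in A]  # work on a copy
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)

        # 1. Factor the diagonal block with a simple unblocked loop.
        for j in range(k0, k1):
            for i in range(j + 1, k1):
                A[i][j] /= A[j][j]
                for c in range(j + 1, k1):
                    A[i][c] -= A[i][j] * A[j][c]

        # 2. Column panel below the block: solve L21 * U11 = A21.
        for i in range(k1, n):
            for j in range(k0, k1):
                s = A[i][j] - sum(A[i][p] * A[p][j] for p in range(k0, j))
                A[i][j] = s / A[j][j]

        # 3. Row panel right of the block: solve L11 * U12 = A12
        #    (L11 has a unit diagonal, so no division is needed).
        for c in range(k1, n):
            for i in range(k0, k1):
                A[i][c] -= sum(A[i][p] * A[p][c] for p in range(k0, i))

        # 4. Trailing submatrix update: A22 -= L21 * U12.
        for i in range(k1, n):
            for c in range(k1, n):
                A[i][c] -= sum(A[i][p] * A[p][c] for p in range(k0, k1))
    return A
```

The structure mirrors blocked Cholesky: factor a small diagonal block, solve triangular systems for the panels, then apply a matrix-multiply update to the trailing submatrix, which is where most of the arithmetic happens. Because no pivoting is done, the sketch assumes every diagonal entry encountered is nonzero.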

Hybrid CPU host + GPU device execution of bdsqr() in an attempt to speed up SVD calculations. bdsqr_host() is heavily influenced by the LAPACK routine dbdsqr() where the "D" and "E"...

noOptimizations

An attempt to optimize latrd by storing the two narrow column panels "A" and "W" in LDS shared memory and using Cooperative Kernel Launch to synchronize all thread...

Modify sytrd() and latrd() to use gemv() (general matrix-vector multiply) instead of symv() (symmetric matrix-vector multiply). For some implementations and problem sizes, gemv() may give higher...
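The relationship between the two products can be illustrated with a small plain-Python sketch. The helper names `symv_lower` and `gemv` are hypothetical and only mimic the BLAS semantics; they are not the library routines. A symmetric product reads one triangle of the matrix, while a general product reads every entry, so the two agree only when both triangles are stored with consistent values.

```python
def symv_lower(A, x):
    """y = A * x for symmetric A, reading only the lower triangle."""
    n = len(A)
    y = [0.0] * n
    for i in range(n):
        for j in range(i + 1):
            y[i] += A[i][j] * x[j]
            if j != i:
                # The mirrored entry A[j][i] is implied by symmetry.
                y[j] += A[i][j] * x[i]
    return y


def gemv(A, x):
    """y = A * x for a general matrix, reading every stored entry."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Fully stored symmetric matrix: both products agree.
A_full = [[2.0, 1.0, 0.0],
          [1.0, 3.0, 4.0],
          [0.0, 4.0, 5.0]]
x = [1.0, 2.0, 3.0]
y_sym = symv_lower(A_full, x)
y_gen = gemv(A_full, x)

# Same lower triangle, but the upper triangle was never filled in:
# the symmetric product is unchanged, the general product is not.
A_lower_only = [[2.0, 0.0, 0.0],
                [1.0, 3.0, 0.0],
                [0.0, 4.0, 5.0]]
y_sym_lower = symv_lower(A_lower_only, x)
y_gen_lower = gemv(A_lower_only, x)
```

The general product performs roughly twice the multiplications of the symmetric one, but its access pattern is regular, which is one reason it can run faster on some hardware.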

noOptimizations

The kernels (and launch configurations) are modified to accept a block size nb_max that is not equal to BS2=32. For the double_complex type, nb_max can be set to 22 to...

noOptimizations
ci:no-ccache