Mark Gates
#111 updated the `iter` documentation for LU (gesv_mixed, etc.). Apply the same changes to posv_mixed, etc.
**Description** On leconte, SVD has bad backward error with 8 ranks / 8 GPUs, for both target host and device. Expect backward error ~ 1e-15. Using 1, 2, 4 ranks...
**Description** From Pieter Ghysels: Yes, both omp_set_max_active_levels and omp_get_max_active_levels are troublesome. The latest versions of ROCm/HIP rely on clang-15, which doesn't support these (even though clang-15 claims to support openmp...
There's #pragma omp critical around tileAcquire in internal_gemmA.cc, but seemingly nowhere else. _Originally posted by @mgates3 in https://github.com/icl-utk-edu/slate/pull/24#discussion_r1145620356_
**Description** In `orhr_col` and `unhr_col`, it says "if NB > N, then N is used instead of NB" and LDT >= max( 1, min( NB, N ) ). However, [lines...
Update to CMake 3.21 HIP language support. Refactor hipify into separate CMake function. Throw CUDA and HIP errors in kernel drivers. TODO: - [ ] Check `hip_arch` flag.
Pulling over an old PR from Bitbucket. See https://bitbucket.org/icl/slate-dev/pull-requests/127 Adds stevx2, openmp parallel-only, one node. `stevx2` is the main routine called by users; `stevx2_bisection` is the parallel recursive subroutine using...
Pulling over an old PR from Bitbucket. See https://bitbucket.org/icl/slate-dev/pull-requests/195 ### Context Dalal Sukkari is improving unmqr which needs a gereduce operation. She added it but in order to have simpler...
Brings SLATE's ScaLAPACK API into alignment with the SLATE style guide. * Wrap lines to 80 chars. * Move template functions before the caller to avoid duplicating the template prototype....