rocSOLVER
Hybrid CPU host + GPU device execution of bdsqr()
bdsqr() is executed in a hybrid fashion, partly on the CPU host and partly on the GPU device, in an attempt to speed up SVD calculations.
bdsqr_host() is heavily influenced by the LAPACK routine dbdsqr(). The "D" and "E" arrays are reduced on the CPU; the resulting rotations are then copied to the GPU, where they update the "V", "U", and "C" arrays.
As a special case, if no rotations are needed for the "V", "U", and "C" arrays, the LAPACK version of bdsqr() is called directly.
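The split described above (generate rotations while reducing "D" and "E", then apply them to a larger matrix) can be illustrated with a minimal CPU-only sketch. This is not rocSOLVER's code: the names `Givens`, `Rotations`, `make_rotation`, and `apply_rotations_to_rows` are hypothetical, the rotation helper is a simplified stand-in for LAPACK's dlartg(), and the plain loop over columns stands in for what would presumably be a GPU kernel. The row update follows the convention of LAPACK's dlasr() with SIDE='L', PIVOT='V', DIRECT='F'.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One plane rotation [c s; -s c] and the value r it produces.
// (Hypothetical type, for illustration only.)
struct Givens {
    double c, s, r;
};

// A sequence of rotations as it might be recorded on the host
// before being copied to the device. (Hypothetical type.)
struct Rotations {
    std::vector<double> c; // cosines
    std::vector<double> s; // sines
};

// Compute c, s, r such that [c s; -s c] * [f; g] = [r; 0].
// Simplified: no scaling safeguards or sign conventions as in dlartg().
Givens make_rotation(double f, double g)
{
    const double r = std::hypot(f, g);
    if (r == 0.0) {
        return {1.0, 0.0, 0.0}; // identity rotation
    }
    return {f / r, g / r, r};
}

// Apply rotation k to rows k and k+1 of a column-major matrix with
// n rows, ncols columns, and leading dimension lda, for k = 0, 1, ...
// in order. Columns are independent of each other, which is what
// makes this step a natural fit for parallel execution.
void apply_rotations_to_rows(const Rotations& rot, std::vector<double>& a,
                             std::size_t n, std::size_t ncols, std::size_t lda)
{
    assert(rot.c.size() == rot.s.size());
    assert(rot.c.empty() || rot.c.size() < n);
    assert(lda >= n);

    for (std::size_t col = 0; col < ncols; ++col) {
        double* x = a.data() + col * lda;
        for (std::size_t k = 0; k < rot.c.size(); ++k) {
            const double upper = x[k];
            const double lower = x[k + 1];
            x[k]     = rot.c[k] * upper + rot.s[k] * lower;
            x[k + 1] = rot.c[k] * lower - rot.s[k] * upper;
        }
    }
}
```

In this sketch the host would fill a `Rotations` object during the bidiagonal sweep and hand it to `apply_rotations_to_rows` once per sweep. When the matrix has zero columns there is nothing to update, which mirrors the special case above where the work can stay entirely on the CPU.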