
[BUG] LAPACKE Error when using solve on OpenCL backend

Open villekf opened this issue 3 years ago • 2 comments

Using solve causes various LAPACKE errors when using host input data.

Description

I am using the latest ArrayFire binary installer for Windows. This issue is only present with the OpenCL backend; both CUDA and CPU work fine. On the NVIDIA GPU, a LAPACKE error is thrown (see below); on the Intel CPU runtime, the program simply crashes. I've (finally) managed to reproduce it reliably on my end with the code shown below. The issue seems to be related to the host data, since arrays created with AF functions work just fine. I originally ran into the problem when using a sparse matrix created from host-side row, column and non-zero data; the error numbers were also different in that case (127 or 128). However, the issue was also present when using dense/regular data.

Command prompt output for the GPU (device 0):

[platform][1623750522][014456] [ ..\src\backend\common\DependencyModule.cpp(99) ] Attempting to load: forge.dll
[platform][1623750522][014456] [ ..\src\backend\common\DependencyModule.cpp(102) ] Found: forge.dll
[platform][1623750522][014456] [ ..\src\backend\opencl\device_manager.cpp(218) ] Found 3 OpenCL platforms
[platform][1623750522][014456] [ ..\src\backend\opencl\device_manager.cpp(230) ] Found 1 devices on platform NVIDIA CUDA
[platform][1623750522][014456] [ ..\src\backend\opencl\device_manager.cpp(235) ] Found device GeForce GTX TITAN on platform NVIDIA CUDA
[platform][1623750522][014456] [ ..\src\backend\opencl\device_manager.cpp(230) ] Found 1 devices on platform Intel(R) OpenCL
[platform][1623750522][014456] [ ..\src\backend\opencl\device_manager.cpp(235) ] Found device Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz on platform Intel(R) OpenCL
[platform][1623750522][014456] [ ..\src\backend\opencl\device_manager.cpp(230) ] Found 1 devices on platform Intel(R) CPU Runtime for OpenCL(TM) Applications
[platform][1623750522][014456] [ ..\src\backend\opencl\device_manager.cpp(235) ] Found device Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz on platform Intel(R) CPU Runtime for OpenCL(TM) Applications
[platform][1623750522][014456] [ ..\src\backend\opencl\device_manager.cpp(240) ] Found 3 OpenCL devices
[platform][1623750522][014456] [ ..\src\backend\opencl\device_manager.cpp(335) ] Default device: 0
ArrayFire v3.8.0 (OpenCL, 64-bit Windows, build d99887a)
[0] NVIDIA: GeForce GTX TITAN, 6144 MB
-1- INTEL: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz, 32691 MB
-2- INTEL: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz, 32691 MB
[mem][1623750522][014456] [ ..\src\backend\opencl\memory.cpp(200) ] nativeAlloc: 16 MB 0x17d1286e930
[mem][1623750522][014456] [ ..\src\backend\opencl\memory.cpp(200) ] nativeAlloc: 1 GB 0x17d1286f7a0
[jit][1623750522][014456] [ ..\src\backend\opencl\compile_module.cpp(254) ] {9348653917523335434  : loaded from C:\Users\user\AppData\Local\Temp\\ArrayFire\KER9348653917523335434_CL_4318_GEFORCE_GTX_TITAN_AF_38.bin for GeForce GTX TITAN }
[mem][1623750522][014456] [ ..\src\backend\opencl\memory.cpp(200) ] nativeAlloc: 16 MB 0x17d1286d8b0
[mem][1623750523][014456] [ ..\src\backend\opencl\memory.cpp(200) ] nativeAlloc: 16 MB 0x17d12870610
[jit][1623750523][014456] [ ..\src\backend\opencl\compile_module.cpp(254) ] {17823172283485866653 : loaded from C:\Users\user\AppData\Local\Temp\\ArrayFire\KER17823172283485866653_CL_4318_GEFORCE_GTX_TITAN_AF_38.bin for GeForce GTX TITAN }
[mem][1623750523][014456] [ ..\src\backend\opencl\memory.cpp(200) ] nativeAlloc: 256 KB 0x17d1286fdd0
[mem][1623750523][014456] [ ..\src\backend\opencl\memory.cpp(200) ] nativeAlloc: 256 KB 0x17d1286f170
[jit][1623750523][014456] [ ..\src\backend\opencl\compile_module.cpp(254) ] {59319625355602817    : loaded from C:\Users\user\AppData\Local\Temp\\ArrayFire\KER59319625355602817_CL_4318_GEFORCE_GTX_TITAN_AF_38.bin for GeForce GTX TITAN }
In function int __cdecl magma_getrf_gpu<float>(int,int,struct _cl_mem *,unsigned __int64,int,int *,struct _cl_command_queue *,int *)
In file src\backend\opencl\magma\getrf.cpp:235
LAPACKE Error (1)
 0# af::operator>= in afopencl
 1# af::operator>= in afopencl
 2# af::operator>= in afopencl
 3# af::operator>= in afopencl
 4# af::operator>= in afopencl
 5# af::operator>= in afopencl
 6# main at C:\Program Files\ArrayFire\v3\examples\helloworld\helloworld.cpp:33
 7# __scrt_common_main_seh at D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
 8# BaseThreadInitThunk in KERNEL32
 9# RtlUserThreadStart in ntdll

ArrayFire Exception (Internal error:998):
In function int __cdecl magma_getrf_gpu<float>(int,int,struct _cl_mem *,unsigned __int64,int,int *,struct _cl_command_queue *,int *)
In file src\backend\opencl\magma\getrf.cpp:235
LAPACKE Error (1)
 0# af::operator>= in afopencl
 1# af::operator>= in afopencl
 2# af::operator>= in afopencl
 3# af::operator>= in afopencl
 4# af::operator>= in afopencl
 5# af::operator>= in afopencl
 6# main at C:\Program Files\ArrayFire\v3\examples\helloworld\helloworld.cpp:33
 7# __scrt_common_main_seh at D:\agent\_work\9\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl:288
 8# BaseThreadInitThunk in KERNEL32
 9# RtlUserThreadStart in ntdll

In function class af::array __cdecl af::solve(const class af::array &,const class af::array &,const af_mat_prop)
In file src\api\cpp\lapack.cpp:89

Command prompt output for the CPU (device 2), which ends where the program crashes:

[platform][1623751315][010748] [ ..\src\backend\common\DependencyModule.cpp(99) ] Attempting to load: forge.dll
[platform][1623751315][010748] [ ..\src\backend\common\DependencyModule.cpp(102) ] Found: forge.dll
[platform][1623751315][010748] [ ..\src\backend\opencl\device_manager.cpp(218) ] Found 3 OpenCL platforms
[platform][1623751315][010748] [ ..\src\backend\opencl\device_manager.cpp(230) ] Found 1 devices on platform NVIDIA CUDA
[platform][1623751315][010748] [ ..\src\backend\opencl\device_manager.cpp(235) ] Found device GeForce GTX TITAN on platform NVIDIA CUDA
[platform][1623751315][010748] [ ..\src\backend\opencl\device_manager.cpp(230) ] Found 1 devices on platform Intel(R) OpenCL
[platform][1623751315][010748] [ ..\src\backend\opencl\device_manager.cpp(235) ] Found device Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz on platform Intel(R) OpenCL
[platform][1623751315][010748] [ ..\src\backend\opencl\device_manager.cpp(230) ] Found 1 devices on platform Intel(R) CPU Runtime for OpenCL(TM) Applications
[platform][1623751315][010748] [ ..\src\backend\opencl\device_manager.cpp(235) ] Found device Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz on platform Intel(R) CPU Runtime for OpenCL(TM) Applications
[platform][1623751315][010748] [ ..\src\backend\opencl\device_manager.cpp(240) ] Found 3 OpenCL devices
[platform][1623751316][010748] [ ..\src\backend\opencl\device_manager.cpp(335) ] Default device: 0
ArrayFire v3.8.0 (OpenCL, 64-bit Windows, build d99887a)
-0- NVIDIA: GeForce GTX TITAN, 6144 MB
-1- INTEL: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz, 32691 MB
[2] INTEL: Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz, 32691 MB
[mem][1623751316][010748] [ ..\src\backend\opencl\memory.cpp(200) ] nativeAlloc: 16 MB 0x1e334bd8610
[mem][1623751316][010748] [ ..\src\backend\opencl\memory.cpp(200) ] nativeAlloc: 1 GB 0x1e334bca300
[jit][1623751316][010748] [ ..\src\backend\opencl\compile_module.cpp(254) ] {9348653917523335434  : loaded from C:\Users\user\AppData\Local\Temp\\ArrayFire\KER9348653917523335434_CL_32902_INTEL(R)_CORE(TM)_I7-5820K_CPU_@_3.30GHZ_AF_38.bin for Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz }
[mem][1623751316][010748] [ ..\src\backend\opencl\memory.cpp(200) ] nativeAlloc: 16 MB 0x1e334e78b10
[mem][1623751316][010748] [ ..\src\backend\opencl\memory.cpp(200) ] nativeAlloc: 16 MB 0x1e334e79ec0
[jit][1623751316][010748] [ ..\src\backend\opencl\compile_module.cpp(254) ] {17823172283485866653 : loaded from C:\Users\user\AppData\Local\Temp\\ArrayFire\KER17823172283485866653_CL_32902_INTEL(R)_CORE(TM)_I7-5820K_CPU_@_3.30GHZ_AF_38.bin for Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz }
[mem][1623751316][010748] [ ..\src\backend\opencl\memory.cpp(200) ] nativeAlloc: 256 KB 0x1e334e79380
[mem][1623751316][010748] [ ..\src\backend\opencl\memory.cpp(200) ] nativeAlloc: 256 KB 0x1e334e78de0

Reproducible Code and/or Steps

The following code reproduces the issue with error 1. The dimensions are from the original data.

#include <arrayfire.h>
#include <cstdio>
#include <cstdlib>
#include <vector>

using namespace af;

int main(int argc, char* argv[]) {
    try {
        // Select a device and display arrayfire info
        int device = argc > 1 ? atoi(argv[1]) : 0;
        af::setDevice(device);
        af::info();

        array KG, HH;

        // Host-side input: a zero-filled buffer wrapped as a 256 x 16384 array
        std::vector<float> joku(16384 * 256, 0.f);
        array S     = array(256, 16384, joku.data());
        array Pplus = randu(16384, 16384, f32);
        KG = matmul(S, Pplus);
        HH = transpose(matmul(S, transpose(KG)));
        // solve() is the call that throws the LAPACKE error on OpenCL
        KG = solve(HH, KG);
        eval(KG);

    } catch (af::exception& e) {
        fprintf(stderr, "%s\n", e.what());
        throw;
    }

    return 0;
}
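A note on reading the error number: in LAPACK-style getrf routines, a positive info value i means the i-th pivot of U is exactly zero, i.e. the matrix is singular. In the reproduction above, joku is zero-filled, so S, KG and HH all come out as zeros, and an info of 1 would be the expected result for that particular input. The sketch below is a plain C++ stand-in that only illustrates this convention. lu_info_sketch is a hypothetical helper, not ArrayFire or MAGMA code, and it does not explain the crash on the Intel runtime or the 127/128 errors seen with the original data.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of ArrayFire or MAGMA).
// Runs LU factorization with partial pivoting on a copy of an n x n
// column-major matrix and returns a LAPACK-style info value:
//   0     -> every pivot was nonzero
//   i > 0 -> the i-th pivot (1-based) was exactly zero, so U is singular
int lu_info_sketch(std::vector<float> a, int n) {
    // Column-major accessor for element (row i, column j).
    auto at = [&a, n](int i, int j) -> float& {
        return a[static_cast<std::size_t>(j) * n + i];
    };

    int info = 0;
    for (int k = 0; k < n; ++k) {
        // Pick the row with the largest magnitude in column k as the pivot.
        int p = k;
        for (int i = k + 1; i < n; ++i) {
            if (std::fabs(at(i, k)) > std::fabs(at(p, k))) p = i;
        }

        // A zero pivot is recorded once (first occurrence) and the
        // elimination step for this column is skipped.
        if (at(p, k) == 0.0f) {
            if (info == 0) info = k + 1;
            continue;
        }

        // Swap the pivot row into place.
        if (p != k) {
            for (int j = 0; j < n; ++j) std::swap(at(k, j), at(p, j));
        }

        // Eliminate the entries below the pivot.
        for (int i = k + 1; i < n; ++i) {
            at(i, k) /= at(k, k);
            for (int j = k + 1; j < n; ++j) {
                at(i, j) -= at(i, k) * at(k, j);
            }
        }
    }
    return info;
}
```

Under these assumptions, an all-zero matrix such as lu_info_sketch(std::vector<float>(16, 0.0f), 4) returns 1, while an identity matrix returns 0. Filling joku with non-degenerate values before calling solve would be one way to check whether the OpenCL error persists for a non-singular HH.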

System Information

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 456.71       Driver Version: 456.71       CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TITAN  WDDM  | 00000000:03:00.0  On |                  N/A |
| 24%   41C    P8    14W / 300W |    689MiB /  6144MiB |      2%      Default |
+-------------------------------+----------------------+----------------------+

Checklist

  • [x] Using the latest available ArrayFire release
  • [ ] GPU drivers are up to date

villekf, Jun 15 '21 10:06

The same issue is also present when using inverse instead of solve, e.g. KG = matmul(inverse(HH), KG);

villekf, Jun 15 '21 11:06

I wonder if all LAPACK routines are having issues, as with the Tesla M60 in #3147.

9prady9, Jun 21 '21 11:06