arrayfire-python
runtime error in 'bench_cg.py'
Hello,
I'm getting a runtime error running the bench_cg.py example:
RuntimeError: In function std::vector
Also, af.info() seems to see the GPU device, but doesn't assign it an ID, nor could it detect its memory and compute capability. At the same time, bench_fft.py seems to work.
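For reference, a way to pull the same device details programmatically (a sketch, assuming this arrayfire-python version exposes the `device_info()` helper; the import guard is only so the snippet degrades gracefully on a machine without ArrayFire):

```python
# Diagnostic sketch: query the active device's name, backend, toolkit,
# and compute version, guarded so it runs even without arrayfire/CUDA.
try:
    import arrayfire as af
except ImportError:
    af = None

def describe_device():
    """Return a dict of device details (name, backend, toolkit, compute),
    or None if arrayfire cannot be imported."""
    if af is None:
        return None
    return af.device_info()

print(describe_device())
```

If the memory and compute fields come back empty here too, that would point at the backend failing to query the device rather than a printing quirk in af.info().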
(venv3) igu@demeter:~$ nvidia-smi
Fri Apr 19 17:45:53 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.39 Driver Version: 418.39 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce GTX 980 Off | 00000000:02:00.0 Off | N/A |
| 30% 42C P0 45W / 195W | 0MiB / 4040MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
(venv3) igu@demeter:~$ export | grep LD_LIB
declare -x LD_LIBRARY_PATH=":/usr/local/cuda/lib64:/opt/arrayfire/lib64:/usr/local/cuda/nvvm/lib64"
(venv3) igu@demeter:~$ python /mnt/Simulations/IvanG/WORKSPACE/arrayfire_examples/examples/benchmarks/bench_fft.py
ArrayFire v3.6.2 (CUDA, 64-bit Linux, build dc38ef1)
Platform: CUDA Toolkit , Driver: 418.39
[] GeForce GTX 980, MB, CUDA Compute .
Benchmark N x N 2D fft on arrayfire
Time taken for 128 x 128: 0.9280 Gflops
Time taken for 256 x 256: 174.0147 Gflops
Time taken for 512 x 512: 300.4586 Gflops
Time taken for 1024 x 1024: 298.9022 Gflops
Time taken for 2048 x 2048: 287.5326 Gflops
Time taken for 4096 x 4096: 309.7844 Gflops
Benchmark N x N 2D fft on numpy
Time taken for 128 x 128: 4.4689 Gflops
Time taken for 256 x 256: 4.8368 Gflops
Time taken for 512 x 512: 4.4340 Gflops
Time taken for 1024 x 1024: 3.8707 Gflops
(venv3) igu@demeter:~$ python /mnt/Simulations/IvanG/WORKSPACE/arrayfire_examples/examples/benchmarks/bench_cg.py
ArrayFire v3.6.2 (CUDA, 64-bit Linux, build dc38ef1)
Platform: CUDA Toolkit , Driver: 418.39
[] GeForce GTX 980, MB, CUDA Compute .
Testing benchmark functions...
Traceback (most recent call last):
File "/mnt/Simulations/IvanG/WORKSPACE/arrayfire_examples/examples/benchmarks/bench_cg.py", line 199, in <module>
test()
File "/mnt/Simulations/IvanG/WORKSPACE/arrayfire_examples/examples/benchmarks/bench_cg.py", line 141, in test
A, b, x0 = setup_input(n=50, sparsity=7) # dense A
File "/mnt/Simulations/IvanG/WORKSPACE/arrayfire_examples/examples/benchmarks/bench_cg.py", line 51, in setup_input
A = A.T + A + n*af.identity(n, n, dtype=af.Dtype.f32)
File "/opt/venv3/lib/python3.6/site-packages/arrayfire/array.py", line 664, in T
return transpose(self, False)
File "/opt/venv3/lib/python3.6/site-packages/arrayfire/array.py", line 318, in transpose
safe_call(backend.get().af_transpose(c_pointer(out.arr), a.arr, conj))
File "/opt/venv3/lib/python3.6/site-packages/arrayfire/util.py", line 79, in safe_call
raise RuntimeError(to_str(err_str))
RuntimeError: In function std::vector<char> cuda::compileToPTX(const char*, std::string)
In file src/backend/cuda/jit.cpp:
(venv3) igu@demeter:~$
Also, setting the backend doesn't seem to have an effect:
>>> af.get_backend()
'unified'
>>> af.set_backend('cuda')
>>> af.get_backend()
'unified'
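On this last point: under the unified loader, `get_backend()` can keep reporting 'unified' even after a specific backend has been selected. A hedged check (assuming `get_active_backend()` is available in this arrayfire-python version, and noting that `set_backend` must be called before any arrays are created):

```python
# Sketch: distinguish the loader name ('unified') from the backend
# actually in use, guarded so it runs even without arrayfire/CUDA.
try:
    import arrayfire as af
except ImportError:
    af = None

def active_backend():
    """Try to select the CUDA backend, then report the backend actually
    active, or None if arrayfire cannot be imported."""
    if af is None:
        return None
    try:
        af.set_backend('cuda')  # must precede any array creation
    except RuntimeError:
        pass  # CUDA backend unavailable; keep whatever is active
    return af.get_active_backend()  # e.g. 'cuda' rather than 'unified'

print(active_backend())
```

If `get_active_backend()` already reports 'cuda', the set_backend call is taking effect and the 'unified' string from `get_backend()` is just the loader's own name.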
Thank you, Ivan