arrayfire-python
Better error checking during initialization
Today I came across the following cryptic error message when trying to use arrayfire-python:
In [1]: import arrayfire
In [2]: arrayfire.backend.name()
Out[2]: 'cuda'
In [5]: arrayfire.Array()
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-5-6dffbbbc7abd> in <module>()
----> 1 arrayfire.Array()
/home/filipe/.local/lib/python2.7/site-packages/arrayfire-3.2.20151211-py2.7.egg/arrayfire/array.pyc in __init__(self, src, dims, dtype, is_device)
422 for n in range(numdims):
423 idims[n] = dims[n]
--> 424 self.arr = _create_empty_array(numdims, idims, to_dtype[type_char])
425
426 def as_type(self, ty):
/home/filipe/.local/lib/python2.7/site-packages/arrayfire-3.2.20151211-py2.7.egg/arrayfire/array.pyc in _create_empty_array(numdims, idims, dtype)
36 c_dims = dim4(idims[0], idims[1], idims[2], idims[3])
37 safe_call(backend.get().af_create_handle(ct.pointer(out_arr),
---> 38 numdims, ct.pointer(c_dims), dtype.value))
39 return out_arr
40
/home/filipe/.local/lib/python2.7/site-packages/arrayfire-3.2.20151211-py2.7.egg/arrayfire/util.pyc in safe_call(af_error)
73 err_len = ct.c_longlong(0)
74 backend.get().af_get_last_error(ct.pointer(err_str), ct.pointer(err_len))
---> 75 raise RuntimeError(to_str(err_str), af_error)
76
77 def get_version():
RuntimeError: ('Error in /var/lib/jenkins-slave/workspace/arrayfire-linux-mkl-graphics-installer/src/api/c/data.cpp(197):\n\n\n', 998)
The error was caused by a bad driver version:
$ nvidia-smi
Failed to initialize NVML: GPU access blocked by the operating system
I think it would be useful if arrayfire tested whether a device is really available, instead of just checking whether it can dlopen the relevant library, for example by calling arrayfire.Array(). If that fails, a more useful error could be returned to the user.
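A minimal sketch of that check, assuming the probe is any zero-argument callable that exercises the device (for example, allocating an empty array). The names `check_device` and `BackendUnavailableError` and the error wording are hypothetical, not part of arrayfire-python's API:

```python
class BackendUnavailableError(RuntimeError):
    """Hypothetical error: the backend library loaded but cannot actually run."""


def check_device(backend_name, probe):
    """Run `probe` and turn a low-level failure into a readable error.

    `probe` is a hypothetical zero-argument callable that touches the
    device, e.g. by creating an empty array. Returns None on success.
    """
    try:
        probe()
    except RuntimeError as exc:
        # Keep the original error attached as __cause__ for debugging.
        raise BackendUnavailableError(
            "The %s backend library was loaded, but the device probe failed. "
            "Check that the driver and the loaded kernel modules match "
            "(original error: %r)." % (backend_name, exc)
        ) from exc
```

With arrayfire-python this might be called as `check_device(arrayfire.backend.name(), arrayfire.Array)`, so that a driver mismatch like the one above produces a message naming the backend instead of a bare `data.cpp(197)` error code.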
Just curious, were you installing the drivers via your package manager or a runfile installer? The CUDA toolkit comes in both formats and this error can happen when you mix/match these methods during an upgrade.
The error happened because yum automatically upgraded the CUDA driver, but yum doesn't remove and reinsert the nvidia and nvidia_uvm kernel modules, causing the above error. This was on CentOS 6.
Ah, the woes of package managers! It's hard to cover all of the bases, but this is disappointing to hear. I have never been a fan of yum and the way it handles packages, but mediocrity often finds safety in standardization...
@FilipeMaia https://github.com/arrayfire/arrayfire-python/commit/d8db269942e92a9eeab482d1dc5b3d6788e31122 changed the behavior to pick the first working backend instead of the first backend that can be loaded.
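The commit's actual code is not reproduced here. As an illustration of the "first working backend, not first loadable backend" rule, here is a sketch that assumes each candidate backend can be described by a load check and a device probe; `pick_first_working_backend` and the tuple layout are hypothetical:

```python
def pick_first_working_backend(candidates):
    """Return the name of the first backend whose probe succeeds.

    `candidates` is an ordered list of (name, load, probe) tuples:
    `load()` returns True if the shared library could be opened, and
    `probe()` raises RuntimeError if the device is not usable. Both are
    hypothetical stand-ins for arrayfire's internal loading logic.
    """
    failures = []
    for name, load, probe in candidates:
        if not load():
            failures.append("%s: library not found" % name)
            continue
        try:
            probe()
        except RuntimeError as exc:
            failures.append("%s: loaded but unusable (%s)" % (name, exc))
            continue
        return name
    # Nothing worked: report why every candidate was skipped.
    raise RuntimeError("No working backend found:\n" + "\n".join(failures))
```

Collecting the per-backend failures means that, when no backend works, the final error explains why each one was skipped instead of surfacing only the last low-level error code.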
That fallback behavior is fundamentally different from what is requested here. We could instead add a function that displays debug info, telling you when a library was found but cannot be run for whatever reason.
For example:
>>> af.show_backends()
CPU: Loaded. 1 device found.
CUDA: Loaded. No devices found.
OpenCL: Loaded. 3 devices found.
Would that be a good alternative?
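A rough sketch of how such a report could be formatted, assuming the loaded flag and device count for each backend have already been gathered (how to query them from arrayfire is left out). `show_backends` is only the proposed name, and `format_backend_report` and the "Not loaded." wording are hypothetical:

```python
def format_backend_report(backends):
    """Build one status line per backend, e.g. 'CUDA: Loaded. No devices found.'.

    `backends` maps a display name to a (loaded, device_count) pair.
    Lines are produced in the mapping's insertion order.
    """
    lines = []
    for name, (loaded, count) in backends.items():
        if not loaded:
            lines.append("%s: Not loaded." % name)
        elif count == 0:
            lines.append("%s: Loaded. No devices found." % name)
        else:
            noun = "device" if count == 1 else "devices"
            lines.append("%s: Loaded. %d %s found." % (name, count, noun))
    return lines


def show_backends(backends):
    """Print the report produced by format_backend_report."""
    for line in format_backend_report(backends):
        print(line)


# Reproduces the example output proposed above.
show_backends({"CPU": (True, 1), "CUDA": (True, 0), "OpenCL": (True, 3)})
```

Keeping the formatting separate from the printing makes the report easy to test or to embed in an error message.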
I think that's a good solution.
I think this needs to be implemented upstream.