arrayfire-python Better error checking during initialization

Open FilipeMaia opened this issue 8 years ago • 6 comments

Today I came across the following cryptic error message when trying to use arrayfire-python:


In [1]: import arrayfire
In [2]: arrayfire.backend.name()
Out[2]: 'cuda'
In [5]: arrayfire.Array()
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-5-6dffbbbc7abd> in <module>()
----> 1 arrayfire.Array()

/home/filipe/.local/lib/python2.7/site-packages/arrayfire-3.2.20151211-py2.7.egg/arrayfire/array.pyc in __init__(self, src, dims, dtype, is_device)
    422             for n in range(numdims):
    423                 idims[n] = dims[n]
--> 424             self.arr = _create_empty_array(numdims, idims, to_dtype[type_char])
    425
    426     def as_type(self, ty):

/home/filipe/.local/lib/python2.7/site-packages/arrayfire-3.2.20151211-py2.7.egg/arrayfire/array.pyc in _create_empty_array(numdims, idims, dtype)
     36     c_dims = dim4(idims[0], idims[1], idims[2], idims[3])
     37     safe_call(backend.get().af_create_handle(ct.pointer(out_arr),
---> 38                                              numdims, ct.pointer(c_dims), dtype.value))
     39     return out_arr
     40

/home/filipe/.local/lib/python2.7/site-packages/arrayfire-3.2.20151211-py2.7.egg/arrayfire/util.pyc in safe_call(af_error)
     73         err_len = ct.c_longlong(0)
     74         backend.get().af_get_last_error(ct.pointer(err_str), ct.pointer(err_len))
---> 75         raise RuntimeError(to_str(err_str), af_error)
     76
     77 def get_version():

RuntimeError: ('Error in /var/lib/jenkins-slave/workspace/arrayfire-linux-mkl-graphics-installer/src/api/c/data.cpp(197):\n\n\n', 998)

The error was caused by a bad driver version:

$ nvidia-smi
Failed to initialize NVML: GPU access blocked by the operating system

I think it would be useful if arrayfire tested whether a device is really available, instead of only checking whether it can dlopen the relevant library, for example by calling arrayfire.Array() during initialization. If that call fails, a more useful error could be returned to the user.
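A minimal sketch of the idea, not arrayfire's actual loader: try each backend with a cheap probe and keep the first one that works, recording why the others failed. The names `pick_working_backend` and `fake_probe` are hypothetical, and the probe is passed in as a plain callable so the sketch runs without arrayfire installed. In the real library the probe could be something like the `arrayfire.Array()` call suggested above.

```python
def pick_working_backend(candidates, probe):
    """Return (chosen_name, failures).

    `probe` is any callable that raises if the backend cannot really be
    used. `failures` lists (name, reason) for each backend tried and
    rejected before the first working one, so the caller can report
    something more useful than a bare error code.
    """
    failures = []
    for name in candidates:
        try:
            probe(name)
        except Exception as exc:  # a real probe may raise RuntimeError
            failures.append((name, str(exc)))
            continue
        return name, failures
    return None, failures


def fake_probe(name):
    # Stand-in for a real device check; the message is borrowed from the
    # nvidia-smi output above purely for illustration.
    if name == "cuda":
        raise RuntimeError("GPU access blocked by the operating system")


chosen, failures = pick_working_backend(["cuda", "opencl", "cpu"], fake_probe)
print(chosen)
for name, reason in failures:
    print("skipped %s: %s" % (name, reason))
```

Keeping the list of failures around means the message shown to the user can name the backend that was found but unusable, which is the part that was missing in the traceback above.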

Dec 14 '15 16:12 FilipeMaia

Just curious, were you installing the drivers via your package manager or a runfile installer? The CUDA toolkit comes in both formats and this error can happen when you mix/match these methods during an upgrade.

Dec 16 '15 15:12 PythonProdigy

The error happened because yum automatically upgraded the CUDA driver but did not remove and reinsert the nvidia and nvidia_uvm kernel modules, which caused the error above. This was on CentOS 6.

Dec 16 '15 15:12 FilipeMaia

Ah, the woes of package managers! It's hard to cover all of the bases but this disappointing news is very hard to hear. I have never been a fan of yum and the way it handles packages, but mediocrity often finds safety in standardization...

Dec 16 '15 15:12 PythonProdigy

@FilipeMaia https://github.com/arrayfire/arrayfire-python/commit/d8db269942e92a9eeab482d1dc5b3d6788e31122 changed the behavior to pick the first working backend instead of the first backend that can be loaded.

This is fundamentally different behavior from what is expected here. We could add a function to display debug info that will tell you if a library was found but cannot be run for whatever reason.

For example:

>>> af.show_backends()
CPU: Loaded. 1 device found.
CUDA: Loaded. No devices found.
OpenCL: Loaded. 3 devices found.

Would that be a good alternative?
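A rough sketch of how such a report could be formatted, assuming the per-backend status has already been collected somewhere else. `format_backend_report` and the `(loaded, device_count)` tuple layout are hypothetical and not part of arrayfire-python; the "Not loaded." wording is also an assumption, since the example above only shows loaded backends.

```python
def format_backend_report(status):
    """Build the report text from {name: (loaded, device_count)}.

    How `loaded` and `device_count` are obtained is left out on purpose;
    this only shows the formatting step.
    """
    lines = []
    for name, (loaded, devices) in status.items():
        if not loaded:
            # Assumed wording for a library that could not be loaded.
            lines.append("%s: Not loaded." % name)
        elif devices == 0:
            lines.append("%s: Loaded. No devices found." % name)
        else:
            noun = "device" if devices == 1 else "devices"
            lines.append("%s: Loaded. %d %s found." % (name, devices, noun))
    return "\n".join(lines)


report = format_backend_report({
    "CPU": (True, 1),
    "CUDA": (True, 0),
    "OpenCL": (True, 3),
})
print(report)
```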

Sep 24 '16 06:09 pavanky

I think that's a good solution.

Sep 25 '16 09:09 FilipeMaia

I think this needs to be implemented upstream.

Jul 18 '17 08:07 pavanky
