Division between int-gpuarray and int scalar does not work
Consider:
import numpy as np
import pycuda.autoinit
from pycuda import gpuarray
x = gpuarray.to_gpu(np.arange(4))
x / 4
This will give the following exception:
error Traceback (most recent call last)
<ipython-input-1-c5776fbf102a> in <module>()
3
4 x = gpuarray.to_gpu(np.arange(4))
----> 5 x / 4
/usr/local/lib/python3.4/dist-packages/pycuda-2015.1.2-py3.4-linux-x86_64.egg/pycuda/gpuarray.py in __div__(self, other)
476 # create a new array for the result
477 result = self._new_like_me(_get_common_dtype(self, other))
--> 478 return self._axpbz(1/other, 0, result)
479
480 __truediv__ = __div__
/usr/local/lib/python3.4/dist-packages/pycuda-2015.1.2-py3.4-linux-x86_64.egg/pycuda/gpuarray.py in _axpbz(self, selffac, other, out, stream)
319 func.prepared_async_call(self._grid, self._block, stream,
320 selffac, self.gpudata,
--> 321 other, out.gpudata, self.mem_size)
322
323 return out
/usr/local/lib/python3.4/dist-packages/pycuda-2015.1.2-py3.4-linux-x86_64.egg/pycuda/driver.py in function_prepared_async_call(func, grid, block, stream, *args, **kwargs)
506
507 from pycuda._pvt_struct import pack
--> 508 arg_buf = pack(func.arg_format, *args)
509
510 for texref in func.texrefs:
error: required argument is not an integer
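As far as I understand, the problem is the following: within __div__, the result is allocated with the common dtype of the array and the scalar, which is still an integer type when an integer array is divided by an integer scalar. The scale factor 1/other is a float, however (0.25 here), so packing it as an integer kernel argument fails with the error above. This could be fixed easily by making sure that division always returns a floating point type.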
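To illustrate the suspected failure mode without needing a GPU, here is a minimal sketch using only NumPy and the standard library. It assumes that pycuda's internal pack behaves like struct.pack and that the kernel argument format for the scale factor is an integer one ("i" is picked purely for illustration); neither assumption is confirmed by the pycuda source.

```python
import struct

import numpy as np

x = np.arange(4)   # integer array, like the one sent to the GPU
scale = 1 / 4      # Python 3 true division gives the float 0.25

# Integer array combined with an integer scalar keeps an integer dtype.
common = (x * 4).dtype
print(common)

# Packing a float into an integer slot fails
# (assumption: the kernel expects an int here).
try:
    struct.pack("i", scale)
    pack_error = None
except struct.error as exc:
    pack_error = str(exc)
print(pack_error)

# Proposed fix, sketched on the host: promote to a float dtype first,
# then multiplying by the reciprocal is well defined.
fixed = x.astype(np.float64) * scale
print(fixed)
```

Until a patch lands, converting the array on the GPU side first, e.g. x.astype(np.float32) / 4, should sidestep the problem (not tested here).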
As a side note, a related problem is that // (floor division, the 'integer division' of Python 3) is not yet implemented in gpuarray. As far as I can see, the "turn the division into a multiplication" trick does not work for that operator, so maybe the division mechanism should be switched entirely to something other than _axpbz.
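A small NumPy-only sketch of why the reciprocal trick breaks down for integer floor division: once the scale factor has to be an integer, 1/other collapses to 0 for any divisor larger than 1. The variable names are made up for the example, and the code only mimics the "multiply by 1/other" idea on the host rather than calling any pycuda kernel.

```python
import numpy as np

x = np.arange(8)
divisor = 4

# What x // divisor should return, element by element.
expected = x // divisor
print(expected)

# The reciprocal, forced into an integer as an int kernel argument would be.
int_reciprocal = int(1 / divisor)

# "Division as multiplication" with that factor wipes out every element.
via_multiplication = x * int_reciprocal
print(via_multiplication)
```

A dedicated elementwise division kernel would avoid going through a reciprocal at all.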
I'd be happy to take a (tested) patch for either issue, but I don't have the time to work on this myself.