Dtype issues with gpu backend
Hello, I was experimenting with Neon and ran into an issue with the convolutional and pooling layers. The task was image classification, so the input data shape was (3, H, W). If an ArrayIterator or HDF5Iterator is used as the dataset, the input shape values may carry numpy datatypes such as numpy.int64 (for ArrayIterator they come from the lshape parameter; for HDF5Iterator they are read from file['input'].attrs['lshape']). When these values are passed to the model's configure method as in_obj, they are assigned to layer.in_shape, which is then used to initialize the layer parameters. During the forward pass, the following errors arise (a minimal reproduction sketch follows the parameter dump below):
- conv layer:
File "<user>/neon/backends/nervanagpu.py", line 1990, in fprop_conv
return self._execute_conv("fprop", layer, layer.fprop_kernels, repeat)
File "<user>/neon/backends/nervanagpu.py", line 2072, in _execute_conv
kernels.execute(repeat)
File "<user>/neon/backends/convolution.py", line 224, in execute
kernel.prepared_async_call(*self.launch_args, shared_size=self.shared)
File "<user>/pycuda-2017.1.1-py3.5-linux-x86_64.egg/pycuda/driver.py", line 516, in function_prepared_async_call
func._launch_kernel(grid, block, arg_buf, shared_size, stream)
TypeError: No registered converter was able to produce a C++ rvalue of type unsigned int from this Python object of type numpy.int64
- pool layer:
File "<user>/neon/backends/nervanagpu.py", line 2316, in fprop_pool
layer.fprop_lut_size, repeat)
File "<user>/neon/backends/nervanagpu.py", line 2349, in _execute_pool
kernel.prepared_async_call(*params, shared_size=shared)
File "<user>/pycuda-2017.1.1-py3.5-linux-x86_64.egg/pycuda/driver.py", line 516, in function_prepared_async_call
func._launch_kernel(grid, block, arg_buf, shared_size, stream)
TypeError: No registered converter was able to produce a C++ rvalue of type unsigned int from this Python object of type numpy.int64
- memory allocation in conv:
File "<user>/neon/backends/convolution.py", line 1307, in bind_params
input_data = self.lib.scratch_buffer_offset(self.size)
File "<user>/neon/backends/nervanagpu.py", line 875, in scratch_buffer_offset
data = int(_get_scratch_data(self.scratch_size)) + self.scratch_offset
File "<decorator-gen-62>", line 2, in _get_scratch_data
File "<user>/pycuda-2017.1.1-py3.5-linux-x86_64.egg/pycuda/tools.py", line 430, in context_dependent_memoize
result = func(*args)
File "<user>/neon/backends/nervanagpu.py", line 3287, in _get_scratch_data
return drv.mem_alloc(scratch_size)
Boost.Python.ArgumentError: Python argument types in
pycuda._driver.mem_alloc(numpy.int64)
did not match C++ signature:
mem_alloc(unsigned long)
Layer parameters:
In "<>/neon/backends/convolution.py", line 75, in __init__:
(N, C, K, D, H, W, T, R, S, M, P, Q, pad_d, pad_h, pad_w, str_d, str_h, str_w, dil_d, dil_h, dil_w)
Have following values (idx, type, value):
[(0, <class 'int'>, 128), (1, <class 'numpy.int64'>, 3), (2, <class 'int'>, 32), (3, <class 'int'>, 1), (4, <class 'numpy.int64'>, 128), (5, <class 'numpy.int64'>, 128), (6, <class 'int'>, 1), (7, <class 'int'>, 3), (8, <class 'int'>, 3), (9, <class 'int'>, 1), (10, <class 'numpy.int64'>, 128), (11, <class 'numpy.int64'>, 128), (12, <class 'int'>, 0), (13, <class 'int'>, 2), (14, <class 'int'>, 2), (15, <class 'int'>, 1), (16, <class 'int'>, 1), (17, <class 'int'>, 1), (18, <class 'int'>, 1), (19, <class 'int'>, 2), (20, <class 'int'>, 2)]
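For reference, here is a minimal sketch of the kind of setup that triggers this; the shapes, class count, and random data are illustrative rather than taken from my actual experiment:

```python
import numpy as np
from neon.backends import gen_backend
from neon.data import ArrayIterator

be = gen_backend(backend='gpu', batch_size=128)

# Illustrative random data in (N, C*H*W) layout.
X = np.random.rand(1024, 3 * 128 * 128).astype(np.float32)
y = np.random.randint(0, 10, size=(1024, 1))

# Building lshape from a numpy array (as happens when it is read from HDF5
# attrs) yields numpy.int64 elements rather than Python ints; these values
# later reach the pycuda kernel launch arguments unchanged.
lshape = tuple(np.array([3, 128, 128]))
train = ArrayIterator(X, y, nclass=10, lshape=lshape)
```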
Casting all parameters to int in the layer initialization fixes the issue for me, but that does not seem like a proper solution. Casting the elements of lshape to int also helps (see the sketch below). I think it would be great if the input values were checked, or converted to the expected types, on the library side. The other layer types (linear, batchnorm, recurrent, etc.) and backends (cpu, mkl) that I used did not show this issue.
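For anyone hitting the same errors, a sketch of that user-side workaround (the lshape values are illustrative):

```python
# Cast every lshape element back to a built-in int before the iterator sees it;
# the GPU kernel launch arguments then receive plain Python ints as expected.
lshape = tuple(int(v) for v in np.array([3, 128, 128]))
train = ArrayIterator(X, y, nclass=10, lshape=lshape)
```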
Environment: Python 3.5.2, neon 2.6.0 (f9d771bbb5f5fa3ae129748596d0ced5389c7f88), CUDA 8.0, GPU K40s, Ubuntu 16.04, Boost 1.58.0, PyCUDA 2017.1.1, NumPy 1.13.1.
@zhiltsov-max Agreed. A type check is needed here.
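One possible form such a check could take on the library side (a sketch only, not existing neon code; the helper name is hypothetical):

```python
def _as_builtin_int_shape(shape):
    """Coerce every dimension of a shape to a built-in int so that downstream
    pycuda calls (prepared kernel launches, mem_alloc) receive C-compatible
    scalars instead of numpy integer scalars."""
    return tuple(int(d) for d in shape)
```

Applying something like this to in_shape (and to derived sizes such as the scratch buffer size) at configure time should cover the three tracebacks above, since all of them come down to numpy.int64 values reaching pycuda.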