bug: BF_STATUS_DEVICE_ERROR when using a slice
When operating on sliced bifrost.ndarrays in CUDA space, we have been running into a BF_STATUS_DEVICE_ERROR exception (and BF_STATUS_MEM_OP_FAILED / BF_STATUS_INTERNAL_ERROR).
Here is a minimal example:
import bifrost as bf
import numpy as np
from bifrost.ndarray import copy_array

DEDISP_KERNEL = """
// All inputs have axes (beam, frequency, time)
// input i (the data) has shape 5, 512, 3x8192
// time delay td (the frequency-dependent offset to the first time sample to select) has shape (1, 512, 1)
// Compute o = i shifted by td and averaged by a factor of 1
// The shape of the output is (5, 512, 3x8192)
// we have defined the axis names as t, b, ft, f
o(b, f, ft) = (i(b, 2*f, (ft + td(1, 2*f, 1))) + i(b, 1 + 2*f, (ft + td(1, 1 + 2*f, 1))) ) / 2;
"""

x = np.random.normal(0, 1, (20, 512, 2048)).astype(np.float32)
test = bf.ndarray(x, space = 'cuda')

reduced = bf.ndarray(shape = (5, 256, 128), dtype = np.float32, space = 'cuda')
dedisp = bf.ndarray(shape = (5, 256, 2048), dtype = np.float32, space = 'cuda')
td = bf.ndarray(shape = (1, 512, 1), dtype = np.uint8, space = 'cuda')

for i in range(20):
    new_td = np.full((1, 512, 1), 5, dtype = np.uint8)
    if i==0:
        copy_array(td, new_td)
    if i < 15:
        bf.map(DEDISP_KERNEL, data={'o': dedisp, 'i': test[i:i+5, :, :], 'td': td}, axis_names = ['b', 'f', 'ft'], shape = (5, 256, 2048))
        start = i
        stop = i + 512
        bf.reduce(dedisp[:, :, start:stop], reduced, op = 'mean')
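A 'BF_STATUS_DEVICE_ERROR' occurs when 'new_td' is copied to 'td' in CUDA space for all 'i' values (i.e., when the 'if i==0' condition on line 24 is changed so that the copy runs on every iteration) and the reduction factor (set by 'stop' on line 29; 512 / 128 = 4 as written) is lower than 8. For reduction factors of 8 and higher it works regardless of how often the copy runs.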
If new_td is copied to td once, everything works fine. Once new_td is copied more than once (i.e., the data in CUDA space is replaced - even if it is replaced by the same numbers), the exception is raised.
Attempting to access dedisp gives:
In [3]: dedisp
Out[3]: ---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
~/mpy3/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
700 type_pprinters=self.type_printers,
701 deferred_pprinters=self.deferred_printers)
--> 702 printer.pretty(obj)
703 printer.flush()
704 return stream.getvalue()
~/mpy3/lib/python3.6/site-packages/IPython/lib/pretty.py in pretty(self, obj)
392 if cls is not object \
393 and callable(cls.__dict__.get('__repr__')):
--> 394 return _repr_pprint(obj, self, cycle)
395
396 return _default_pprint(obj, self, cycle)
~/mpy3/lib/python3.6/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
698 """A pprint that just redirects to the normal repr function."""
699 # Find newlines and replace them with p.break_()
--> 700 output = repr(obj)
701 lines = output.splitlines()
702 with p.group():
~/install/bifrost/python/bifrost/ndarray.py in __repr__(self)
343 return self.copy(space='system')
344 def __repr__(self):
--> 345 return super(ndarray, self._system_accessible_copy()).__repr__()
346 def __str__(self):
347 return super(ndarray, self._system_accessible_copy()).__str__()
~/install/bifrost/python/bifrost/ndarray.py in _system_accessible_copy(self)
341 return self
342 else:
--> 343 return self.copy(space='system')
344 def __repr__(self):
345 return super(ndarray, self._system_accessible_copy()).__repr__()
~/install/bifrost/python/bifrost/ndarray.py in copy(self, space, order)
361 space = self.bf.space
362 # Note: This makes an actual copy as long as space is not None
--> 363 return ndarray(self, space=space)
364 def _key_returns_scalar(self, key):
365 # Returns True if self[key] would return a scalar (i.e., not a view)
~/install/bifrost/python/bifrost/ndarray.py in __new__(cls, base, space, shape, dtype, buffer, offset, strides, native, conjugated)
197 native=base.bf.native,
198 conjugated=conjugated)
--> 199 copy_array(obj, base)
200 else:
201 # Create new array
~/install/bifrost/python/bifrost/ndarray.py in copy_array(dst, src)
109 else:
110 _check(_bf.bfArrayCopy(dst_bf.as_BFarray(),
--> 111 src_bf.as_BFarray()))
112 if dst_bf.bf.space != src_bf.bf.space:
113 # TODO: Decide where/when these need to be called
~/install/bifrost/python/bifrost/libbifrost.py in _check(status)
116 else:
117 status_str = _bf.bfGetStatusString(status)
--> 118 raise RuntimeError(status_str)
119 else:
120 if status == _bf.BF_STATUS_END_OF_DATA:
RuntimeError: b'BF_STATUS_MEM_OP_FAILED'
After some digging I think I know what is going on. It looks like the problem with the dedisp slice in the bf.reduce call is that the memory isn't contiguous along the reduction axis. However, bf.reduce treats it as if it were and tries to launch a vectorized reduction kernel, which then fails.
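To make the layout concrete, here is a small host-side sketch using plain NumPy rather than bifrost. It assumes the CUDA array is laid out like a C-ordered NumPy array of the same shape and dtype; `packed_strides` is a hypothetical helper written only for this illustration. It compares the strides of the slice from the report with the strides a freshly allocated array of the same shape would have.

```python
import numpy as np

# Host-side stand-in for the CUDA array in the report (same shape and dtype).
dedisp = np.zeros((5, 256, 2048), dtype=np.float32)

start = 3  # any loop index i from the report
view = dedisp[:, :, start:start + 512]


def packed_strides(shape, itemsize):
    """Byte strides of a dense, C-ordered array with this shape (hypothetical helper)."""
    strides = []
    step = itemsize
    for n in reversed(shape):
        strides.append(step)
        step *= n
    return tuple(reversed(strides))


print(view.shape)                                  # (5, 256, 512)
print(view.strides)                                # (2097152, 8192, 4)
print(packed_strides(view.shape, view.itemsize))   # (524288, 2048, 4)
print(view.flags['C_CONTIGUOUS'])                  # False
```

Elements within one row of the slice are still adjacent (stride 4), but consecutive rows sit 8192 bytes apart instead of 2048, so any kernel that assumes a densely packed buffer would read the wrong addresses.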
The quick fix is to set all of the use_vec#_kernel flags in reduce.cu to false if the input array is not contiguous, which forces the non-vectorized loop kernel. That will have some performance impact on the reduction for anything that is non-contiguous, but it should be robust. This fix might be a little heavy-handed, too, since it really seems to be only the structure along the reduction axis that matters.
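The decision described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in reduce.cu: `is_packed` and `choose_reduce_kernel` are hypothetical names, and the real kernels are selected in CUDA/C++ with more than two variants.

```python
def is_packed(shape, strides, itemsize):
    """True if the byte strides describe a dense, C-ordered layout (hypothetical helper)."""
    expected = itemsize
    for n, s in zip(reversed(shape), reversed(strides)):
        # Axes of length 1 never advance, so their stride is irrelevant.
        if n > 1 and s != expected:
            return False
        expected *= n
    return True


def choose_reduce_kernel(shape, strides, itemsize):
    """Pick the vectorized path only for densely packed input (hypothetical dispatch)."""
    return "vectorized" if is_packed(shape, strides, itemsize) else "loop"


# Full float32 array from the report: densely packed.
print(choose_reduce_kernel((5, 256, 2048), (2097152, 8192, 4), 4))  # vectorized

# Slice dedisp[:, :, start:stop] from the report: same strides, narrower last axis.
print(choose_reduce_kernel((5, 256, 512), (2097152, 8192, 4), 4))   # loop
```

A less conservative variant could inspect only the stride of the reduction axis, at the cost of a more involved check.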
@telegraphic Does slice-with-reduce solve this for you?