tobac
Bulk statistics very slow for non-contiguous array data
I've recently noticed that bulk statistics can run very slowly when applied to non-contiguous data. This can happen when slicing dask arrays or broadcasting along the trailing dimension. Calling ravel on these arrays is ~20x slower than on contiguous arrays, and because we do this for each feature, it adds up to a big slowdown. I might look into smarter ways of doing this in the future to address this issue.
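A small NumPy-only sketch of why layout matters. `np.ravel` can return a view of a C-contiguous array, but it has to allocate and copy when the strides don't allow a flat view. The helper `ravel_copies` is hypothetical and not part of tobac. The slice and the `np.broadcast_to` call are assumed stand-ins for the dask slicing and trailing-dimension broadcasting described above.

```python
import numpy as np


def ravel_copies(arr: np.ndarray) -> bool:
    """Hypothetical helper: True if np.ravel(arr) had to allocate a new array."""
    return not np.shares_memory(np.ravel(arr), arr)


# C-contiguous 2D field: ravel can hand back a view, no copy needed.
contiguous = np.arange(12, dtype=float).reshape(3, 4)

# Slicing out a block of columns leaves gaps in memory, so the result
# is not C-contiguous and cannot be flattened without copying.
sliced = contiguous[:, 1:3]

# Broadcasting along a new trailing dimension gives that axis a zero stride;
# a flat C-order version must repeat elements, so ravel copies here too.
broadcast = np.broadcast_to(contiguous[..., np.newaxis], (3, 4, 5))

for name, arr in [("contiguous", contiguous), ("sliced", sliced), ("broadcast", broadcast)]:
    print(name, "C-contiguous:", arr.flags["C_CONTIGUOUS"], "ravel copies:", ravel_copies(arr))
```

If ravel runs once per feature on an array like `sliced` or `broadcast`, every call pays for that copy. Converting the data once up front, for example with `np.ascontiguousarray`, or ravelling once for all features, avoids paying for it repeatedly.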
Using np.split might be a fast approach, as shown in https://stackoverflow.com/a/43094244
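Below is a minimal sketch of one way the np.split idea could look. It uses the general sort-then-split grouping pattern, and it is not confirmed to match the linked answer or tobac's own implementation. The data is ravelled once, sorted by label, and split at the positions where the label changes, so every feature's values come out of a single pass. The function name `bulk_mean_by_label`, the `background` parameter, and the assumption that label 0 marks unlabelled points are all illustrative.

```python
import numpy as np


def bulk_mean_by_label(values, labels, background=0):
    """Hypothetical helper: mean of `values` for each label via one sort + np.split.

    Assumes `values` and `labels` have the same shape and are non-empty, and that
    `background` marks points that belong to no feature.
    """
    # Ravel once for the whole array instead of once per feature.
    flat_values = np.ravel(values)
    flat_labels = np.ravel(labels)

    # Sort so that all points with the same label are adjacent.
    order = np.argsort(flat_labels, kind="stable")
    sorted_labels = flat_labels[order]
    sorted_values = flat_values[order]

    # Indices where the label changes are the group boundaries.
    boundaries = np.flatnonzero(np.diff(sorted_labels)) + 1
    group_labels = sorted_labels[np.r_[0, boundaries]]
    groups = np.split(sorted_values, boundaries)

    return {
        int(label): float(group.mean())
        for label, group in zip(group_labels, groups)
        if label != background
    }


values = np.arange(12, dtype=float).reshape(3, 4)
labels = np.array([[0, 1, 1, 0], [2, 2, 1, 0], [2, 0, 0, 0]])

# Same result for a non-contiguous (transposed) view of the data.
print(bulk_mean_by_label(values, labels))
print(bulk_mean_by_label(values.T, labels.T))
```

The trade-off is one O(n log n) sort up front instead of a mask-and-ravel per feature. That should pay off when there are many features, though this sketch has not been benchmarked against the current approach.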