hoki [DOCS] Using 'numpy.reshape' to create flat arrays for voronoi binning will affect performance

[DOCS] Using 'numpy.reshape' to create flat arrays for voronoi binning will affect performance

Open Knusper opened this issue 8 months ago • 0 comments

Describe the issue with the content

In the Voronoi binning example new views of the 2D image and noise arrays are created by using numpy.reshape. This can (and does) cause overhead in the numpy.ravel operations, as the reshaped arrays are just different "views" of the same data. The vorbin package by M. Cappellari makes use of numpy.ravel and therefore the execution of the algorithm will require significant more ram and will take significant more time. See, e.g., this stackoverflow thread for a description of the problem: https://stackoverflow.com/questions/66296162/numpy-ravel-takes-too-long-after-a-slight-change-to-a-ndarray

Location of the issue in the documentation

https://heloises.github.io/hoki/Voronoi_binning_of_SED_data.html#Calculating-SNR-in-a-line-free-region

Suggestion to improve or clear confusion

After getting bitten by this and banging my head against the wall for a day, I found two solutions. The first one is using a mask. Since we can't pass 'NaNs' to vorbin anyway, we could proceed the following way (im is the image and var_im are the associated variances):

mask = ~np.isnan(im)
Y, X = np.indices(im.shape)
x = X[mask]
y = Y[mask]

# input arays for vorbin
signal_array = im[mask]
noise_array = np.sqrt(var_im[mask])

# summoning vorbin
bin_num, x_node, y_node, x_bar, y_bar, sn, n_pix, scale = \
    voronoi_2d_binning(x, y, signal_array, noise_array, target_snr, plot=plot,
                       pixelsize=1., wvt=False)

One can also alter the shape of an array in-place. The downside for a lot of applications is, that the order in the array is not as well defined as it is the case with reshape . However, in our case this does not matter, as the order will be consistent for each array. This solution, which appears slightly less elegant, reads:

# book-keeping of pixel coordinates
Y, X = np.indices(im.shape)
x = X.copy()
y = Y.copy()
x.shape = (x.shape[0] * x.shape[1])
y.shape = x.shape

# input arays for vorbin
signal_array = im.copy()
noise_array = np.sqrt(var_im)
signal_array.shape = x.shape
noise_array.shape = x.shape

# summoning vorbin
bin_num, x_node, y_node, x_bar, y_bar, sn, n_pix, scale = \
    voronoi_2d_binning(x, y, signal_array, noise_array, target_snr, plot=False,
                       pixelsize=1., wvt=False)

Jun 19 '24 18:06 Knusper

hoki hoki copied to clipboard

[DOCS] Using 'numpy.reshape' to create flat arrays for voronoi binning will affect performance

hoki
hoki copied to clipboard