
`_map_all` with lost axes etc

Open CSSFrancis opened this issue 3 years ago • 0 comments

Describe the bug

It might be good to redefine the purpose of the _map_all function in hyperspy.

My understanding is that the intended purpose is to call a function that operates on a defined set of axes rather than only on the signal axes. It is currently called when an axes variable is passed to a map function. For the most part, I would expect these functions to be chains of numpy/dask functions.

An example might be something like the variance which we can easily define using only numpy functions.

import numpy as np

def variance(data, axes):
    return np.mean(data**2, axis=axes) - np.mean(data, axis=axes)**2
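As a quick sanity check on the definition above, the following sketch compares it with `np.var` on a plain NumPy array and shows that the reduced axes disappear from the result. The array shape and the random seed are arbitrary choices for illustration; no hyperspy code is involved.

```python
import numpy as np


def variance(data, axes):
    # Var[x] = E[x**2] - E[x]**2, reduced over the given axes
    return np.mean(data**2, axis=axes) - np.mean(data, axis=axes)**2


rng = np.random.default_rng(0)      # arbitrary seed, illustration only
data = rng.random((3, 4, 5))

result = variance(data, axes=(1, 2))

# Reducing over axes 1 and 2 leaves only axis 0
print(result.shape)

# The result agrees with NumPy's built-in (population) variance
print(np.allclose(result, np.var(data, axis=(1, 2))))
```

The point is that the output has fewer dimensions than the input, which is exactly the "lost axes" situation that the calling code has to account for.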

As of now there are a couple of issues with the function; namely, it doesn't handle dropping axes. Beyond that, there isn't really a reason to have both the _map_all function and _apply_function_on_data_and_remove_axis.

To Reproduce

import hyperspy.api as hs
hs.signals.Signal2D(np.ones((3, 4, 5))).map(variance, axes=(1, 2))  # fails

This is kind of a stupid example but I think that it shows there is some odd behavior.

Expected behavior

In this case the signal.get_dimensions_from_data() function is causing things to fail. It is an easy enough fix, but it might be good to consider how this could be implemented better.

I think the concept of having some map functions which operate on the whole dataset is a good idea. The COM function in pyxem is an example of this (https://github.com/pyxem/pyxem/pull/845), and I would imagine there are other cases where this might be helpful as well.

The question is how to best implement this. I would suggest just having the _map_all function call _apply_function_on_data_and_remove_axis or removing _map_all entirely.
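To make the "apply on the whole data and remove axes" idea concrete, here is a minimal sketch on plain NumPy arrays. The helper name `apply_and_drop_axes` and its return convention are hypothetical and are not hyperspy's actual implementation; it assumes the function accepts an `axis` keyword and reduces over those axes.

```python
import numpy as np


def apply_and_drop_axes(function, data, axes):
    """Hypothetical helper: apply `function` over `axes` of the whole array
    and report which of the original axes survive."""
    # Normalize to a tuple of non-negative Python ints
    axes = tuple(int(ax) % data.ndim for ax in np.atleast_1d(axes))
    result = function(data, axis=axes)

    # Axes that were not reduced keep their original order
    kept = tuple(i for i in range(data.ndim) if i not in axes)
    expected_shape = tuple(data.shape[i] for i in kept)
    if np.shape(result) != expected_shape:
        raise ValueError(
            f"function returned shape {np.shape(result)}, "
            f"expected {expected_shape} after dropping axes {axes}"
        )
    return result, kept


summed, kept_axes = apply_and_drop_axes(np.sum, np.ones((3, 4, 5)), axes=(1, 2))
print(summed.shape, kept_axes)
```

A signal-level wrapper could then use the indices of the kept axes to decide which axes objects to retain, instead of trying to infer the dimensions from the data shape alone.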

It might also be good to call _map_all whenever an axis keyword is passed. In that case something like s.map(np.sum, axis=1) would work.

That being said, the best case might just be to make sure that NumPy's duck typing is working correctly, so that variance(s, axes=1) works and returns a signal.
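One way NumPy supports this kind of duck typing is the `__array_function__` protocol (NEP 18). The sketch below uses a hypothetical `TinySignal` class, not a hyperspy signal, to show how a NumPy reduction called on a wrapper object could return another wrapper with the reduced axes removed. It only handles the `axis` keyword and ignores options such as `keepdims`.

```python
import numpy as np


class TinySignal:
    """Hypothetical stand-in for a signal: an array plus one name per axis."""

    def __init__(self, data, axis_names):
        self.data = np.asarray(data)
        self.axis_names = tuple(axis_names)

    def __array_function__(self, func, types, args, kwargs):
        # Unwrap TinySignal arguments and run the NumPy function on raw data
        raw_args = tuple(a.data if isinstance(a, TinySignal) else a for a in args)
        result = func(*raw_args, **kwargs)

        axis = kwargs.get("axis")
        if axis is None:
            # No axis bookkeeping possible; return the plain NumPy result
            return result

        # Drop the names of the reduced axes and re-wrap the result
        dropped = {int(ax) % self.data.ndim for ax in np.atleast_1d(axis)}
        names = [n for i, n in enumerate(self.axis_names) if i not in dropped]
        return TinySignal(result, names)


s = TinySignal(np.ones((3, 4, 5)), ("x", "y", "z"))
out = np.sum(s, axis=1)
print(type(out).__name__, out.data.shape, out.axis_names)
```

Note that operators such as `**` are dispatched through the class's own arithmetic methods rather than through this protocol, so a function like the variance example would also rely on those being defined.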

In that case it would seem that both _map_all and _apply_function_on_data_and_remove_axis would be mostly deprecated.

CSSFrancis, Jun 01 '22 19:06