ndcube
ndcube copied to clipboard
Adds the functionality of Ufunc to NDCube
PR Description This PR address the issue: #591
I've tried to add the support of Arithmetic operations to NDCube using Numpy Ufunc
I have removed the test cases in ndcube/tests/test_ndcube.py
in the arithmetic section which doesn't support __array_ufunc__
Note: Before proceeding to add the remaining test cases and Changelog, I wanted to ensure that the proof of concept is correct.
Thanks @ViciousEagle03 for this PR and for opening it earlier rather than later so we can discuss the approach.
Firstly, I'll say that I'm not an expert in ufuncs so it's possible I've misunderstood some details in your implementation. Secondly, despite my concerns below, I think supporting ufuncs is a useful feature for users. The question is how to do this, and this PR is a great chance to discuss it.
My understanding is that the API you're implementing for ufuncs is different to that for the arithmetic operators. The ufuncs basically extract the data array and then proceed as normal, whereas the arithmetic operations try to take the uncertainties and mask into account, where appropriate. For example, cube + np.ones(cube.data.shape)
will return an NDCube
where all elements in the data array have been incremented by 1 (same WCS, uncertainties, mask, etc.), whereas np.add(cube, np.ones(cube.data.shape))
will return a numpy array giving the result of cube.data + np.ones(cube.data.shape)
. Is this correct?
Assuming that it is, this opens an important discussion. My intuition as a user would be that np.add
and +
should behave the same when adding an array to an NDCube
. By this I mean both should consider the WCS, uncertainties and mask if and where appropriate, and return an NDCube
. If users are only interested in the data array without consideration of the WCS, uncertainty and mask, a simple and explicit API is already available, namely np.add(cube.data, np.ones(cube.data.shape))
. I would argue the same applies to np.subtract
and np.multiply
. What do you think? Is there another perspective I'm not considering?
If we continue to assume that ufuncs should also consider the wcs, mask and uncertainty, it has implications for np.maximum
, np.equal
as well. Specifically to your question about whether np.equal
should take into account the wcs, I would argue yes. In fact this is a major reason it has not yet been implemented. Determining whether two APE-14 wcs's are equivalent is not always trivial, or it can be computationally expensive. Another consideration is whether values in the data array should be the same, or just have overlapping uncertainty ranges. Furthermore, should masked values be excluded from the calculation? I would argue yes. But these are questions that we haven't agreed a consensus to.
As for np.maximum
, I gave this some thought a while back and asked around to get a sense of what users' intuitions would be. Again, the handling of uncertainties means that the behaviour is ambiguous. Should the maximum be the maximum value in the data array? Or the maximum of data + uncertainty
? Or even the maximum of data - uncertainty
? Perhaps it should return both the maximum value and the uncertainty associated with that. And the uncertainty could be asymmetric if drawn from multiple elements in the NDCube
. I realised that there wasn't an obvious, unified expectation from users, and so we decided not to implement maximum
. But that isn't to say that it shouldn't be. But it is more complicated than simply np.maximum(cube.data))
.
Fundamentally, the issue lies in how to treat the wcs, mask and uncertainty, whether it be arithmetic operators or ufuncs. If these are going to be ignored and a numpy array returned, users can easily perform their operation on cube.data
, and there's no real need to implement operators on NDCube
.
What are your thoughts? And have I misunderstood your implementation?
Hey @DanRyanIrish, Thanks for the feedback
I think both the ufunc and arithmetic operators APIs should provide the same implementation for the ufuncs and arithmetic operators and so I have changed the __array_ufunc__
method to take in account the same.The ufuncs now also considers the uncertainties and mask where necessary.
Regarding the implementation of np.equal
ufunc, I think it is ambiguous to define the implementation of it
Given the computational expense of comparing the WCSes of two NDCube ,if one is only comparing the data array of the cubes then it can be achieved by just np.equal(cube1.data, cube2.data)
or if equal means comparing the reference of both the cubes then we can check it by is
.So, you are right to first establish what the np.equal should achieve and if you have any particular implementation you would like me to add to np.equal
, I would be happy to change the patch accordingly.
I'm a bit uncertain about np.maximum
,
As far as I understand it, np.maximum typically operates as a binary ufunc, changing the convention of np.maximum
could be misleading or confusing, Or did I misunderstood the above?
And if we implement the np.maximum
, we are returning the two things below, right?
- the maximum value in the NDCube.data array
- the maximum of the data array and the corresponding uncertainty value for each corresponding data
Hey @DanRyanIrish,
Apologies for the delay, I have introduced the kwarg dunder
to handle the operator overloading properly and to leverage the existing python operators. Since there is little documentation available regarding the implementation of array_ufunc
functionality for complex custom objects so the way i have implemented it might be a little confusing so i will try to explain the API I have implemented here,
If array2 supports array_ufunc
• Calling np.add(cube1, array2)
would return an NDCube.
• Attempting np.add(cube1, cube2)
would return NotImplented
, triggering a TypeError
.
• Calling np.add(array1, cube2)
would also return an NDCube.
By default, ufuncs
triggers the array_ufunc mechanism, while arithmetic operators invoke the respective dunder methods.
To ensure both ufuncs and arithmetic operators account for masks and uncertainties appropriately and follow the same API, I've used dunder to specify when an arithmetic operator is being used.
For example in case of, np.add(cube1, array2)
, the control proceeds as below:
array_ufunc()
[dunder=false] --> __add__()
--> array_ufunc()
[dunder=True] --> now it proceeds similarly as +
operator.
All numpy ufuncs leverage the respective existing dunder methods, which ensures the implementation is consistent.
Hello :wave:, Thanks for your contribution to ndcube! I have marked this pull request as stale because there hasn't had any activity in five months. If you are still working on this, or if it's waiting on a maintainer to look at it then please let us know and we will keep it open. Please add a comment with: @sunpy/ndcube-developers to get someone's attention. If nobody comments on this pull request for another month, it will be closed.