
[FEATURE REQUEST] Implement numpy.ma

Open · water5 opened this issue 3 years ago · 3 comments

https://numpy.org/doc/stable/reference/maskedarray.generic.html

numpy.ma provides a number of functions. Could we implement a few of them? For example: numpy.ma.array, numpy.ma.ones, numpy.ma.empty, numpy.ma.arange, numpy.ma.masked_where

import numpy.ma as ma
a = ma.arange(25).reshape(5, 5)
a

masked_array(
  data=[[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]],
  mask=False,
  fill_value=999999)

a.mask = a > 7
a

masked_array(
  data=[[0, 1, 2, 3, 4],
        [5, 6, 7, --, --],
        [--, --, --, --, --],
        [--, --, --, --, --],
        [--, --, --, --, --]],
  mask=[[False, False, False, False, False],
        [False, False, False,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]],
  fill_value=999999)

a *= 10
a

masked_array(
  data=[[0, 10, 20, 30, 40],
        [50, 60, 70, --, --],
        [--, --, --, --, --],
        [--, --, --, --, --],
        [--, --, --, --, --]],
  mask=[[False, False, False, False, False],
        [False, False, False,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True]],
  fill_value=999999)

a.mask = ma.nomask
a

masked_array(
  data=[[ 0, 10, 20, 30, 40],
        [50, 60, 70,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24]],
  mask=[[False, False, False, False, False],
        [False, False, False, False, False],
        [False, False, False, False, False],
        [False, False, False, False, False],
        [False, False, False, False, False]],
  fill_value=999999)

ma.masked_where(a > 8, a)

masked_array(
  data=[[0, --, --, --, --],
        [--, --, --, 8, --],
        [--, --, --, --, --],
        [5, 6, 7, 8, --],
        [--, --, --, --, --]],
  mask=[[False,  True,  True,  True,  True],
        [ True,  True,  True, False,  True],
        [ True,  True,  True,  True,  True],
        [False, False, False, False,  True],
        [ True,  True,  True,  True,  True]],
  fill_value=999999)

water5 commented on Jan 19 '22 14:01

@water5 But can't you achieve the same thing via Boolean indexing? You just want to get rid of missing detector data, so that the missing data don't mess up any subsequent calculations, right?

from ulab import numpy as np

a = np.array([1, 2, 3, -1, 5])
sum(a[a > 0])

I don't quite see where masked arrays would have an advantage (beyond convenience) over Boolean indexing.
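For comparison, here is a minimal plain-Python sketch (plain lists stand in for arrays, so neither ulab nor numpy is needed) of filtering versus keeping a mask next to the data. The helper name `masked_sum` and the convention that a negative reading marks a missing value are assumptions made for illustration only; they are not part of ulab or numpy.

```python
# Illustrative only: plain lists stand in for arrays.
data = [1, 2, 3, -1, 5]

# Assumption: a negative reading marks a missing detector value.
mask = [value < 0 for value in data]  # True means "ignore this element"


def masked_sum(values, mask):
    """Sum the elements whose mask entry is False."""
    return sum(v for v, m in zip(values, mask) if not m)


# With the mask applied, the missing reading is skipped.
total = masked_sum(data, mask)

# Clearing the mask restores access to every original value,
# because the data list itself was never modified.
unmasked_total = masked_sum(data, [False] * len(data))

print(total, unmasked_total)
```

The filtered sum is the same as with Boolean indexing; the only difference in this sketch is that the mask is kept as a separate object and can be changed or cleared later.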

In principle, I am not against the idea, but since this is quite a significant undertaking, I cannot assign high priority to it. Also, there are quite a few functions in numpy that we haven't yet implemented, but if we implemented everything, then we would have exactly that: numpy. I would like to reiterate one of the first sentences of the user manual:

ulab implements a small subset of numpy and scipy. The functions were chosen such that they might be useful in the context of a microcontroller.

We never wanted to produce a one-to-one copy of numpy, and I think it wouldn't make much sense. If you want numpy, then use numpy. Given the manpower that we have here, we have to be very selective as to what we want to implement, and how.

v923z commented on Jan 19 '22 15:01

numpy.isin could be an alternative to numpy.ma (see https://numpy.org/doc/stable/reference/generated/numpy.isin.html). Which would be easier to implement, numpy.isin or numpy.ma.*?

a = np.arange(9).reshape((3, 3))
a

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]], dtype=int16)

test_element = [1, 3, 6, 8]
test_element

[1, 3, 6, 8]

mask_ = np.isin(a, test_element)
mask_

array([[False,  True, False],
       [ True, False, False],
       [ True, False,  True]])

a[mask_]

array([1, 3, 6, 8])

a[mask_] *= 10
a

array([[ 0, 10,  2],
       [30,  4,  5],
       [60,  7, 80]])
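As a rough illustration of the behaviour shown above, the membership test and the masked update can be sketched in plain Python on nested lists. The helper name `isin_2d` is hypothetical; this is not how numpy implements `isin`, and it is not a ulab function.

```python
def isin_2d(rows, test_elements):
    """Return nested lists of booleans: True where the value is in test_elements."""
    wanted = set(test_elements)  # build the lookup set once
    return [[value in wanted for value in row] for row in rows]


# Nested lists stand in for the 3x3 array from the example above.
a = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
mask_ = isin_2d(a, [1, 3, 6, 8])

# Equivalent of a[mask_] *= 10: scale the selected elements in place.
for row, mask_row in zip(a, mask_):
    for i, selected in enumerate(mask_row):
        if selected:
            row[i] *= 10

print(mask_)
print(a)
```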


But numpy.isin seems to require an operation like the one below (indexing an array with a list of indices), which ulab.numpy does not currently implement:

a = np.arange(5)
a

array([0, 1, 2, 3, 4], dtype=int16)

mask_ = [1, 3]
a[mask_]

Nothing is output.
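For reference, the result expected from indexing with a list of positions can be sketched with a comprehension. The helper `take_by_index` is a hypothetical name used only here, and a plain list stands in for the array.

```python
def take_by_index(values, indices):
    """Return the elements of values at the given integer positions."""
    return [values[i] for i in indices]


a = list(range(5))  # stands in for np.arange(5)
mask_ = [1, 3]      # a list of positions, not a Boolean mask
selected = take_by_index(a, mask_)
print(selected)
```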

I see https://github.com/v923z/micropython-ulab/issues/487 and https://github.com/v923z/micropython-ulab/pull/488. Will the operation above be supported once those are done?

water5 commented on Jan 25 '22 13:01

@water5 isin is definitely easier to implement. However, when you iterate over the rows of an array, you actually get a view, i.e., if you manipulate the row, you are, in effect, manipulating the original array. Would

from ulab import numpy as np

a = np.array(range(25)).reshape((5, 5))
test_elements = [3, 6, 7, 8]

for row in a:
    for i in range(a.shape[1]):
        value = row[i]
        if value in test_elements:
            row[i] = 10 * value


print(a)

be unacceptably slow?
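If the membership test turns out to dominate, one possible variation is to convert the test elements to a set once, so that each lookup no longer scans the whole list. The sketch below uses nested lists in place of a ulab array so that it runs anywhere; the function name `scale_matching` and its `factor` parameter are made up for this example, and whether it is fast enough on a given board would have to be measured.

```python
def scale_matching(rows, test_elements, factor=10):
    """Multiply, in place, every element that appears in test_elements."""
    wanted = set(test_elements)  # one-time conversion for cheaper lookups
    for row in rows:
        for i in range(len(row)):
            if row[i] in wanted:
                row[i] = factor * row[i]
    return rows


# 5x5 grid holding 0..24, standing in for np.array(range(25)).reshape((5, 5)).
a = [list(range(r * 5, r * 5 + 5)) for r in range(5)]
scale_matching(a, [3, 6, 7, 8])
print(a[0], a[1])
```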

v923z commented on Jan 25 '22 16:01